I plan on doing my Scalar project on the topic of violent crimes. I plan to dissect the different factors that lead people to commit “violent crimes”. One thing that will be interesting for the Scalar project will be the use of multimedia for my topic. Hopefully it doesn’t get too gory.
Initially I was unsure of what topic modeling would look like. Before assignment 7 officially started, I attempted to create a topic model in the Firefox version of Zotero, which didn’t work because of one of the many technical difficulties experienced during this assignment. In order to use the topic modeling program, Paper Machines, it was determined that we needed the standalone version of Zotero rather than the Firefox edition. We also needed a large amount of material to pull samples from in order to properly create the model, so other members of the class and I collected a total of 97 novels for the program to sample from. Topic modeling involves using a computer to take a plethora of texts, disregard their word order, sample the raw word data, and compare those samples with the word counts of the whole collection. By determining which words tend to appear close together, the computer can begin sorting the words into the categories they are associated with. The books we used were all written in the same time period in order to prevent slang from different time periods from being registered as a topic. I encountered technical problems that prevented me from producing my own topic model with Zotero, so the categories I am observing are the results of the 50-topic .mallet file that Professor Evans sent. Many of the categories generated contain proper names that give away which books they reference. Line 4, “holmes man mr sir watson house judge elizabeth cried”, is obviously a theme related to the Sherlock Holmes stories by Arthur Conan Doyle, which revolve around a detective, Sherlock Holmes, and his assistant Watson. A detective’s cases often end with some involvement of a judge, which explains the presence of that word. “elizabeth” fits less neatly, since Watson’s love interest in the stories is Mary Morstan, so the name probably comes from another novel in the corpus.
Other lines, such as line 44, “good make give years leave till continued half answer”, are very vague and offer little information about what the theme or the books sampled could be. Overall, using Zotero as topic modeling software was easy enough to understand and operate that I believe it could be widely adopted as a tool by humanists who are uneasy with the mysteries of coding. However, I believe that the technical issues currently involved with the program are limiting its potential.
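For anyone who would rather look through a topic list like this with a short script, here is a minimal Python sketch. It assumes the topics have been saved as plain text, one topic per line, in a tab-separated layout of topic number, weight, and top words, which is roughly what MALLET's topic-keys output looks like. The function name `parse_topic_keys` is made up for illustration, and the weights in the sample are placeholders (0.0), not values from our file; only the words are the two topics quoted above.

```python
def parse_topic_keys(text):
    """Parse tab-separated lines of: topic number, weight, space-separated words.

    Returns a dict mapping each topic number to its list of words.
    """
    topics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        topic_id, _weight, words = line.split("\t", 2)
        topics[int(topic_id)] = words.split()
    return topics


# Placeholder weights (0.0); the words are the two topics quoted in the post.
SAMPLE = (
    "4\t0.0\tholmes man mr sir watson house judge elizabeth cried\n"
    "44\t0.0\tgood make give years leave till continued half answer\n"
)

if __name__ == "__main__":
    for topic_id, words in parse_topic_keys(SAMPLE).items():
        print(topic_id, ", ".join(words))
```

Once the topics are in a dictionary like this, it becomes easy to search them for a character name or to count how many topics share a given word.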
When the phrase “topic modeling” was first thrown around in class, I had no clue what it meant. To complete our topic modeling assignment, we started off by using Zotero. As a group we collected books from the 1800s, which would be the basis for our final project. While finding authors and books was not hard, the sheer number of books needed was quite a pain, but luckily we had many people, which sped up the process.
After finding our books, things got quite difficult. I had many problems with the software on my laptop. To start, I was not able to generate a word cloud from the topics because Java was not working correctly. After I re-downloaded Java, the problem was still the same, and I had to turn to somebody with much better computer skills. Without this help I would have been completely lost and would have given up, which makes the process questionable and difficult for those who are not tech savvy. After the Java problem came a problem with Zotero Standalone. Once again I would have been lost if it were not for someone much more computer savvy than myself. After I deleted my current Zotero Standalone and re-downloaded it, things began to run more smoothly.
From this point the actual topic modeling began. To start, we used Paper Machines, which produced 3-word topics. With only 3 words it’s difficult to understand and analyze the topic of a book, so we chose to use another topic modeling platform called Mallet. Mallet worked similarly to Paper Machines, although it produced results that had up to 10 words per topic. Having 10 words allowed a more specific result, such as “mrs crawley, lufton, mr robarts, bishop, lady grantly, lord”. With this many words, it is possible to tell the book as well as its topic and theme. If a project required me to use topic modeling, I would use Mallet because it is more precise. Overall, topic modeling is still difficult for me to fully understand, but I have a better understanding of it now.
As a class we gathered almost 100 books and imported them into a group Zotero folder. By doing this we provided ourselves with a wide array of novels produced in the 1800s to use for our topic modeling assignment. Topic modeling is defined as a statistical model for discovering abstract topics within a set of documents. The machine itself identifies patterns and groups words into different topics. This kind of tool would definitely be handy for someone who analyzes a lot of bodies of work or who reads a lot of material. The tool allows someone to have, without reading any of the texts, a pretty good idea of what they are about and what kinds of subjects are being dealt with.
However, because our collection has so many books with so many different names and characters, I felt that the topic modeling basically just organized topics based on the characters, and therefore our topics were the books themselves. This shows that the tool is useful but does not necessarily tell us what the books are about, as opposed to the Decretis example, in which the main topics were more evident.
In our sampling produced with Paper Machines, we all ended up with only 3-word topics, which didn’t provide very much for analysis. My most evident topic was “sea, whale, men”, which was obviously from Moby Dick, but all my others were relatively vague. However, from the 10-word topics produced with Mallet we were able to get a much better variety of topics. Line 18, which contains “life, love, heart, death, father, nature, human, earth, mother”, is broader and produced a topic more related to family, the world and love. Then there are topic lines like line 8, which has “pierre, thou, mother, madame, lucy, thee, mrs, Isabel, Robert”; this line most likely reflects a single text. However, because I am not familiar with these books, none of the topics mean that much to me. If I were working on a project catered more to texts I am familiar with, I think this tool could be a lot more useful. Perhaps with more familiarity and time to perfect our use of Paper Machines the tool would be more useful, but because I was only able to produce topics of 3 words, I feel that it was relatively insubstantial.
To begin, the entire process from installation to execution was quite difficult. Besides the obvious difficulties with the software, there was also the need for a large collection of texts. Thanks to collaboration, gathering approximately one hundred texts was not too difficult. However, working individually, it would be very hard to accumulate enough texts to use Paper Machines effectively.
There was also some difficulty concerning the speed of the process. I went into the Paper Machines preferences in Zotero to increase the memory allocation in an attempt to reduce the run time. I also reduced the number of topics from fifty to twenty because I felt that would give more meaningful results.
However, once we made it through the process, the Paper Machines results were less informative than I expected. I felt the three-word topics were not enough to accurately portray the authorship of the topics. As we discussed, Paper Machines is an attempt at user-friendly software with functionality similar to Mallet’s. However, the Mallet output we looked at in class returned topics of eight to ten words, which made it a lot easier to discern the specific text and/or author related to the topic. With three words, the topic results in Paper Machines were often too generic. For example, the topic “great, love, family” consists of very general themes that appear in many novels. Another very generic topic I got was “people, character, earlier”, whose words are not strong topics or themes. By contrast, the topic “Catherine Heathcliff etc” from Mallet was easy to identify as Wuthering Heights.
Overall, I think Mallet provided better topics, and learning to use Mallet would have been comparably difficult to the entire installation and setup necessary for Paper Machines. I believe the idea behind Paper Machines, user-friendly software for topic modeling, is a great one; however, I feel it has been poorly executed.
That is a great question! The more I read about the subject, the more I fear I don’t understand it as completely as I should. It feels like I am digging a well, but I started at the top of Mount Everest; the amount of information and work that can go into using and understanding topic modeling is huge. I can see why some humanists are scared off by the subject, or even by attempting to play around with topic modeling software; they might be swimming in familiar waters, but those waters have been stirred up and are now murky, making them seem unfamiliar and even scary to be in. Using Paper Machines in conjunction with Zotero sounded fascinating, and I was excited to see the results. What didn’t excite me were the results. The word cloud gave me hope. There was a small feeling of success. I could see a few words that I would recognize as having been used throughout the corpus more than others. It didn’t tell me much, but it gave me a sense of moving forward.
The frustrating thing I see with topic modeling is the possible randomness with which the clusters of words, or topics, are generated (I will use the terms cluster(s) and topic(s) interchangeably throughout the post). It is my understanding that the algorithm is pretty complex mathematically, so I trust that the software has been implemented correctly; my last math class was Statistics, which happened so long ago that I can only say I took the class, and my mathematical skills have been reduced to fractions. From what I have read and understand, the algorithm can generate different clusters with each run, but the results will still be similar. I would compare that to two different people reading the same text and coming up with slightly different topics, while still having the major themes match. In an odd way, I had hoped that the computer-generated clusters would be a bit more precise or accurate, giving me something to readily digest and interpret from the corpus we entered into Zotero.
Having used fewer than 100 books authored in the 1800s gave me the ability to at least recognize the books by some of the clusters. Having familiarity with the books made the clusters understandable. I can see the potential of paring down a large text into multiple pieces, which could then be more easily scanned through some form of topic modeling, and the results would be beneficial in understanding some of the main ideas in the text as a whole. An example from the topic modeling I used with 20 topics: “ship man captain whale sea deck men boat ye” could be linked to whaling and, more specifically, the book Moby-Dick. With 50 topics it looked like: “whale ship man sea captain deck men war ye”. The only differences are the order of the words and that the 20-topic model had “boat” while the 50-topic model had “war”; otherwise they had the same words. I should mention these clusters were given to me and were generated using MALLET. Paper Machines was far less successful: it would generate large numbers of topics, but only gave me three words for each. For example, “whale”, “ahab”, and “trapper” came up in one cluster. Even if I had not known Moby-Dick was in the corpus, I would have guessed that the cluster was about that book, as “whale” and “Ahab” are almost synonymous with Moby-Dick, but if I had never heard of the book, I would have been hard pressed to say this cluster was about anything.
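The comparison between the 20-topic and 50-topic clusters can also be checked mechanically. The following is only a small Python sketch of that idea, using the two MALLET clusters quoted above; the helper name `compare_topics` is my own invention and not part of MALLET or Paper Machines.

```python
# The two Moby-Dick clusters quoted in the post.
topic_20 = "ship man captain whale sea deck men boat ye".split()
topic_50 = "whale ship man sea captain deck men war ye".split()


def compare_topics(a, b):
    """Return (shared words, words only in a, words only in b), each sorted."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a & set_b), sorted(set_a - set_b), sorted(set_b - set_a)


shared, only_20, only_50 = compare_topics(topic_20, topic_50)
print("shared:", shared)
print("only in the 20-topic model:", only_20)
print("only in the 50-topic model:", only_50)
```

Because the comparison uses sets, it ignores word order, which is the other difference between the two clusters.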
My critique of Paper Machines might stem from user error. With only three words in each cluster, though, it is difficult to point a finger at myself. I am willing to say that I might have selected the wrong option and that, with each use, I became less enthused and more irritated with the project, and possibly gave up in frustration at my inability to use the software correctly. Regardless of my poor results with Paper Machines, I can see how topic modeling can be beneficial and possibly shed new light on old ideas. I can also admit my analytical skills are not up to par when it comes to the use of topic modeling, which is another way of saying that if I extrapolated anything from the clusters that meaningfully added to the larger scope of academia, it would be by accident.
Remove the Zotero add-on from Firefox
Open Firefox, type about:addons into the location bar, and click on the Remove button for the Zotero add-on. If you see two Zotero-related add-ons (i.e., Zotero and the Zotero Word for Mac plugin), you need to remove them both.
Clean up your old Zotero-in-Firefox data
This part is harder. Open a Finder window by clicking on the smiley-faced icon on the far left of the dock at the bottom of your screen. Make sure you’re in your home directory by clicking on the house icon. Type ⌘-J to pop up the View Options menu. Check Show Library Folder at the bottom of the menu. Open the Library folder. Inside the Library folder, open the Application Support folder. Inside the Application Support folder, open the Firefox folder. Inside the Firefox folder, open the Profiles folder.
Once you’re inside the Profiles folder, you should see exactly one folder, with a name made up from some random letters and numbers. Mine, for example, is p572h74i.default. The exact alphanumeric string will be different for each of your profiles, but that gives you the general idea. Open that folder, and inside you should find a zotero folder. Delete it by dragging it to Trash.
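If you would rather confirm from a script that you have found the right folder, a minimal Python sketch along these lines should list it. The function name `find_zotero_folders` is made up for illustration, and the path assumes the default Firefox profile location on a Mac. The script only prints what it finds and deletes nothing, so you still drag the folder to Trash yourself.

```python
from pathlib import Path


def find_zotero_folders(profiles_dir):
    """Return every 'zotero' folder sitting directly inside a profile folder."""
    profiles_dir = Path(profiles_dir)
    if not profiles_dir.is_dir():
        return []  # no Profiles folder at this location
    return sorted(p for p in profiles_dir.glob("*/zotero") if p.is_dir())


if __name__ == "__main__":
    # Assumed default location of Firefox profiles on a Mac.
    default = (
        Path.home() / "Library" / "Application Support" / "Firefox" / "Profiles"
    )
    for folder in find_zotero_folders(default):
        print(folder)
```

If nothing is printed, either the Profiles folder is somewhere else on your machine or there is no leftover zotero folder to clean up.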
Install Zotero Standalone
Go to https://www.zotero.org/download/, and follow the instructions in the blue box on the right (“Download Zotero for Mac”, and “Next, add one of the following browser extensions”, making sure to choose Firefox, which should be highlighted).
Install Paper Machines
Go to http://papermachines.org/install/, and follow the instructions for installing Paper Machines in Zotero Standalone (i.e., “To install in Zotero Standalone, right-click (control-click) the link above …”)
Run Paper Machines
You should then be able to right-click on the Assignment 7 folder in the English 294 Group Library and get word cloud and topic modeling results.
My final project for Scalar will be a brief history of the use of ceramic-based compounds in engineering applications, specifically in high-friction or high-heat applications. I intend to outline the reasons behind ceramic compounds’ effectiveness in these situations, as well as give readers more of a background in the subject.
The good news is that I’ve identified the problem that was preventing those of you running OS X El Capitan 10.11.X from installing the Java Run-Time Environment (JRE) on your laptops. The bad news is that the fix is definitely in the “don’t try this at home” category. Thanks to Molly, who volunteered her laptop for me to try out the fix.
Please come and see me during my office hours tomorrow (Thursday), and I’ll install the fix for you. Officially, my office hours start at 2:30PM, but I expect to be on campus from 12:00 Noon on. Send an email, and let me know when you think you’ll be able to drop by. Again, this applies to those of you who were having problems installing the JRE.
I will be doing my Scalar project on fractals, with a focus on Leibniz’s attempt to prove the existence of God using fractals. I chose this because Scalar’s multimedia options make it ideal for displaying fractals. I chose to narrow it to include Leibniz so that it would not be an overly technical and mathematical presentation of fractals.