Apr 20
2016
10:15 AM

WHAT IS TOPIC MODELING?

That is a great question! The more I read about the subject, the more I fear I don’t understand it as completely as I should. It feels like I am digging a well, but I started at the top of Mount Everest; the amount of information and work that can go into using and understanding topic modeling is huge. I can see why some humanists are scared off by the subject or even by attempting to play around with topic modeling software; they might be swimming in familiar waters, but those waters have been stirred up and are now murky, making them seem unfamiliar and even scary to be in. Using Paper Machines in conjunction with Zotero sounded fascinating, and I was excited to see the results. What didn’t excite me were the results. The word cloud gave me hope. There was a small feeling of success. I could see a few words that appeared to have been used throughout the corpus more than others. It didn’t tell me much, but it gave me a sense of moving forward.

The frustrating thing I see with topic modeling is the apparent randomness with which the clusters of words, or topics, are generated (I will use the terms cluster(s) and topic(s) interchangeably throughout the post). It is my understanding that the algorithm is pretty complex (mathematically), so I have to trust that it has been implemented correctly in the software; my last math class was Statistics, which happened so long ago that I can only say I took the class and that my mathematical skills have been reduced to fractions. From what I have read and understand, the algorithm can generate different clusters with each run, but the results will still be similar. I would compare that to two different people reading the same text and coming up with slightly different topics, while still having the major themes match. In an odd way, I had hoped that the computer-generated clusters would be a bit more precise or accurate in terms of giving me something to readily digest and interpret from the corpus we entered into Zotero.

Having used fewer than 100 books authored in the 1800s gave me the ability to at least recognize the books behind some of the clusters. Having familiarity with the books made the clusters understandable. I can see the potential of paring down a large text into multiple pieces, which could then be more easily scanned through some form of topic modeling, and the results would be beneficial in understanding some of the main ideas in the text as a whole. An example from the topic modeling I used with 20 topics: “ship man captain whale sea deck men boat ye” could be linked to whaling, more specifically, the book Moby-Dick. With 50 topics it looked like: “whale ship man sea captain deck men war ye”. The only differences are the order of the words and that the 20-topic model had “boat” in it, while the 50-topic model had “war”; otherwise they contained the same words. I should mention these clusters were given to me and were generated using MALLET. Paper Machines was far less successful: it would generate large numbers of topics, but each gave me only three words. For example: “whale”, “ahab”, and “trapper” came up in one cluster. Even if I had not known Moby-Dick was in the corpus we used, I would have guessed that this cluster was about that book, as “whale” and “Ahab” are almost synonymous with Moby-Dick, but if I had never heard of the book, I would have been hard pressed to say this was about anything.
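For anyone curious what this looks like outside of Paper Machines, here is a rough sketch in R, using the tm and topicmodels packages rather than MALLET itself, so treat it as an illustration and not what actually produced the clusters above. The folder name and the option choices are made up; the point is that fixing the seed makes reruns reproducible (which gets at the “randomness” I complained about), and that changing k is all it takes to go from 20 topics to 50.

    library(tm)
    library(topicmodels)

    # One plain-text file per book; "corpus_texts" is a made-up folder name.
    docs <- VCorpus(DirSource("corpus_texts"))
    dtm  <- DocumentTermMatrix(docs, control = list(
      removePunctuation = TRUE,
      removeNumbers     = TRUE,
      stopwords         = TRUE,
      tolower           = TRUE
    ))

    # Fixing the seed means a rerun gives the same clusters back.
    lda20 <- LDA(dtm, k = 20, control = list(seed = 1234))
    lda50 <- LDA(dtm, k = 50, control = list(seed = 1234))

    terms(lda20, 9)   # top nine words in each of the 20 topics
    terms(lda50, 9)   # top nine words in each of the 50 topics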

My critique of Paper Machines might stem from user error, though with only three words to each cluster, it is difficult to point the finger at myself. I am willing to admit I might have selected the wrong option, and with each use I became less enthused and more irritated at the project, possibly giving up in frustration at my inability to use the software correctly. Regardless of my poor results with Paper Machines, I can see how topic modeling can be beneficial and possibly shed new light on old thoughts. I can also admit my analytical skills are not up to par when it comes to the use of topic modeling; another way to say that is, if I extrapolated anything from the clusters in a meaningful way that added to the larger scope of academia, it would be by accident.

Apr 17
2016
8:04 PM

Switching to Zotero Standalone

Remove the Zotero add-on from Firefox

Open Firefox, type about:addons into the location bar, and click on the Remove button for the Zotero add-on. If you see two Zotero-related add-ons (i.e., Zotero and the Zotero Word for Mac plugin), you need to remove them both.

Clean up your old Zotero-in-Firefox data

This part is harder. Open a Finder window by clicking on the smiley-faced icon at the far left of the Dock at the bottom of your screen. Make sure you’re in your home directory by clicking on the house icon. Type ⌘-J to pop up the View Options menu. Check Show Library Folder at the bottom of the menu. Open the Library folder. Inside the Library folder, open the Application Support folder. Inside the Application Support folder, open the Firefox folder. Inside the Firefox folder, open the Profiles folder.

Once you’re inside the Profiles folder, you should see exactly one folder, with a name made up of some random letters and numbers. Mine, for example, is p572h74i.default. The exact alphanumeric string will be different for each profile, but that gives you the general idea. Open that folder, and inside you should find a zotero folder. Delete it by dragging it to the Trash.
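If you would like to double-check that you are about to trash the right thing before you drag anything anywhere, here is a small, purely optional sketch in R (macOS only). It just prints the path of the old Zotero-in-Firefox data folder; the profile folder name will of course differ on your machine.

    # Optional double-check (macOS): print the old Zotero data folder path.
    profiles <- file.path(Sys.getenv("HOME"), "Library", "Application Support",
                          "Firefox", "Profiles")
    zotero_dirs <- file.path(list.dirs(profiles, recursive = FALSE), "zotero")
    zotero_dirs[dir.exists(zotero_dirs)]   # should print exactly one path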

Install Zotero Standalone

Go to https://www.zotero.org/download/, and follow the instructions in the blue box on the right (“Download Zotero for Mac”, and “Next, add one of the following browser extensions”, making sure to choose Firefox, which should be highlighted).

Install Paper Machines

Go to http://papermachines.org/install/, and follow the instructions for installing Paper Machines in Zotero Standalone (i.e., “To install in Zotero Standalone, right-click (control-click) the link above …”).

Run Paper Machines

You should then be able to right-click on the Assignment 7 folder in the English 294 Group Library and get word cloud and topic modeling results.

Apr 13
2016
1:28 PM

scalar final

My final project for Scalar will be a brief history of the use of ceramic-based compounds in engineering applications, specifically in high-friction or high-heat applications. I intend to outline the reasons behind ceramic compounds’ effectiveness in these situations, as well as to give readers more of a background in the subject.

Apr 13
2016
12:48 PM

Java Problem Solved!

The good news is that I’ve identified the problem that was preventing those of you running OS X El Capitan 10.11.X from installing the Java Runtime Environment (JRE) on your laptops. The bad news is that the fix is definitely in the “don’t try this at home” category. Thanks to Molly, who volunteered her laptop for me to try out the fix.

Please come and see me during my office hours tomorrow (Thursday), and I’ll install the fix for you. Officially, my office hours start at 2:30 PM, but I expect to be on campus from 12:00 noon on. Send me an email and let me know when you think you’ll be able to drop by. Again, this applies to those of you who were having problems installing the JRE.

Apr 13
2016
11:30 AM

Scalar Project

I will be doing my Scalar project on fractals, with a focus on Leibniz’s attempt at proving God using fractals. I chose this because Scalar’s multimedia options make it ideal for displaying fractals. I narrowed the topic to Leibniz so that it would not be an overly technical and mathematical presentation of fractals.

Apr 12
2016
10:43 PM

Java Problems

I’ve been researching the Java problem we ran into in class today, and I may have an answer. But it would really help me to know which version of the Mac OS X operating system each of your systems is running. From the Apple () menu in the upper left corner, choose About This Mac. I just need the first two lines. For example, mine says: Mac OS X Yosemite Version 10.10.5. My working hypothesis is that the people who are experiencing the problem are running some version of El Capitan (10.11).

Please leave a reply to this post with your operating system information. Thanks.

Apr 12
2016
1:55 PM

Assignment 8: Authorship Attribution

Alexander Hamilton is having a very good year. He’s the star of his own hit Broadway musical, and he (along with James Madison) is going to be the star of our last assignment.

Hamilton

As is the case with most real-world Digital Humanities projects, collaboration and cooperation will be essential for Assignment 8. It will therefore be conducted almost entirely as an in-class activity. There is not an individual deliverable. You will instead be evaluated on your contribution to the group effort. Due in class Thursday, April 28.

Our goal will be to reproduce the results of Mosteller and Wallace (1964), who used computational statistical techniques to confirm Adair’s (1944) authorship attribution for disputed numbers of The Federalist Papers. We will be using R, RStudio Desktop, and the stylo R package to analyze the text of The Federalist Papers available here from Project Gutenberg.

Stay tuned, I’ll be adding technical details to this post soon.
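In the meantime, here is a rough preview of the kind of stylo call we will be working toward. It assumes the Project Gutenberg file has already been split into one plain-text file per paper and sorted into stylo’s default folders: primary_set for papers of known authorship (named something like Hamilton_fed01.txt or Madison_fed10.txt) and secondary_set for the disputed papers. The particular settings shown (Burrows’ Delta on the 100 most frequent words) are placeholders and may change once the real details are posted.

    install.packages("stylo")   # only needed once
    library(stylo)

    # Burrows' Delta over the most frequent words, run without the GUI.
    # The folder names are stylo's defaults; the option values are provisional.
    classify(gui = FALSE,
             training.corpus.dir   = "primary_set",
             test.corpus.dir       = "secondary_set",
             classification.method = "delta",
             mfw.min = 100, mfw.max = 100)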

Apr 12
2016
1:46 PM

Assignment 7: Topic Modeling

Welcome to Unit 4!

Assignments 7 and 8 will both require more collaboration and coordination between students than the previous projects, and in this regard will offer something closer to the experience of a real digital humanities project. The theme for both assignments is the use of techniques that take advantage of statistical properties of word frequencies in corpora of texts to provide information about topics and authorship.

Assignment 7: Overview

You will use Paper Machines, an add-on to Firefox and Zotero, to generate topic models from a corpus of texts you’re going to collectively assemble from material on Project Gutenberg. You will then write a 300-word blog post describing your findings. Due on Wednesday, April 20, at 11:59 PM PDT.

Assignment 7: Details

First, we need to perform minor surgery on Firefox. Don’t try this at home, kids!

  1. type about:config in the location bar in Firefox
  2. change xpinstall.signatures.required from true to false
    (right-click on the line, then select Toggle from the menu)

Then, install Paper Machines

Make sure PDF Indexing is turned on in Zotero Preferences
(third button from the left = the gear icon, Search tab)

Take a look at Getting Started. Note especially the following caveat: “Some users have found that Paper Machines produces empty results for smaller datasets. We suggest beginning with at least 20 files before you attempt a wordcloud or relational diagram, and more like 50 to 100 before you attempt Topic Modeling.”

This is where group collaboration and coordination come in.

Rather than have each of you attempt to gather 50-100 texts, we are going to use Zotero to build up a group library from texts found on Project Gutenberg. You have all received, and some of you have accepted, the invitation to join the group English 294. If you haven’t done so already, please do so now. As with our previous work involving material sourced from Project Gutenberg, the texts will have to be hand-edited to ensure that you don’t get spurious results.
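The hand-editing mostly means cutting the Project Gutenberg boilerplate at the front and back of each file. If you want a head start on that, here is a rough first-pass helper in R; the “*** START OF” / “*** END OF” marker lines are my assumption about the usual Gutenberg format, and the wording varies between files, so always open the result and confirm the front and back matter are really gone.

    # Keep only the text between the Project Gutenberg START and END markers.
    strip_gutenberg <- function(infile, outfile) {
      lines <- readLines(infile, warn = FALSE)
      start <- grep("\\*\\*\\* START OF", lines)[1]
      end   <- grep("\\*\\*\\* END OF", lines)[1]
      if (!is.na(start) && !is.na(end) && end > start) {
        lines <- lines[(start + 1):(end - 1)]
      }
      writeLines(lines, outfile)
    }

    # Example call (file names are made up):
    # strip_gutenberg("moby_dick_raw.txt", "moby_dick_clean.txt")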

Apr 10
2016
11:07 AM

Scalar Project

My Scalar project will be an analysis of Shakespeare’s use of the Jewish people in his plays. From the villain Shylock in The Merchant of Venice to the many derogatory references to characters behaving like Jews in his other plays, I will attempt to take a stance on whether Shakespeare was using the stereotypes of his day to draw in the crowds and get a laugh, or if he was making a subtle statement about his society’s deplorable behavior. I think I will end up with a more middle-of-the-road answer: Shakespeare knew what he was doing and carefully navigated complicated political statements that could be viewed in a number of ways without causing him to be imprisoned or to lose his head.

Apr 7
2016
1:21 PM

Assignment 6

For my final assignment on Scalar, I will produce a research paper that looks into the United States’ involvement in refugee crisis situations as compared to other nations; specifically, I want to explore how the US handles refugees from Sudan and Somalia. I will likely use some of the articles I found for my annotated bibliography, as well as material from my Pols 494 class, to conduct my research. I want to evaluate the differences in the way the US handles refugee situations and how refugees from different regions are dealt with in different manners. Using Scalar, I will be able to insert graphs and images and separate the information so that it is easier to follow and more cohesive.