Apr 20
11:29 AM

Topic Modeling

As a class, we gathered almost 100 books and imported them into a group Zotero folder. This gave us a wide array of novels from the 1800s to use for our topic modeling assignment. Topic modeling is a statistical method for discovering abstract topics within a set of documents. The machine itself identifies patterns of words that tend to appear together and groups those words into topics. This kind of tool would definitely be handy for someone who analyzes large bodies of work or who reads a lot of material. It allows someone, without reading any of the texts, to get a pretty good idea of what they are about and what kinds of subjects they deal with.
However, because our collection has so many books with so many different names and characters, I felt that the topic modeling basically just organized topics around the characters, and therefore our topics were the books themselves. This shows that the tool is useful but does not necessarily tell us what the books are about, as opposed to the Decretis example, in which the main topics were more evident.
In our sampling produced with Paper Machines, we all ended up with only three-word topics, which didn’t provide very much for analysis. My most evident topic was “sea, whale, men,” which was obviously from Moby Dick, but all my others were relatively vague. However, the 10-word topics produced with MALLET gave us a much better variety of topics. Line 18, which contains “life, love, heart, death, father, nature, human, earth, mother,” is broader and points to a topic related to family, the world, and love. Then there are topic lines like line 8, which has “pierre, thou, mother, madame, lucy, thee, mrs, Isabel, Robert”; this line most likely reflects a single text. However, because I am not familiar with these books, none of these topics mean that much to me. If I were working on a project built around texts I am familiar with, I think this tool could be a lot more useful. Perhaps with more familiarity and more time to perfect our use of Paper Machines, the tool would be more useful, but because I was only able to produce three-word topics, I feel that my results were relatively insubstantial.
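To make the idea of a machine "grouping words into topics" a little more concrete, here is a minimal sketch of one common topic modeling approach, latent Dirichlet allocation with Gibbs sampling, written in plain Python. It is only an illustration under my own assumptions and is not the code that Paper Machines or MALLET actually run: the function name `tiny_lda`, the parameter values, and the four made-up "documents" in `toy_docs` are all hypothetical. The `top_n` parameter plays the role of the three-word versus longer topic lists discussed above.

```python
import random
from collections import Counter


def tiny_lda(docs, num_topics=2, top_n=3, iterations=200,
             alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not MALLET's code).

    docs is a list of documents, each a list of word strings.
    Returns one list per topic holding its top_n most frequent words.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # words per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total words per topic
    assignments = []

    # Start by giving every word a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    # Repeatedly re-pick each word's topic, favoring topics that are already
    # common in its document and that already contain the same word.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1

                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1

    # Report each topic as its most frequent words, cut off at top_n.
    return [
        [word for word, count in counts.most_common() if count > 0][:top_n]
        for counts in topic_word
    ]


# Made-up miniature "documents" (hypothetical, not our Zotero collection).
toy_docs = [
    "sea whale men ship sea whale captain".split(),
    "whale ship sea men captain sea".split(),
    "life love heart mother father love".split(),
    "heart love life father mother death".split(),
]

for n in (3, 5):
    print(n, "words per topic:", tiny_lda(toy_docs, num_topics=2, top_n=n))
```

Running the sketch with `top_n=3` and then `top_n=5` shows the same topics cut off at different lengths, which is roughly why my three-word topics felt so thin next to the longer MALLET lists: the topics may be similar, but a shorter list shows less of each one.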

