Apr 20
3:00 PM

Topic Modeling


When the phrase "topic modeling" was first thrown around in class, I had no idea what it meant. To complete our topic modeling assignment, we started off by using Zotero. As a group we collected books from the 1800s, which would be the basis for our final project. While finding authors and books was not hard, the sheer number of books needed was quite a pain. Luckily we had many people, which sped up the process.

After we found our books, things got quite difficult. I had many problems with the software on my laptop. To start, I was not able to generate a word cloud from the topics because my Java installation was not working correctly. Re-downloading Java did not fix the problem, so I had to turn to somebody with much stronger computer skills. Without this help I would have been completely lost and would have given up, which makes me question how accessible this process is for people who are not tech savvy. After the Java problem came a problem with Zotero Standalone. Once again I would have been lost if it were not for someone much more computer savvy than myself. After I deleted my copy of Zotero Standalone and re-downloaded it, things began to run more smoothly.

From this point the actual topic modeling began. To start we used Paper Machines, which produced topics of only 3 words each. With only 3 words, it is difficult to understand and analyze the topic of a book, so we chose to use another topic modeling tool called Mallet. Mallet worked similarly to Paper Machines, although it produced results with up to 10 words per topic. Having up to 10 words allowed a more specific result, such as “mrs crawley, lufton, mr robarts, bishop, lady grantly, lord”. With this many words, it is possible to identify the book as well as its topic and theme. If a future project required topic modeling, I would use Mallet because it is more precise. Overall, topic modeling is still difficult for me to fully understand, but I have a better understanding of it now.



