For this assignment, I chose to analyze Victorian mystery novels through Lexos. The novels I am analyzing are:
- The Hound of the Baskervilles by Sir Arthur Conan Doyle (British)
- The Moonstone by Wilkie Collins (British)
- The Mysterious Affair at Styles by Agatha Christie (British; first published in the United States)
- The Mystery of Edwin Drood by Charles Dickens (British)
- The Strange Case of Dr. Jekyll and Mr. Hyde by Robert Louis Stevenson (Scottish)
- The Notting Hill Mystery by Charles Felix (pseudonym of Charles Warren Adams) (British)
- A Study in Scarlet by Sir Arthur Conan Doyle (British)
- The Woman in White by Wilkie Collins (British)
- Lady Audley’s Secret by Mary Elizabeth Braddon (British)
- The Mystery of a Hansom Cab by Fergus Hume (Australian)
- Martin Hewitt, Investigator by Arthur Morrison (British)
- The Big Bow Mystery by Israel Zangwill (British)
An initial scrub with no stop-word or keep-word list produced largely expected results. The two Doyle books were grouped together, as were the two Collins books. Also as expected, Agatha Christie’s American-published novel was radically different from all the rest. Interestingly, however, Hume’s The Mystery of a Hansom Cab sits in the middle of the British writers despite being an Australian novel. It is also closely paired with Braddon’s Lady Audley’s Secret, a British novel written in the sensation-fiction mode. This suggests that Hume and Braddon may have had similar writing styles despite their different countries of origin.
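To make the idea of texts being "grouped together" more concrete, here is a minimal sketch of the kind of comparison that sits underneath a dendrogram: each text becomes a set of word proportions, and the two texts with the smallest distance between them are the first to be paired. This is only an illustration. The Euclidean distance, the function names, and the toy sentences are my own assumptions, and Lexos's actual tokenizing and clustering settings may differ.

```python
from collections import Counter
from itertools import combinations
import math


def proportions(text):
    """Map each word to its share of the text's total word count."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {word: n / total for word, n in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two word-proportion dictionaries."""
    vocab = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in vocab))


def closest_pair(texts):
    """Return the two titles whose word proportions are most alike."""
    props = {title: proportions(body) for title, body in texts.items()}
    return min(
        combinations(sorted(props), 2),
        key=lambda pair: euclidean(props[pair[0]], props[pair[1]]),
    )


# Toy stand-ins for illustration only, not passages from the novels.
toy = {
    "A": "the hound ran over the moor",
    "B": "the hound ran across the moor",
    "C": "she said that she had seen nothing",
}
print(closest_pair(toy))
```

A full dendrogram repeats this step, merging the closest texts or clusters until everything is joined in a single tree.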
When the texts were scrubbed of all stop words, the order in which the novels appeared changed radically. Doyle and Collins still had their books paired, but now Morrison’s Martin Hewitt is paired with Zangwill’s The Big Bow Mystery. This pairing may reflect the fact that both Zangwill and Morrison are known as “slum writers.” That is, they were well known for writing books about the English slums, despite also having written the mystery works seen in this post. I find it most interesting that Dickens, a writer also known for his writing about the English slums, was nowhere near Morrison and Zangwill on this dendrogram. Instead, Lexos placed Dickens next to the Collins books and the lone Stevenson book. A possible explanation for Dickens’ close proximity to Collins is that the two were known to be good friends in life, and thus may have adopted elements of each other’s writing style. As expected, however, Christie’s Styles is still the primary outlier in the group.
The final dendrogram shows how the texts cluster when the NLTK stop-word list used previously is instead applied as a keep-word list. As expected, both the Doyle and Collins books stayed together. In addition, the Braddon and Hume novels were once again placed next to each other. This, again, points to a possible close relationship in writing style, perhaps due to one novel being a “foreign” novel and the other being written as a sensation novel. The starkest change is that Christie’s mystery novel is no longer the primary outlier among the books. That distinction now belongs to Zangwill.
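The difference between a stop-word pass and a keep-word pass can be sketched in a few lines. The example below is a hypothetical illustration, not Lexos's own code: the `scrub` function, its `mode` parameter, and the tiny word list are assumptions made for the example (the real NLTK English list is much longer).

```python
def scrub(words, word_list, mode="stop"):
    """Filter tokens against a word list.

    mode="stop": drop every token that appears in word_list.
    mode="keep": keep only the tokens that appear in word_list.
    """
    word_set = {w.lower() for w in word_list}
    if mode == "stop":
        return [w for w in words if w.lower() not in word_set]
    if mode == "keep":
        return [w for w in words if w.lower() in word_set]
    raise ValueError("mode must be 'stop' or 'keep'")


# A tiny stand-in for a stop-word list, for illustration only.
tiny_list = ["the", "in"]
tokens = "the detective examined the body in the library".split()

print(scrub(tokens, tiny_list, mode="stop"))
print(scrub(tokens, tiny_list, mode="keep"))
```

With the same list, the stop pass leaves only the content words, while the keep pass leaves only the function words. That is why the two dendrograms can differ so much: one compares what the novels are about, and the other compares the small grammatical words that are often treated as markers of an author's style.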