Final Project: Authorship Attribution of Gentylnes and Nobylyte

For my final project I collaborated with the Tudor Plays Project at USD in an effort to determine the authorship of the Tudor-era play Gentylnes and Nobylyte. I did this by comparing frequent words across multiple texts: Gentylnes and Nobylyte itself, a separate section of Gen&Nob (“The Philosopher”), several pieces attributed to John Heywood, one by John Rastell, and one by John Bale. Including Heywood and Rastell is essential, as many scholars believe both were involved in writing Gen&Nob (some have suggested that Rastell was its sole author); seeing how their texts relate to it is a necessary first step toward understanding whether, and how, they were involved.

My project began with compiling a list of the five thousand most frequently used words across the texts, which I went through by hand and sorted by part of speech (e.g. article, verb, noun, adjective, conjunction). I focused mainly on articles, prepositions, and conjunctions (both coordinating and subordinating), as “it is exactly these subtle ‘features’ … that authorship and stylometry researchers have discovered to be the most telling when it comes to revealing an author’s individual style” (Jockers, 64). These words are not tied to particular genres or topics the way content words are, so focusing on them highlights aspects of authorial style. Jockers does point out, however, that “some genre forms clearly move writers to employ more prepositions; other genres demand more articles, or more pronouns, and so on” (99), which is something I had to keep in mind when analyzing my results.
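The ranked frequency list described above can be illustrated with a minimal Python sketch. It assumes the texts are available as plain strings and uses a deliberately simple, hypothetical tokenizer (lowercase, keep runs of the letters a to z); this is not necessarily how the stylometry package tokenizes, and characters such as thorn would need extra handling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters a-z (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def most_frequent_words(texts, n):
    """Return the n most frequent words, counted across every text in the corpus."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    corpus = [
        "The man and the woman went to the market.",
        "And so the play began, and the audience was pleased.",
    ]
    print(most_frequent_words(corpus, 2))  # prints ['the', 'and']
```

Counting across the whole corpus (rather than per text) is what lets one shared word list be applied to every play in the comparison.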

When deciding which words to keep and which to comment out, I also chose to retain auxiliary verbs (including modal auxiliaries) and conjunctive adverbs. These are marked differently on my hard copy so they are easy to find in case we want to comment them out in later runs. I think it would be valuable to try a run without auxiliary verbs such as have, do, and is, since in these texts they appear more often than not as stand-alone verbs. Modal auxiliary verbs (e.g. may, shall, ought, will), while they do not appear on their own, do seem more closely related to genre (philosophical questions about someone’s future, for example, would involve more questions of what one should/ought/could/would do); even so, I think they also reflect authorial style, and their connection to genre is small enough to take into account rather than to justify removing them. Again, this is the tricky part of analyzing frequent words: the analysis considers only the top sixty-five, not every word I kept on the written copy.

The initial analysis was based on the sixty-five most frequent words in the texts, including articles, verbs, nouns, and so on (no words commented out). The list is as follows:

the      and      to      i      of      in      that      a      for      ye      is      be      as

by      all      my      so      but      me      haue      not      it      your     this      no

shall      he      wyll      with      we      you      thou      or      what      they      here

god      do      at      can      our      well      now      page      unnumbered      his

may      nor      then      hym      man      her      one      yet      thy      good      them

loued      hys      loue      louer      more      thys      am      than

Using the stylometry package in RStudio, I ran a cluster analysis of the texts based on the frequencies of these words.
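The package does this computation internally, and its exact settings are not described here, so the following is an assumption: a minimal pure-Python sketch of one common approach to cluster analysis on most-frequent-word frequencies. It computes Burrows's Classic Delta (the mean absolute difference between texts' z-scored relative word frequencies) and then merges texts with naive average-linkage clustering. The function names are hypothetical, and the distance measure and linkage method may differ from whatever produced the trees discussed here.

```python
import math
import re
from collections import Counter
from itertools import combinations


def relative_frequencies(text, words):
    """Frequency of each word in `words`, divided by the text's total token count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in words]


def delta_matrix(texts, words):
    """Burrows's Classic Delta between every pair of texts (dict: name -> text)."""
    freqs = {name: relative_frequencies(t, words) for name, t in texts.items()}
    names = list(freqs)
    k = len(words)
    means = [sum(freqs[n][i] for n in names) / len(names) for i in range(k)]
    stds = []
    for i in range(k):
        var = sum((freqs[n][i] - means[i]) ** 2 for n in names) / len(names)
        stds.append(math.sqrt(var) or 1.0)  # words with no variation get z = 0
    z = {n: [(freqs[n][i] - means[i]) / stds[i] for i in range(k)] for n in names}
    dist = {}
    for a, b in combinations(names, 2):
        dist[frozenset((a, b))] = sum(abs(x - y) for x, y in zip(z[a], z[b])) / k
    return names, dist


def average_linkage(names, dist):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(n,) for n in names]
    merges = []

    def cluster_distance(c1, c2):
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


if __name__ == "__main__":
    # Toy corpus: A and B share a profile, C differs.
    toy = {
        "A": "the and the of the and",
        "B": "the and the of the and",
        "C": "to to to in in of",
    }
    names, dist = delta_matrix(toy, ["the", "and", "of", "to", "in"])
    for left, right in average_linkage(names, dist):
        print(left, "+", right)
```

The order of merges corresponds to the branching in a dendrogram: texts merged early sit on neighbouring branches, and texts merged last (as with The Philosopher and Gen&Nob in the first run) are joined only near the root.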

The results are roughly what one would expect, with most of Heywood’s pieces grouping together. The most startling result is that the Philosopher excerpt from Gen&Nob appears in a completely different branch from the full Gen&Nob text, joined to it only at the most distant level of the tree. Heywood’s The Pardoner and the Frere and Bale’s The Three Laws are linked, possibly because of words such as “god”, “good”, and “loue” in the initial word list (both plays discuss religion). Gen&Nob is linked to Rastell’s The Foure Elements, which further supports the idea that Rastell may have been involved in composing Gen&Nob. That said, the content words in the list, along with “page” and “unnumbered” (which aren’t actually part of the texts themselves), tie some of the texts together through genre and topic rather than authorship.

Using Atom, I commented out the content words from the original list, including nouns, pronouns, adjectives, and most verbs and adverbs. The list I ended up with includes articles, prepositions, conjunctions, auxiliary and modal auxiliary verbs, and conjunctive adverbs; its sixty-five most frequent words are:

the      and      to      of      in      a      for      is      be      as      by      so

but      haue      shall      wyll      with      or      do      at      can      may

nor      then      yet      am      than      on      yf      hath      was      were

wolde      where      from      doth      are      had      if      must      though

an      thus      how      both      when      also      without      shulde      syns

wherfore      after      before       dyd      whan      maye      coulde      bothe

vnto      shuld      vpon      ys      hast      howe      aboue
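Commenting words out of a ranked list amounts to a filter applied before the sixty-five-word cut-off, which is why words outside the original top sixty-five (such as “on”, “yf”, and “hath”) appear in the filtered list. A minimal sketch, assuming (hypothetically) one word per line with `#` marking a commented-out word; the file format the stylometry package actually expects may differ, and the function names are illustrative only.

```python
def load_wordlist(lines):
    """Return the words that are not commented out, preserving their ranked order."""
    words = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and commented-out (excluded) words
        words.append(entry.lower())
    return words


def top_n_kept(lines, n=65):
    """Apply the cut-off after filtering: keep the first n surviving words."""
    return load_wordlist(lines)[:n]


if __name__ == "__main__":
    ranked = ["the", "and", "to", "# i", "of", "# god", "in"]
    print(top_n_kept(ranked, n=4))  # prints ['the', 'and', 'to', 'of']
```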

Running this list through RStudio, I saw a very different relationship among the texts in the corpus.

The three texts in the top branch were also in the top branch in the previous analysis, yet Heywood’s Wytty and Witless moved down to the middle of the bottom branch. The Philosopher is now more closely tied to Heywood’s Preface, which may point toward his involvement in writing Gen&Nob / The Philosopher. At the same time, all three texts in the top branch (Philosopher, Preface, Spider and the Flie) are extremely short compared to the other texts in the corpus, and short samples give less stable word frequencies, which may be why they are so far removed from the rest. Rastell’s Foure Elements and Heywood’s Pardoner and the Frere are closely linked; the branch then builds out to Calisto, then Three Laws, then Gen&Nob, and finally Wytty and Witless. Here Gen&Nob sits between two of Heywood’s texts: although it is not paired directly with any of them, it falls on the same line of descent as two of Heywood’s texts and Rastell’s.

It is interesting how far apart Rastell’s Foure Elements and Gen&Nob are, considering how close they were in the initial analysis. This does not necessarily mean that they are now very different, but rather that the texts falling between the two have become more similar to them. I’m not sure exactly how this affects authorship attribution, considering that both Bale’s piece and Calisto and Melebea also fall in the middle of the line, but Gen&Nob does appear to be more closely related to the other texts than in the initial analysis. As I mentioned earlier, I think removing some of the auxiliary verbs from my list (such as “be”, “was”, “were”) would benefit the analysis, since they often appear not as auxiliaries but as stand-alone verbs. On closer examination, many of the texts in that branch involve some sort of contemplation about the future or morality, and thus use a lot of modal auxiliaries (“should”, “shall”, “ought”, “would”). Removing these words might reduce the subtle genre influence that seems to link some of these texts. The presence of multiple spellings (“is” and “ys”, “shulde” and “shuld”, “if” and “yf”) also makes a more comprehensive word list trickier to analyze, since each spelling is counted as a separate word and the frequency of what is really one word is split between them.
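One possible way to handle the spelling problem would be to map variant spellings onto a single form before counting. A minimal sketch, assuming a hypothetical, hand-built variant table drawn from pairs that appear in the word lists above; a real table would need to be built and checked by hand for the whole corpus, since a blanket rule (such as treating every “y” as “i”) would wrongly merge distinct words.

```python
import re
from collections import Counter

# Hypothetical normalization table: variant spelling -> chosen standard form.
VARIANTS = {
    "ys": "is",
    "shulde": "shuld",
    "yf": "if",
    "whan": "when",
    "maye": "may",
    "bothe": "both",
    "howe": "how",
}


def normalized_counts(text, variants=VARIANTS):
    """Count words after collapsing each listed variant into its standard form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(variants.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    counts = normalized_counts("Yf he ys here, if he is there")
    print(counts["if"], counts["is"])  # prints 2 2
```

Normalizing before counting lets a single entry (e.g. “if”) carry the combined frequency instead of two weaker entries competing for places in the top sixty-five.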

Despite the setbacks I noticed in my analysis, I think the results I reached are interesting and point toward collaboration between Heywood and Rastell on Gen&Nob, or at the very least some sort of influence. The most interesting result to me was the separation of Gen&Nob and The Philosopher: since they come from the same play, one would expect them to be closely linked, so their separation suggests more than one author, supporting the idea that both Heywood and Rastell contributed to the text. Again, however, I don’t want to overlook the small size of that sample, as the shorter excerpt involves a more specific theme with less variation (and, again, it contains a lot of modal auxiliaries). An important thing to remember is what Jockers says about signals in a text: “though genre signals were observed, there was also the presence of other signals and no obvious way of determining which feature-usage patterns were most clearly ‘authorial’ and which ‘generic’” (70). I think being more selective about some of the words would produce different results, and comparing the new analysis with the previous one could help determine whether those words are tied to genre, to authorship, or simply to a mixture of both. I am also interested in the initial closeness of Gen&Nob and Rastell’s piece and their later separation in my cluster analysis, as well as the movement of Heywood’s texts; I wonder how they would move in future runs. Overall, I am pleased with my results and look forward to seeing this project continue to develop.
