Mar 23
3:01 PM

Thursday, March 23

Download and install TextWrangler

Check to see that you don’t already have a website
(use your own USD login, not pevans):

Open USD HTML Web Sites

Open Terminal on your Mac (you type the parts in red, again use your login instead of pevans):

usdlab$ ssh -p 42670's password: 
[pevans@usdhome ~]$ chmod 751 .

Open USD Unet File Manager

Mar 22
7:22 PM

Assignment 3

Create an HTML webpage introducing yourself (the way you would on, say, Facebook), and upload it using the USD Unet File Manager to your personal HTML webpage on (I’ll show you how to do the upload in class on Thursday, March 23). Your webpage should include at least one image, at least one external hyperlink, and use an external .css file to control font family, color, and size attributes. When you’ve done so, please publish a post on the course WordPress site pointing to your webpage. (Unlike other assignments, there is no length requirement for this WordPress post, although I’d appreciate something more than just the hyperlink itself.)

In the interest of Internet privacy, your self-introduction may introduce an entirely fictional “self”, along the lines of the Manatee character you encountered in the in-class demo. But be sure to follow the conventions of the genre: write in the first person, and embellish your character (e.g., “I’m a manatee. My name is Hugh. I live in Florida. I hang out at the power plant with all the other manatees when the weather is cold. I like to eat heads of lettuce.” You get the idea.)

Due Tue Apr 4 at 11:59 PM PDT.
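
Before uploading, the three structural requirements (an image, an external hyperlink, a linked .css file) can be checked with a short script. This is a rough sketch only: the page, file names, and URL below are hypothetical placeholders, and the script does not check what the stylesheet actually controls.

```python
from html.parser import HTMLParser


class RequirementChecker(HTMLParser):
    """Counts images, external links, and linked stylesheets in a page."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.external_links = 0
        self.stylesheets = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if tag == "img" and attrs.get("src"):
            self.images += 1
        elif tag == "a" and href.startswith(("http://", "https://")):
            self.external_links += 1
        elif tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet" and href:
            self.stylesheets += 1


# Hypothetical page; "style.css", "hugh.jpg", and the URL are placeholders.
SAMPLE_PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>Hugh the Manatee</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <h1>I'm a manatee. My name is Hugh.</h1>
  <img src="hugh.jpg" alt="A manatee">
  <p>Read more <a href="https://example.org/">elsewhere</a>.</p>
</body>
</html>"""

checker = RequirementChecker()
checker.feed(SAMPLE_PAGE)
meets_requirements = (
    checker.images >= 1
    and checker.external_links >= 1
    and checker.stylesheets >= 1
)
print(checker.images, checker.external_links, checker.stylesheets, meets_requirements)
```

To check your own page, read the file's text and pass it to `checker.feed` in place of `SAMPLE_PAGE`.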

Feb 22
11:47 PM

DuBois: Assignment 1


One initial observation that I made about DuBois’s BubbleViz was that one of the most frequent words was “race.” The term became popularized to describe specific categories of people around the 18th century, but I found it interesting that it was used so profusely. The most prominent words were a, in, of, the, and, to (no surprise there). I was not surprised to see “negro,” but a word that indicated the turn of the century that stood out for me was “black.” Black is a more modern, less derogatory term to describe the descendants of African slaves, so I am not surprised that an African-American author integrated “black” into his writing.

Lexos 2

For my next test, I took out the following stop words: a, the, are, with, and, to, in, out, some, was, have, been, his, she, he, him, said, has, by, so, if, there, then, its, it, more, less, this, at, from, had, or, as, and since. My results were underwhelming; I still received top words that are relatively common. The only word that became substantially more prevalent was “negro.”
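
What a stop-word list does can be sketched in a few lines. This is an assumption-laden illustration, not how Lexos works internally: the sentence is made up, and only a handful of the stop words listed above are used.

```python
import re
from collections import Counter

# A few of the stop words listed above (not the full list).
STOP_WORDS = {"a", "the", "are", "with", "and", "to", "in", "was"}

# Made-up sample sentence, not a quotation from any of the texts.
sample = "The race and the nation are in the story of a people and a race."

tokens = re.findall(r"[a-z']+", sample.lower())
kept = [t for t in tokens if t not in STOP_WORDS]

top_before = Counter(tokens).most_common(1)
top_after = Counter(kept).most_common(1)
print(top_before)  # [('the', 3)]
print(top_after)   # [('race', 2)]
```

A common word stays in the results unless it is on the list; here “of” survives because it was never added.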

The Dendrogram polarized the African-American authors and their English counterparts, which could be due to a number of factors. One of those could be region, considering that English dialects and language differ drastically from Europe to the United States. The Brown piece is a stark outlier, but the rest of the texts generally group together based on genre, period, nationality, ethnicity, and even gender, to an extent. It just so happens that the majority of the writers our class chose were male. The female writers of the 18th century were similarly clustered together, as one would suspect. In the case of Brown, the way that his piece reads may be classified more with fiction than autobiography or non-fiction, which would explain this strange occurrence.

Next, I generated a Dendrogram that included only authors whose book titles were somehow indicative of slavery. If the title did not explicitly include the word “slave,” no matter the author, I did not include it in the clustering. As I detected in the first Dendrogram, Brown was isolated from the rest of the texts. This leads me to believe that there is indeed some type of indicator related to genre or language that is causing it to be separated from what should be similar texts.

Finally, I clustered the Brown-Anti-Slavery text with all of the 18th century female writers and found that the books clustered according to author, language, and genre, as suspected. The Brown piece was isolated, so maybe there is another unknown factor causing it to separate itself from the rest.

Feb 22
10:37 PM

Assignment 1

The author I had was William Wells Brown, a prominent African-American abolitionist, lecturer, novelist, playwright, and historian in the United States. I analyzed five of his works, all of which were about slavery in the United States. His most common word was “the”, followed by “and”, “to”, “is”, “he”, “of”, “in”, and “I”. A lot of his works were about his life as a slave and his experiences as a slave in the U.S. There are also a lot of first-person pronouns such as “me” and “my”. The top 100 words did not include anything referring to slavery, which was rather odd.

What I thought was interesting was that the African American writers were spaced over a period of time in which many important historical events occurred in the United States, such as the Civil War. I am not sure how to interpret the results, but I am curious as to why Brown has the highest bar and what the difference in colors represents. Overall, this is a very interesting way to look at the similarity of 19th century African American authors.

In this dendrogram, I changed the counts of the words from the Proportional Counts to the Raw Counts.  It is interesting to see that the bars have switched over to the very right side of the graph.  Brown is now at the far right of the graph as opposed to the left and still has the highest bar.
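
One possible reason the picture changes between Proportional Counts and Raw Counts is text length. The sketch below uses made-up counts and assumes a plain Euclidean distance between texts, which may not match the settings used for the dendrogram above.

```python
import math

# Made-up word counts for a long and a short text written in the same style.
long_text = {"the": 600, "and": 300, "slave": 100}
short_text = {"the": 60, "and": 30, "slave": 10}


def proportions(counts):
    """Turn raw counts into shares of the text's total word count."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def distance(a, b):
    """Euclidean distance over the words that appear in either text."""
    words = set(a) | set(b)
    return math.sqrt(sum((a.get(w, 0) - b.get(w, 0)) ** 2 for w in words))


raw_distance = distance(long_text, short_text)
proportional_distance = distance(proportions(long_text), proportions(short_text))

print(round(raw_distance, 1))   # large: driven by length alone
print(proportional_distance)    # zero: the word shares are identical
```

With raw counts a long text sits far from everything else simply because it is long; proportional counts remove that effect.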


Feb 22
9:55 PM

The Language of Local Color: A Study of Zipf’s Law, Charles Chesnutt, His Word Usage, and His Similarity to Jane Austen

Charles Chesnutt’s writing explores social identity and societal issues in the post-Civil War South, relying on a local color style to reflect the regional dialect. His novels serve as important tools for study because his unique style reveals patterns in his word usage that support Zipf’s Law despite alternative spellings as a result of his local color style. The most common word in Chesnutt’s five fiction works and biography of Frederick Douglass is “the”. “The” is used a total of 21,140 times throughout the six works. The second most common word is “of”, which is used a total of 10,218 times. It is interesting to note that these two examples support Zipf’s law, which applies a logarithmic curve to common word usage, in that the second most common word is used half as much as the most common word.
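
The “half as much” observation can be checked against the two counts given above. This sketch assumes the simplest form of Zipf’s law, in which the word at rank r occurs about 1/r as often as the top word; the function name is just for illustration.

```python
# Counts reported above for Chesnutt's six works.
count_the = 21_140  # rank 1
count_of = 10_218   # rank 2


def zipf_prediction(top_count, rank):
    """Simplest form of Zipf's law: frequency at rank r is about f(1) / r."""
    return top_count / rank


predicted_of = zipf_prediction(count_the, 2)
ratio = count_the / count_of

print(predicted_of)     # 10570.0
print(round(ratio, 2))  # 2.07
```

The observed count for “of” is within a few percent of the predicted 10,570.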

This logarithmic curve continues generally through the first dozen words. However, the 16th most common word in all of Chesnutt’s works is “de.” “De” is how Chesnutt spells “the” in the dialogue in his work, part of his local color style in an attempt to reflect the local dialect. Similar spelling changes include “ter” for “to” and “dey” for “they.” It is interesting to note that Chesnutt’s most common “colloquial” local color word is “de” for “the,” in which the proper spelling is the most common word in his work overall. While Lexos does not detect that the two spellings are the same word, Zipf’s Law can still be supported by the results. If we combine the two spellings, the word “the” is present more than 13,000 times throughout Chesnutt’s six works. Although prevalent throughout his works overall, “de” is not evenly distributed throughout Chesnutt’s pieces. It is most common in The Conjure Woman, a narrative which is told primarily through colloquial dialogue. In The Conjure Woman, local color is so prevalent that “de” is more commonly used than “the”. “De” is much less common in his later works of fiction, and is only used once in his biography of Frederick Douglass, his only non-fiction work. These discrepancies reveal changes in Chesnutt’s style based on the genre and subject matter of his work.
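
Combining the two spellings amounts to mapping each dialect form to its standard form before counting. A minimal sketch, assuming a hand-built mapping and made-up tokens rather than real counts from the texts:

```python
from collections import Counter

# Dialect spellings mentioned above, mapped to their standard forms.
VARIANTS = {"de": "the", "ter": "to", "dey": "they"}

# Made-up tokens for illustration; not real counts from the texts.
tokens = ["de", "the", "dey", "went", "ter", "de", "house", "to", "the"]

raw_counts = Counter(tokens)
merged_counts = Counter(VARIANTS.get(t, t) for t in tokens)

print(raw_counts["the"], raw_counts["de"])  # 2 2
print(merged_counts["the"])                 # 4
```

Any word not in the mapping is counted unchanged.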

While the word use chart Lexos provides can be useful for comparing common word usage in each of Chesnutt’s pieces, the BubbleViz tool provides a useful view of the overall word usage patterns in his work. The BubbleViz places the two most common words next to each other, allowing the viewer to see evidence of the logarithmic pattern of word usage that Zipf describes, as the “the” bubble is twice as large as the “to” bubble. While significantly smaller than its neighbors, the bubble for “de” is still visibly larger than common words such as “from” and “an”, revealing how commonly it is used in his fiction writing. On the fringes of the bubble cloud, Chesnutt’s subject matter and genre, the discussion of race and social identity in the post-Civil War South, become evident, with “white” and “man” breaking the top 100 list. The BubbleViz also allows the viewer to see just how many alternate spellings of common words were used to provide local color. The usage of these is underrepresented in the word cloud when compared to just his fiction titles, as his biography of Frederick Douglass contains virtually no alternate spellings, pulling the overall picture of word usage toward the traditional spellings. In conclusion, Lexos is a useful and fascinating tool when analyzing the word usage of Charles Chesnutt, revealing his distinct style while providing evidence for Zipf’s Law.

The dendrogram of all the works of both the segregation-era African American writers and the 19th century English novelists reveals interesting patterns in their similarity groupings, especially in relation to what was revealed in the word analysis of Charles Chesnutt. The dendrogram links four of Chesnutt’s fiction works, The Marrow of Tradition, The Wife of his Youth, The Colonel’s Dream, and The House Behind the Cedars, and places them as similar to works of other prominent African American writers of the era, including Booker T. Washington and W. E. B. Du Bois. Not surprisingly, his biography of Frederick Douglass is grouped with other African American literature detailing the lives of African Americans during and directly after slavery. However, The Conjure Woman sits on the far side of the dendrogram, sharing little connection to any of the works in our collection. In fact, The Conjure Woman has more in common with dialogue-heavy 19th century English novels than with Chesnutt’s biography of Frederick Douglass. My hypothesis is that the local color dialogue used to reflect the dialect of African Americans in the post-Civil War South is what sets The Conjure Woman apart from novels on similar subjects from similar authors during the period. I believe that its emphasis on dialogue is what links it closer to Jane Austen’s dialogue-heavy novels like Sense and Sensibility than to The Marrow of Tradition. Overall, this exercise was extremely interesting, and I would like to use Lexos in the future to compare other authors in other genres and gain a better understanding of how the dendrograms are made.

Feb 22
9:33 PM

Assignment 1: Ida B. Wells & 19th century African American writers

I specifically analyzed Ida B. Wells’s literature, including The Red Record, Mob Rule, and Southern Horrors: Lynch Law in All Its Phases. This first Document Term Matrix (DTM), which does not include any stop words, displays predictable results in regards to the most frequent word usage in a novel or novels. These words include: the, and, a, of, in, to, was, that. If I compared my results to my classmates’ DTMs, the results would be incredibly similar. In addition, my results directly relate to Zipf’s Law. Zipf’s Law states that the frequency of a word is inversely proportional to its ranking. It is clear that these few words (the, and, a, of, in) occur very often throughout the numerous texts while many of the other words only occur occasionally. However, in order to discover significant results, I created another DTM with these common words removed.
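
For readers unfamiliar with the term, a Document Term Matrix is a table with one row per text and one column per word. The sketch below builds a tiny one from made-up snippets; it is an illustration of the idea, not the Lexos implementation.

```python
import re
from collections import Counter

# Made-up snippets standing in for full texts.
documents = {
    "doc_a": "the mob and the law",
    "doc_b": "the record of the mob",
}

counts = {
    name: Counter(re.findall(r"[a-z]+", text.lower()))
    for name, text in documents.items()
}
vocabulary = sorted(set().union(*counts.values()))

# One row per document, one column per term in the vocabulary.
dtm = {name: [c[term] for term in vocabulary] for name, c in counts.items()}

print(vocabulary)  # ['and', 'law', 'mob', 'of', 'record', 'the']
for name, row in dtm.items():
    print(name, row)
```

Removing stop words from such a table simply drops the corresponding columns.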

The results of this DTM were much more interesting than those of the first DTM. In particular, there are some obvious words that would commonly appear in an African American novel about racism in the South. These include, but are not limited to, “white”, “negro”, and “race”. In addition, the high occurrence of words such as “his”, “him”, and “men” was very interesting in regards to Wells’s background. Wells was not only a leader in the Civil Rights Movement, but she was also a largely feminist writer. I would have predicted that words such as “her”, “she”, and “women” would have occurred more throughout her novels than the male pronouns.

This is the dendrogram that includes both 18th century British authors and 19th century African American authors. It is interesting to note that all of the 18th century British authors are grouped together on the left hand side of the dendrogram. It is also interesting to note that a majority of the African American authors are grouped with their individual works; however, this does not necessarily apply to all African American works. In addition, Brown’s Anti-Slavery is the only one on its branch, meaning it did not share commonalities with the other works. Overall, the results were somewhat predictable, especially the grouping of British authors.

Feb 22
9:12 PM

Assignment #1

All Authors:

Exclusively Frederick Douglass:

As would be expected after conducting this experiment, some of the most frequent words that appeared in ALL texts included articles “the, a, an”, conjunctions “and, but”, and pronouns “he, his, her, she, they, their, it.”

I decided to choose the three largest bubbles from both of my graphs and calculated their approximate per-book frequencies by dividing their occurrences by the number of books (for all authors I divided by 47, and for Frederick Douglass I divided by 5). The results are as follows…

All Authors:

The: 204,094 divided by 47 = 4,342.4

47 divided by 204,094 = 0.00023029

And: 125,168 divided by 47 = 2,663.1

47 divided by 125,168 = 0.0003755

Of: 118,406 divided by 47 = 2,519.3

47 divided by 118,406 = 0.00039694

Frederick Douglass:

The: 11,791 divided by 5 = 2,358.2

And: 6,854 divided by 5 = 1,370.8

Of: 7,552 divided by 5 = 1,510.4


Interestingly enough, when I divided the number of books by the word frequency, Frederick Douglass’s results were almost twice as large as those for all the authors. However, when I divided the frequencies by the number of books, Frederick Douglass’s results were about half of those for all the authors. I’m not sure what to attribute this conclusion to: perhaps Frederick Douglass’s total word count per book is much greater than that of the other authors so that the ratio of “the, and, of” is on a much grander scale and therefore yields a smaller ratio. I’m not entirely certain.
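
The division above can be redone in a few lines. This is only a sketch using the counts reported in this post; the variable names are illustrative. Note that dividing books by count is just the reciprocal of dividing count by books, so if one result is about half, the other is necessarily about double.

```python
ALL_BOOKS, DOUGLASS_BOOKS = 47, 5

# Counts reported above: (all 47 books, Douglass's 5 books).
counts = {
    "the": (204_094, 11_791),
    "and": (125_168, 6_854),
    "of": (118_406, 7_552),
}

results = {}
for word, (all_count, douglass_count) in counts.items():
    per_book_all = all_count / ALL_BOOKS
    per_book_douglass = douglass_count / DOUGLASS_BOOKS
    # A ratio below 1 means fewer occurrences per book in Douglass.
    ratio = per_book_douglass / per_book_all
    results[word] = (per_book_all, per_book_douglass, ratio)
    print(word, round(per_book_all, 1), round(per_book_douglass, 1), round(ratio, 2))
```

These are averages per book, not percentages of each text. Comparing true proportions would also need the total word count of each group of books, which is not given here.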

The results from both graphs were pretty much aligned with the expectations I had going into this project. Since all of the authors are from a similar time period with the same demographic and writing about essentially the same material it makes sense that the results would be rather consistent and reflective of each other.

Some notable observations:

The graph of all the authors has larger bubbles for the feminine pronouns and related feminine words. I attribute this to authors such as Jane Austen, who writes about mostly female characters, while the lack of women in Douglass’s books explains why feminine words have such a small presence in his separate graph. Pronouns such as “me, my, his, I, he” and the like take the cake in Douglass’s graph. The feminine words are almost nonexistent. The graph of all authors has the word “mrs”, which Douglass’s graph lacks entirely. In addition, the words “mister” and “slave” appeared less frequently than I had originally presumed they would in these graphs given the authors’ subject matter; however, their appearances were similarly frequent in both graphs, just on a smaller scale than I had anticipated.

Verb tenses were also very consistent across the two graphs. “Were” and “will” had similar frequencies, as did “have” and “been.” A lot of the word frequencies matched up, and the overlap was, as mentioned before, very consistent with my hypothesis.

Feb 22
9:03 PM

Assignment #1

In doing this assignment I was most interested in the use of pronouns. Obviously most texts of this nature are going to be filled with pronouns, but what I looked at was the difference between the sample of African American authors that was chosen for the class and my authors. My particular group of authors all wrote books about their experience of being in slavery, whereas the rest of the texts varied in subject.

In the following BubbleViz we can see that all the texts together are mostly made up of function words (the, and, of, to, etc.), which is overall uneventful. But looking at what pronouns are used most frequently, “I” is used a little over 125 thousand times. Split between 47 different texts, it averages out to be a little under 1400 per text.

In the books about personal experience by the authors I was looking at, I was expecting to see way more personal pronouns. However, the results didn’t support my theory. Within the three texts, “I” was used just under 5 thousand times, which averages out to about 1700 per text. Although this is still more than the other texts, it’s less than I would have imagined.

However, words like my, our and us don’t appear in the graph representing all the texts. I found this interesting because in a genre like the one I was looking at, the authors tell stories about a group of people and a community rather than just one single person’s story.

In the dendrogram above, there is extreme variation, whereas the one below varies much less. The only way I can explain this is the different nature of all these authors: they write on different subjects, and therefore their texts differ. The three authors I looked at, Hughes, Jacobs and Steward, all wrote very similar stories about their own experience.

Feb 22
6:53 PM

I broke the rules…

….So I didn’t want to do a bubble viz on just 19th century African American authors. Why? Because I had a theory. And, thankfully, my bubble viz backs it up.

I proposed that a bubble viz for 19th century African American authors would be similar to one for these authors in combination with 19th century British authors. However, I was not expecting my results.

The first attachment is the bubble viz for all authors. Here, I observed pronouns, yet the most common ones were masculine. Is this a product of the times? Can this be based on the cultures the writers are from? Will one culture prove to place importance on masculine values more than the other? Or is it the genre?


So this is a screenshot that isn’t really clear and I don’t know how to make it clear because the PNG won’t work. So here we go. But anyway, the most common words for the combination of authors were “that, has, a, the…”. All the common words we use in everyday language.

When looking at European-only authors (British, actually), I found that the most common words were pretty much the same, only 1) secondary most popular words were past tense verbs, like had and was, and 2) Mr. and his were quite popular. There was also her included.

When looking at African-American authors, masculine titles like Mr. were apparent, but there was more our and us. No us for European authors.

The dendrogram was too large for me to begin to comprehend.

Ok I don’t know how to move it this is a problem. Anyway, not surprisingly, the African-American authors were clustered together and the European authors were intermingled. At least, this is how I interpreted it.

I don’t know why this is blurry/ how to make it better.

When analyzing Booker T. Washington’s work, I noticed that 1) more inclusive nouns such as people and school were used, 2) he uses more past tense than when analyzed alongside other 19th century African American writers, and 3) he uses Negro more when analyzed separately than alongside his contemporaries.

The dendrogram groups together works in a seemingly random order. Or does it? I need to look at publication dates to compare….



Feb 22
2:18 PM

Assignment 1

I had Charles Thompson and his only work, Biography of a Slave. The most used word was “the” with a total of 1110, which is to be expected since it is one of the most common words found in any piece of writing. Along with “the” my BubbleViz was overcrowded with other common everyday words: to, a, of, I, from, was, etc. One word I was excited to see was “Master.” Last semester I took an African American Literature class and read a few books about enslavement, and “master” was one of the words found in most of the works. Before even doing my BubbleViz I was expecting to see that word, but my expectation was a lot higher than just 57.


I then tried to get a little more information on the novel (I already knew it was about slavery, but I wanted to see if I could possibly get some characters or some more in-depth information about the novel… obviously without googling it or reading any part of it), so I applied some “stop words” while keeping the rest of the settings the same: a, of, the, am, in, me, my, with, by, be, or, would, could, should, do, does, he, she, this, that, i, with, as, at, were, some, while, any, than, out, in, and, to, was, from, his, as, when, him, all, on, for, but, did, had. What I came to see was interesting… the word plantation came up 59 times, and a few character names show up a few times: “ben” seems to play some part being that his name came up 76 times, Dansley was another coming up 23 times, and James came up 24 times (Jesus came up 25 times but I wasn’t sure if he was talking about a person or the holy figure; I assumed the holy figure).

I found the dendrogram interesting. It was cool seeing the clusters of works sharing the same frequent words. I’m still a little confused as to why Anti-Slavery by Brown was so high while all the other works seemed to be very close together.


I then checked what the dendrogram would look like as a “raw count” and I found it interesting that once again Anti-Slavery by Brown was on a whole different level than all the other works. I thought that the works Paul added would not mix well with the works that the class got from their authors, but it’s interesting that all the works join in one way or another. Maybe it’s just the common words that are in all literary pieces that these works share.