Mar 14
2016
9:02 PM

Wikipedia edit

Monterey High editing history

My edit to this page added a notable alumnus to the school's list of alumni: Pete J. Cutino, a former water polo player and swimmer at the school. I was able to discover this gap because I attended Monterey High and was told many stories about Pete Cutino by my own water polo coach. While looking for pages to edit, I checked my high school, Monterey High, in an attempt to find any corrections or additions that could be made, and discovered that Pete was absent from the list of notable alumni.

Pete J. Cutino is the all-time winningest college water polo coach in US history. Cutino also has an NCAA award named after him for excellence in water polo, among many other notable accomplishments; these reasons initially gave me the idea that Cutino would be a relevant addition to the list. Cutino also already has a Wikipedia page noting his accomplishments in both swimming and water polo, as well as his attendance at Monterey High, just like the other alumni listed; this was another reason I was inspired to add him. The source I cited for Cutino's attendance at Monterey High is the National Italian American Sports Hall of Fame, an online record of the hall's members, which notes that he was inducted for his accomplishments and references his career as a water polo player and swimmer for Monterey High.

The addition of Cutino to the page is important because he is a sufficiently recognizable and influential figure in the history of water polo and of Monterey High, and therefore deserves to be listed under the Notable Alumni section.

Mar 14
2016
8:58 PM

Wikipedia

After much searching, I found an article missing some important information. Originally, I was going to edit the Racket (programming language) page. However, while reading through it I discovered the Racket features page, which was a better location for the information I felt was missing from Wikipedia regarding Racket. The article concerns the features of the Racket programming language and can be found here. I created the section "plai and plai-typed." plai is a language extension of Racket and an important educational tool. The first citation in the section is just to Racket's documentation page for plai. I figured that although this is a legitimate source, it might not be exactly perfect for this assignment, so I added some additional information that I found in one of my online textbooks and cited that. There was already a subheading called Language Extensions, so I just added a section for plai and plai-typed.

 

Mar 14
2016
7:07 PM

Wikipedia Post

For my Wikipedia post I chose to make a correction on the Long Beach Wikipedia page. Initially I had trouble finding something that needed a change. I searched my hometown and thought about people I know. With my mind wandering, golf came to mind, which is my favorite hobby. I tend to always check up on certain players and look at their stats, and this is where I got the idea for my change. I thought of one person in particular who has had great success in golf and wanted to look him up. In doing so, I realized that he was missing from a very important list. My correction took place on the notable persons list, which is a hyperlink on the Long Beach page due to the number of notable people Long Beach has. To the notable persons page I added the name "Patrick Cantlay." Patrick Cantlay is a professional golfer who plays on the PGA Tour. He was the number one amateur golfer in the world in 2012 and turned pro shortly after. It's interesting that someone of his popularity would be missing from this list, as there are many other professional athletes who have not gotten nearly the amount of fame or accolades he has, yet still made it on. I felt Patrick deserved to be on the notable persons list because of the many accolades he has earned in his sport. His name is now added to the list, along with a hyperlink that directs you to his personal Wikipedia page, where it states his hometown of Long Beach, California. The source for my correction came from the official PGA Tour site: http://www.pgatour.com/players/player.35450.patrick-cantlay.html, a site that states information on professional golfers such as hometown, age, weight, and standings, making it the most credible source for any professional golfer on tour, which he is.
The Long Beach Wikipedia page link is https://en.wikipedia.org/wiki/Long_Beach,_California; from there you can go to the notable persons tab, which will take you to the page I edited: https://en.wikipedia.org/wiki/List_of_people_from_Long_Beach,_California.

Mar 14
2016
11:29 AM

Wiki Addition: Yoga Hosers

When I left the 2016 Sundance Film Festival I had managed to see a total of 10 new films, most of which premiered at this specific festival, so I didn't expect to find them on Wikipedia quite yet. While not all of the films I saw have current Wikipedia pages, a few of them did, and some were more filled out than others. When I looked at the page for the film Other People (https://en.wikipedia.org/wiki/Other_People_(film)) I noticed there was a section for reception, in which the creator added a summary of how the film was received on certain review websites. Since that page had a reception section, I decided to add one for Kevin Smith's film Yoga Hosers (https://en.wikipedia.org/wiki/Yoga_Hosers). Reviews for this movie were very mixed but mostly negative, which could be why no one found it necessary to add reception to the Wikipedia page.

To update this page I looked up credible reviews of the film and looked for the same sites that the Other People page cites. However, Yoga Hosers had substantially fewer reviews, so I only added one sentence under the subheading "Reception," which cites Rotten Tomatoes, a site that is referenced on other Wikipedia film pages. In my addition to the page I created the subheading, created a link to the Rotten Tomatoes Wikipedia page, and cited the actual Rotten Tomatoes Yoga Hosers review page in the references (reference 15).

I made some changes to other Wikipedia pages, including Occidental College's athletics section. While all my updates are legitimate, they aren't the most formal or eloquent, so I am slightly surprised that they have remained untouched.

Note: Other People was a great film and I think Netflix picked it up, so keep your eye out. Also, Yoga Hosers is wildly disturbing yet hilarious and an ode to Kevin Smith’s most famous film Clerks. So I recommend them both.

Mar 13
2016
5:24 PM

Wikipedia Change: My 2¢

What started out as a fun and curious look at my hometown through the lens of Wikipedia turned into a need for change. Douglas City, California, is a small town in Trinity County. Here is the Wikipedia link: https://en.wikipedia.org/wiki/Douglas_City,_California

I put the towns in the Geography section into alphabetical order and added clickable links for Redding and Weaverville, directing readers to their respective Wikipedia pages. This was a quirky source edit. When I copied and pasted the link into Wikipedia for Redding it looked like this: [[Redding|Redding, California]]. This created an odd aesthetic on the page, with Redding listed as "Redding, California" (the same thing happened to Weaverville) while all the other towns just listed their names without the comma or "California" afterwards. I copied what the previous sources looked like for other towns and rewrote the newly linked towns to read: [[Redding, California|Redding]]. I won't bore you by typing out the Weaverville source edit (it was the same as the Redding one). Anyway, it worked. Not versed in what I was doing, I took a chance to see if it would work; the knowledge that I could revert back to the previous edit took away a lot of the anxiety. I don't know why, but it worked, and now the towns are listed in alphabetical order without the redundancy of "California" behind each town name.
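For the programmatically curious, the piped-link pattern described above can be sketched with a tiny Python helper. The function name and logic here are my own invention for illustration, not anything MediaWiki itself provides:

```python
def piped_link(page_title, display_text=None):
    """Build a MediaWiki piped link: [[Page title|Display text]].
    With no display text (or identical text), emit a plain link."""
    if display_text and display_text != page_title:
        return f"[[{page_title}|{display_text}]]"
    return f"[[{page_title}]]"

# Alphabetize the towns, then link each one while hiding ", California".
towns = ["Weaverville, California", "Redding, California"]
links = [piped_link(t, t.split(",")[0]) for t in sorted(towns)]
print(", ".join(links))
# [[Redding, California|Redding]], [[Weaverville, California|Weaverville]]
```

The part before the `|` is the actual page being linked; the part after is what the reader sees, which is exactly why `[[Redding, California|Redding]]` fixed the odd aesthetic.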

I did a few other things to my hometown's Wikipedia page. The first was to edit the link on the words "natural bridge." It used to take Wikipedia goers to a picture of what natural bridges look like. In my quest for a better-informed readership, I deleted that link and created a footnote to an informational page about this particular natural bridge (http://visittrinity.com/explore-history/natural-bridge/), which is part of the Trinity County tourism site: www.visittrinity.com. I also edited the entire story of the massacre at the natural bridge. Simply put: it was inaccurate. It cited a book of ghost stories, written by an author who has written many books on the supernatural. I found that there was not one completely accurate source for the massacre, so I decided to sum up what was the same throughout multiple sources. I used the link for the natural bridge site as the link for the entire story, instead of confusing the issue with multiple links to stories that all varied to some degree. I believe my edit is an accurate combined account of the three or four texts I read. I also threw in a sarcastic remark (I couldn't help it). Maybe the user "SporkBot" will read my changes and feel the need to edit my edits. Or, if they are currently living in my hometown, they have probably forgotten they made any edits. I sort of hope it will become an edit war, challenged by works cited and not some random book of ghosts.

The last thing I did on this page was to delete a reference to a diversion dam named the Arkansas Dam. It had nothing to do with Douglas City. The dam was placed closer to a town named Junction City, CA. I found more information about the dam on the same tourism website mentioned above (http://visittrinity.com/history/mining/?doing_wp_cron=1457908510.4739370346069335937500) and posted a much better description of the dam and its history on the Junction City Wikipedia page here: https://en.wikipedia.org/wiki/Junction_City,_California

Thanks for reading and happy Wikipedia-ing!

Feb 22
2016
11:25 PM

Dreaded Dendrogram Dreams

As a disclaimer: The first two paragraphs explore the nature of the word dendrogram and the names of things associated within a dendrogram. Please feel free to read this article in its entirety if you enjoy factoids. But my feelings will not be hurt if you jump down to the last three paragraphs to see how Lexos can be utilized on many levels and my exploration of what I found using it explicitly for a dendrogram. I would lastly like to say thank you to Wheaton College for making all of this possible.

Statistical overachievers might delight at the alliteration achieved in the title of this article and the possibility of some anagrammed value in the word dendrogram (which is not recognized by my recently updated word processor dictionary). But what is a dendrogram? If you are an English major, you will jump to the online site for all things wordy, The Oxford English Dictionary (OED), where dendrogram is defined as: "A branched diagram representing the apparent similarity or relationship between taxa, esp. on the basis of their observed overall similarity rather than on their phylogeny." This helps a little. I now understand it to be a diagram showing some sort of relationship to other things. I kind of got that from the 'gram' in dendrogram. But now what is phylogeny? Well, I jumped back into the OED and found my newest definition to be even more helpful (note sarcasm). Phylogeny is defined as: "A diagram or theoretical model of the sequence of evolutionary divergence of species or other groups of organisms from their common ancestors." Taking that idea and transferring it to words gives me the thought that it possibly means the evolution of a word or a divergence in the usage of a word. Luckily, I looked up taxa too: "A taxonomic group or unit, esp. when its rank in the taxonomic hierarchy is not specified." The OED didn't quite fail me, but it wasn't as helpful as I had hoped.

Luckily, I was provided a little information before diving headfirst into the dendrogram waters, so I understood what a dendrogram was before I decided to show off my lofty OED skills. Otherwise, I would have had to spend a few more hours searching for exactly what a dendrogram does. Out of curiosity, I did look through some other websites, and heavy linguistics terminology seems prevalent in any discussion of a dendrogram. My simple breakdown of what a dendrogram can do is: compare relationships (in our case, word usage) and put those relationships into a simple diagram to show how closely one text is related to another. The parts of a dendrogram are named with attributes of a tree, from leaves to stand-alone branches. There is no limit to how many leaves can be in a dendrogram; it just depends on how adept a user is when setting the parameters. Pairs of leaves are called clades, and each pairing of clades is also called a clade, while single leaves are called simplicifolious (which means single-leafed). Now that you know some words describing a dendrogram, I have made it easier to understand by letting you peek at a picture just below.
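To make the clade idea concrete, here is a toy Python sketch of my own (not Lexos code): given made-up word-frequency vectors for three "texts," the two closest ones pair into the first clade, while the outlier stands alone as a simplicifolious leaf.

```python
# Toy single-step agglomerative clustering: find the closest pair of
# documents by Euclidean distance on their word-frequency vectors.
import math

def distance(u, v):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_once(docs):
    """Return the closest pair of documents; that pair forms the first clade."""
    names = list(docs)
    return min(
        ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
        key=lambda pair: distance(docs[pair[0]], docs[pair[1]]),
    )

# Made-up relative frequencies of three words across three texts.
docs = {
    "A": [0.30, 0.10, 0.05],
    "B": [0.28, 0.12, 0.06],   # close to A
    "C": [0.05, 0.40, 0.30],   # the outlier (a simplicifolious leaf)
}
print(cluster_once(docs))  # ('A', 'B') pair into the first clade; C stands alone
```

A full dendrogram just repeats this merging step until everything joins one tree, drawing each merge as a branch whose height reflects the distance between the clusters it joins.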

Clades

Wheaton College has dedicated a site to the Lexos tool: http://lexos.wheatoncollege.edu/upload. There is also a site dedicated to explaining many of its functions and to understanding the dendrogram: http://wheatoncollege.edu/lexomics/educational-material/. Wheaton College explains what Lexos is better than I could: "Lexos is an integrated workflow of tools to facilitate the computational analyses of texts." I won't go into all the tools that Lexos provides, and I honestly couldn't tell you about most of them. I understand that Lexos can do many things, and I have only dipped my toes in. Without the small tutorial we had in a classroom, I would have found Lexos very cumbersome to navigate, almost impossible. Wheaton College does have some tutorials, but they explain what Lexos can do more than they show a beginning user where to get started. But I'm just learning to swim in this very large pond of wondrous potential; others might be able to wade right in without fear. I can at least see it as a powerful tool for analyzing texts of all sizes. It can make a word cloud, much like the Wordle website, and it can make word bubbles, or BubbleViz, as the feature is named. This is pretty slick, but there are many variables that change the text and could give a user some very errant data on word usage. Still, it was fun to play around with.

Let us clear the table and get at the meat of this project: the dendrogram. I picked a multitude of books from authors living and writing from the mid-1800s to the early 1900s. I chose five books by Mark Twain: The Adventures of Huckleberry Finn, The Adventures of Tom Sawyer, The Innocents Abroad, Roughing It, and The Tragedy of Pudd'nhead Wilson. I chose two books by Alexandre Dumas: The Three Musketeers and The Count of Monte Cristo. The other three books were all from different authors: Kate Chopin's The Awakening and Selected Short Fiction, Herman Melville's Moby Dick, and Nathaniel Hawthorne's The Scarlet Letter. I chose the Dumas books to see if they would match (out of pure curiosity). I am a huge fan of Mark Twain, so I couldn't help picking more of his books; and the chance to see whether the man with the finest ear for written regional dialects would show a wide variance within a dendrogram was too good to pass up. Kate Chopin's book has a similar dialectal pattern, so I wanted to see where she fell when compared to Mark Twain. Melville and Hawthorne were great friends at one point and did what amounted to modern-day workshops on their writings, so I was again curious to see if they would be similar in word usage, even though Moby Dick is ten times the length of The Scarlet Letter.

I won't go into the how-to of getting a dendrogram, but I think it is at least noteworthy to talk about the results. I'll abbreviate the titles to keep this brief. Huck Finn and The Awakening are simplicifolious (out there on their own), just as I thought they would be. Dumas is by himself, and Twain's Pudd'nhead and Tom Sawyer form a clade of Mississippi River dialect. Twain's other two novels were written mainly in his own voice, and it makes sense that they form their own clade as well. I am surprised and delighted to see Moby Dick and The Scarlet Letter in their own clade too; it may not prove I was right, but it gives me a warm and fuzzy feeling that two friends at least paired up in this dendrogram. This was a fun experiment that immediately showed a usefulness that could lead to fruitful research later on down the road. I have attached a picture of this amazing dendrogram (clicking on the picture enlarges it to show a much better view). For me, this shows how much fun can be had, while still advancing academia, even if it is a slow and careful descent into the Lexomic waters.

Dendrogram of the ten novels (screenshot, 2016-02-22)

Feb 22
2016
9:40 PM

Assignment 2

Lexos is a software program developed at Wheaton College in Norton, Massachusetts, for use in the field of Digital Humanities. As a tool, Lexos can perform a handful of tasks that are quite useful for individuals involved in Digital Humanities research.

The program is designed to take a text (easily inserted by uploading files), apply a variety of editing options, and then finally visualize and/or analyze the piece. The editing options include scrubbing, cutting, and tokenize/count. Scrubbing allows users to alter the text's punctuation and case, remove certain characters, remove words (stop words), replace words (lemmas), replace characters (consolidations), and tell the program what to do with non-standard characters (special characters, e.g. ∆). Cutting makes it possible to split the text into smaller segments, with five options for how those segments are separated (segments can also overlap with each other). Tokenize/count is the simplest of the options; it shows how many words are in the document, compares individual word counts to the total word count, and can alter the way Lexos compares words to each other. The Visualize section of the program has four options: Rolling Window Graph (which I was unable to figure out), Word Cloud, Multicloud (which makes word clouds for multiple documents), and BubbleViz, which is essentially a word cloud with word frequency represented by bubble size rather than word size. Lexos' Analyze section has four options: Statistics, Clustering, Similarity Query, and Topword. Statistics gives information such as the number of distinct terms and the number of words occurring only once. Clustering has two options, hierarchical and K-means; these change the way data about authors' word choice is displayed. Clustering is used to show differences in word choice, which can be utilized to differentiate between authors as well as to compare differences within one author's writings (e.g., an author's first work compared to their last).
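As a rough sketch of what the scrubbing step described above does (lowercasing, stripping punctuation and digits, removing stop words), here is my own approximation in Python; Lexos' actual implementation surely differs:

```python
# Approximate "scrubbing": lowercase, strip punctuation/digits, drop stop words.
import string

def scrub(text, stopwords=()):
    """Return a cleaned word list from raw text."""
    text = text.lower()
    # Delete every punctuation mark and digit in one pass.
    text = text.translate(str.maketrans("", "", string.punctuation + string.digits))
    return [w for w in text.split() if w not in stopwords]

sample = "The 3 Musketeers, the Count, and the Queen!"
print(scrub(sample, stopwords={"the", "and"}))
# ['musketeers', 'count', 'queen']
```

The point of scrubbing is exactly what this shows: after cleaning, "The" and "the" count as the same word, and filler words no longer dominate the counts.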

Lexos as a software program is quite confusing to the untrained eye; many of the words used in the program, such as dendrogram (referring to the graph created by clustering), tokenize, and culling, limit its usability to those trained in the Digital Humanities or instructed in how to use it. Other individuals would have a very difficult time figuring out how to use the program properly. Another complaint I have with the program is its lack of instruction within the program itself; having to leave the page to figure out what some components do is a pain. Wheaton College already provides definitions and hints for what certain components do, so why not do the same for all of the editing options and sections? The user interface itself is very simple to use, however, so once an individual gets past the challenge of the diction used in the program it is quite easy to use. Overall, Lexos is an extremely useful tool for people in the Digital Humanities, but it is both difficult to use and very specialized for people outside the field. I would suggest that someone not involved in Digital Humanities try other programs first if they want to use it for reasons other than its primary purpose.

Feb 22
2016
9:14 PM

Assignment #2

Lexos is a unique tool that can be used in the Digital Humanities and computing humanities, as well as by anyone else who has a need for it. As a novice user of Lexos, I found the initial interface not hard to figure out, but it did have some tricky factors.

Lexos uses a simple approach to get started with its program. You simply choose a document that you would like to upload and drag it into the box. For me, though, I kept getting an error because you cannot upload a Word file; it has to be uploaded as a .txt file. After dragging your file into the box, you go through a variety of options to specify your result. The Prepare tab allows you to make specifications about the final product having to do with the words involved. Some of the options available include removing punctuation, making all words lowercase, and removing digits; these are all listed under the Prepare tab as scrubbing. Having these options under the scrubbing tool offers variation in the style of the output. Another available tab is Visualize, which allows you to choose the orientation and style of your word cloud. Lastly, Lexos allows for multiple styles of word-usage diagrams, such as clustering, hierarchical clustering, and top words. The different types of diagrams enable Lexos users to see things from different perspectives and allow Lexos to appeal to a wider variety of users. For me, Lexos has been a helpful tool in the field I am currently enrolled in, although I think it is mostly restricted to the Digital Humanities, since it is a major key in research for Digital Humanities members. Not many other workforces call for a program that analyzes word usage; that being said, it is very helpful when it is called for. In class we used Lexos to discuss and analyze the works of writers, including Jane Austen and George Eliot. From our comparison we saw a recurring similarity in the word usage across Jane Austen's work as a whole, as well as a similarity between Austen and George Eliot. The tool we used for this analysis was under Analyze, then Hierarchical Clustering.
This produced a dendrogram, which I had no clue about at first. The dendrogram was a graph that used height and vertical distance to show the similarity between authors and their works. This point about the dendrogram goes back to the question of who Lexos is for: without someone who is qualified with the program, it may be hard for ordinary users to understand what certain terms mean and how best to use them.

Overall, Lexos offers a far more advanced result than its competitors because of its ability to be precise with its functions, which is a positive for users who are familiar with the program, but can be somewhat intimidating for beginners.

 

Feb 22
2016
8:48 PM

Assignment #2

Lexos as a Text Mining Tool

The use of the word "tool" to describe an object implies that the object somehow makes a task or group of tasks easier. Lexos definitely makes text analysis and visualization easier from beginning to end. Lexos provides a simple user interface. The various tabs (Manage, Prepare, Visualize, and Analyze) make navigation easy, assuming, of course, that the user understands what is meant by these terms in relation to analyzing text data. Beyond navigation, the actual functionality itself is also well laid out.

Lexos makes text analysis easier starting with preparing the text. The Scrub page under the Prepare tab allows the user to clearly select which document or documents to scrub. The page provides basic scrubbing options as well as more advanced options such as stop words. In case the user is unfamiliar with some scrubbing methods, the helpful gray circles with a question mark provide a brief explanation of each. Another helpful option is to upload a file of stop words as opposed to entering them manually. For serious text analysis, there may be a plethora of stop words; in this case, uploading a file with stop words may be a better option than manually entering them. Under the Prepare tab, there is also an option to cut and an option to tokenize. Cutting allows the user to split the text into a specified number of chunks, or into chunks based on a milestone string. Tokenizing allows the user to separate words based on a specified delimiter. Another amazing feature of Lexos is the ability to download the document-term matrix as a .csv file. CSV files are great for work in Excel and in advanced data mining software such as Weka. According to the Lexos website, they included this feature "[to] facilitate subsequent text mining analyses beyond the scope of this site" (http://wheatoncollege.edu/lexomics/tools/). After the text is prepared through scrubbing, cutting, and/or tokenizing, Lexos provides a number of options for visualizing the text data.
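To illustrate what a document-term matrix exported as .csv might look like, here is a hedged sketch; the toy documents and the exact column layout are my own, and Lexos' real export may be arranged differently:

```python
# Build a tiny document-term matrix and write it out in CSV form:
# one row per document, one column per term, cells are raw counts.
import csv
import io
from collections import Counter

docs = {
    "doc1": "call me ishmael call me",
    "doc2": "call it fate",
}
counts = {name: Counter(text.split()) for name, text in docs.items()}
terms = sorted(set().union(*counts.values()))  # the shared vocabulary

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([""] + terms)                  # header row of terms
for name, c in counts.items():
    writer.writerow([name] + [c[t] for t in terms])
print(buf.getvalue())
```

A matrix in this shape is exactly what downstream tools like Excel or Weka expect, which is presumably why Lexos offers the export at all.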

Lexos can visualize the text data as a Rolling Window Graph, a word cloud, a multicloud, or a BubbleViz. Lexos expands beyond the basic word cloud and allows for interesting visualization techniques. In my opinion, the BubbleViz surpasses the word cloud as a visualization tool: the word cloud emphasizes the most prominent words too much, whereas BubbleViz exaggerates the most prominent words while keeping the less prominent words readable. As you visualize, Lexos allows you even more control with various options such as minimum word length. Overall, the visualization aspect of Lexos is straightforward to use, and the option of several different visualizations allows the user to see the data in different ways, which is generally a very good thing in terms of text analysis.

Not only does Lexos provide the ability to prepare and visualize text data, but it also supports the analysis of text data. Lexos can analyze the data through Statistics, Clustering, SimilarityQuery, or Topword. By providing the user with statistics, Lexos allows the user to focus on analyzing those statistics rather than spending time calculating them. Also, clustering is very important in data mining, and Lexos allows for two types: hierarchical clustering and k-means. SimilarityQuery is a great analytical tool because it allows for document comparison, a major subfield of text mining.
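Document comparison of the kind SimilarityQuery performs is commonly done with cosine similarity on word counts. I can't confirm that Lexos uses exactly this measure, but here is the standard calculation as a sketch, with made-up sample texts:

```python
# Cosine similarity between two documents represented as word-count dicts.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count mappings (1.0 = identical direction)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

d1 = Counter("call me ishmael".split())
d2 = Counter("call me maybe".split())
d3 = Counter("it was the best of times".split())
print(round(cosine(d1, d2), 3), round(cosine(d1, d3), 3))
# d1 and d2 share two of three words; d1 and d3 share none, so similarity is 0.
```

Ranking every document in a collection by its cosine similarity to a chosen one is the essence of a similarity query.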

Overall, I consider Lexos to be an effective and well-rounded tool. Preparing the data, especially scrubbing, produces a more meaningful result. Visualizing the data in various ways allows for more complex analysis. The various analytical tools produce a plethora of useful information for the user. Lexos is well organized and quite effective as a text mining analysis tool.

Feb 22
2016
11:26 AM

Lexomics Tool (assignment 2)

The Lexos tool provides an analytical resource for experimenting with the frequency of word usage in a specific text. The tool is slightly confusing to the untaught eye, as the terminology used isn't necessarily common knowledge. It allows one to upload a text, buffer words out of the text, and then look at their frequency alone. It also allows one to upload multiple texts and compare their word usage through a hierarchical dendrogram (note the difficult terminology). Had I not been walked through how to set up the texts, I likely would have stopped using the tool, as I personally wouldn't have understood what exactly my results were showing; so it is fair to assume that this tool was created for professional digital humanists. The tool itself, while very interesting, doesn't really have much use outside of the Digital Humanities: it allows one to answer questions about how an author's work compares with their own or others' work, but the average person doesn't typically have that question in mind. So the social assumption made about who is using this tool is that they are an educated person doing a research project specifically on word usage and frequency, comparing and contrasting the words of various texts.

This tool does offer much more detailed and interesting results than other tools in its realm, such as Word Cloud. Word Cloud only allows you to look at one text and then see the most frequently used words on a spread, with frequency depicted by the size of the word. Lexos doesn't overemphasize the most frequently used words the way Word Cloud does, and it also provides a frequency count when you scroll over a word, which Word Cloud does not. Lexos also has the option of "scrubbing" out words that aren't interesting or necessary for the research; for example, one can take out all of the pronouns from a story to get a better picture of the content rather than the subjects within it. Also, differing from Word Cloud, Lexos provides statistical analysis of word frequency and word usage by producing a dendrogram graph. This feature in itself makes Lexos a much more detail-oriented and research-oriented site that can do more than just provide an interesting look at the words in a text.
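The pronoun-scrubbing idea can be sketched in a few lines of Python; the pronoun list and sample sentence below are made up for illustration, not taken from Lexos:

```python
# Drop pronouns, then count what remains -- a stand-in for Lexos' stop-word
# scrubbing followed by a frequency view.
from collections import Counter

pronouns = {"he", "she", "it", "they", "i", "you", "we", "him", "her"}
text = "he saw the whale and she saw it and they saw the whale"
words = [w for w in text.split() if w not in pronouns]
print(Counter(words).most_common(3))
# [('saw', 3), ('the', 2), ('whale', 2)]
```

With the pronouns gone, the top of the frequency list reflects the content of the passage (the whale, the seeing) rather than who is doing it.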

As previously stated, this tool is for those with a question or a problem to solve, not just a typical internet user. There has to be an interest in a particular text or texts and a question pertaining to it; otherwise the results are rather meaningless. In practice with the tool we analyzed the similarities between texts by Brontë, Jane Austen, and George Eliot; what became interesting, however, is that the two authors whose texts were most similar were those from more distant time periods. We knew these were more similar by the way they clustered in the dendrogram. With further knowledge of these texts, one would know that this is likely because the Eliot text was written as if set in the early 1800s, the time of Jane Austen's novels; this provides evidence, at the least, that Eliot's novel accomplished its goal in that sense.

Overall, it is arguable that the Lexos tool is much more useful than other tools like it on the internet. But at the same time, it is also meant for people who have a purpose in using this tool and a knowledge of the tool itself.