Hello Internet,

This week I decided to apply the lessons I learned on analysing text to my own work. At the moment we're still building up from first principles, but I can definitely see that once I get better at things like pure functions, I'll be able to do some powerful stuff.

What enamoured me about our class was the discussion of an author's "fingerprint": the way we can find out how writers write. I decided to use these tools to do a little bit of introspection and find out if maybe I have some cool quirks in my writing.

My corpus is a collection of five papers I've written since I started grad school. It includes three seminar papers and two MA papers. Though I have a definite research interest, these papers cover a few slightly different topics. The papers are on 9/11 reconstruction at Ground Zero, Islamophobia in America in the short 21st century, free love and sex in the historiography of Emma Goldman, and a comparison of Occupy Wall Street with the Industrial Workers of the World.

Word Clouds

The first tool I played with was word clouds. I wanted to see if I could tell at a glance how my papers looked. I was rather unsurprised by the result, but it nonetheless created my first infographics on my papers.

So we can see that the biggest words in each essay pretty much identify the topic of the paper. Overall a happy result. I then googled the Join command in order to build a total word list and make a word cloud of the word frequencies all together.

Overall a disappointment here. Not much to glean except that my papers on 9/11 topics were the biggest.
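
For anyone who wants to try the same thing, here is a rough Python sketch of the idea: count the words in each paper, then add the counts together into one total list. This is not the code I used for the clouds above; the function names and the two tiny sample strings are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_frequencies(papers):
    """Return one Counter per paper, plus a combined Counter for the whole corpus."""
    per_paper = [Counter(tokenize(paper)) for paper in papers]
    combined = sum(per_paper, Counter())  # adding Counters adds up the counts
    return per_paper, combined

# Hypothetical stand-ins for the real papers
papers = [
    "Ground Zero reconstruction at Ground Zero",
    "Occupy Wall Street and the Industrial Workers of the World",
]
per_paper, combined = word_frequencies(papers)
print(combined.most_common(5))
```

Raw counts like these favour the longer papers, since a long paper simply contributes more words to the total. Dividing each paper's counts by its length before combining would be one way to even that out.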

Modal Verbs

The modal verbs didn't yield a ton of insight. The only thing is that maybe I overuse the word "would".
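
A modal verb count can be sketched in a few lines of Python. The list of modals below is my own assumption about which words to include, and the sample sentence is invented; it is only meant to show the shape of the check, not the exact tool I used.

```python
import re
from collections import Counter

# Assumed list of modal verbs; adjust to taste
MODALS = ("can", "could", "may", "might", "must", "shall", "should", "will", "would")

def modal_counts(text):
    """Count how often each modal verb appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word in MODALS)
    # Report every modal, including the ones that never appear
    return {modal: counts[modal] for modal in MODALS}

sample = "It would be good if we could go. That would help."
print(modal_counts(sample))
```

This matches on spelling alone, so the month "May" or the noun "will" would be counted as modals too.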


I decided to see what my writer's fingerprint might be with bigrams and trigrams. Comparing the 5 papers together, I found these as my common bigrams:

{{"according", "to"}, {"act", "of"}, {"and", "a"}, {"and", 
  "the"}, {"as", "a"}, {"as", "the"}, {"as", "well"}, {"at", 
  "the"}, {"because", "of"}, {"by", "the"}, {"by", 
  "their"}, {"desire", "to"}, {"does", "not"}, {"for", 
  "the"}, {"from", "a"}, {"from", "the"}, {"has", "its"}, {"in", 
  "the"}, {"into", "the"}, {"is", "a"}, {"is", "an"}, {"is", 
  "that"}, {"is", "the"}, {"it", "is"}, {"it", "to"}, {"much", 
  "more"}, {"of", "a"}, {"of", "all"}, {"of", "the"}, {"of", 
  "this"}, {"on", "a"}, {"on", "the"}, {"one", "of"}, {"over", 
  "the"}, {"rather", "than"}, {"than", "a"}, {"that", "is"}, {"that", 
  "it"}, {"that", "the"}, {"that", "this"}, {"the", "major"}, {"the", 
  "united"}, {"there", "are"}, {"they", "are"}, {"they", 
  "were"}, {"this", "is"}, {"to", "be"}, {"to", "the"}, {"to", 
  "their"}, {"tristan", "johnson"}, {"was", "the"}, {"way", 
  "to"}, {"well", "as"}, {"with", "a"}, {"would", "be"}}

Instantly popping out at me is how bad I am with passive voice. "There is" usually is not a good bigram to see a lot of. "Desire" seems like an interesting word to then look up in my works, to see why such a term would show up in so many of my writings.
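
Finding the bigrams that every paper shares boils down to building the set of word pairs for each paper and intersecting the sets. Here is a minimal Python sketch of that idea. The helper names and sample texts are hypothetical, and it makes the simplifying assumption that n-grams may run across sentence boundaries.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return the set of all runs of n consecutive tokens."""
    return set(zip(*(tokens[i:] for i in range(n))))

def common_ngrams(papers, n):
    """Return the n-grams that appear in every paper, sorted alphabetically."""
    sets = [ngrams(tokenize(paper), n) for paper in papers]
    if not sets:
        return []
    return sorted(set.intersection(*sets))

# Hypothetical stand-ins for the real papers
papers = [
    "One of the best as well as the rest",
    "As well as one of the worst",
]
print(common_ngrams(papers, 2))  # shared bigrams
print(common_ngrams(papers, 3))  # shared trigrams
```

Calling the same function with n set to 3 gives the shared trigrams, so one helper covers both cases.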


My trigrams list was a very small one:

{{"as", "well", "as"}, {"one", "of", "the"}}

"One of the" is the most interesting one. It seems to be my favourite way of introducing case studies or tangents. Overall, this gives me a little bit of a glimpse into what my writing style is like, and whether it tells me much about my own writer's psyche.