I heard this a lot in the weeks following Trudeau's election last month. Political pundits tried to weigh the seat situation in Ottawa and the provinces to establish what made this period of Canadian politics unique, or whether it was just like some earlier period. Once we learned about the cool things we can do with linear algebra, I began to suspect we could answer this question with data.

Here is my plan, which is still a work in progress. So far we have worked with texts stored as nested lists, with each work's words in a list nested inside a list of the whole corpus. What if we looked at it a little differently? Say we made a list where each "page" was the state of Canada's parliaments at one point in time, i.e. a snapshot for every year in which a provincial or federal election takes place, and each word was a seat. Using the techniques we learned, we could find some interesting patterns in the data.
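A minimal sketch of this representation in Python, assuming each state is stored as a list of party labels with one label per seat. The state names, party labels ("A", "B", "C", "D") and seat counts are invented placeholders, not real election results. Cosine similarity is my assumption for how to measure which states are "closest"; other distance measures would work too.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus: each "page" is one snapshot of a legislature and
# each "word" is one seat, labelled with the party holding it. The state
# names, party labels and seat counts are invented for illustration only.
states = {
    "state_1": ["A"] * 5 + ["B"] * 3 + ["C"] * 2,
    "state_2": ["A"] * 3 + ["B"] * 5 + ["C"] * 2,
    "state_3": ["A"] * 1 + ["B"] * 6 + ["C"] * 2 + ["D"] * 1,
    "state_4": ["A"] * 5 + ["B"] * 2 + ["C"] * 3,
}

def to_vector(seats, parties):
    """Count the seats each party holds, in a fixed party order."""
    counts = Counter(seats)
    return [counts[party] for party in parties]

def cosine(u, v):
    """Cosine similarity between two seat-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_states(target, states, k=2):
    """Return the k other states whose seat vectors are most similar to target's."""
    parties = sorted({party for seats in states.values() for party in seats})
    vectors = {name: to_vector(seats, parties) for name, seats in states.items()}
    scores = [
        (name, cosine(vectors[target], vector))
        for name, vector in vectors.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

print(nearest_states("state_1", states))  # ranks state_4 first, then state_2
```

Because cosine similarity divides by each vector's length, it compares seat shares rather than raw seat counts, which matters when a small provincial legislature is compared with the much larger House of Commons.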

My thoughts so far are that we could take any state of Canadian politics, find the other states closest to it, and answer when the Canadian political scene actually did resemble the one we're studying. We could also use a TF-IDF score, along with something I will try to make work called an Anti-TF-IDF score, to find what makes this period unique. The Anti-TF-IDF score would find words that are more common in the corpus than in the piece analysed, so together the two scores would show where parties hold more or fewer seats than average.
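Here is a rough sketch of both scores in Python, again on invented toy data with placeholder party labels. The TF-IDF part uses the plain, unsmoothed IDF of log(N / df). The `anti_score` function is only one hypothetical reading of the Anti-TF-IDF idea, not an established metric: it subtracts a party's seat share in the target state from its average share across all states, so a positive score means the party holds fewer seats than usual.

```python
from collections import Counter
from math import log

# Hypothetical toy corpus: each state maps to a list of seat tokens (one
# party label per seat). Labels and counts are invented, not real results.
states = {
    "state_1": ["A"] * 5 + ["B"] * 3 + ["C"] * 2,
    "state_2": ["A"] * 3 + ["B"] * 5 + ["C"] * 2,
    "state_3": ["A"] * 1 + ["B"] * 6 + ["C"] * 2 + ["D"] * 1,
    "state_4": ["A"] * 5 + ["B"] * 2 + ["C"] * 3,
}

def seat_shares(seats):
    """Term frequency: each party's share of the seats in one state."""
    counts = Counter(seats)
    return {party: n / len(seats) for party, n in counts.items()}

def tf_idf(target, states):
    """Plain TF-IDF with unsmoothed IDF = log(N / df), where df is the
    number of states in which the party holds at least one seat."""
    n_states = len(states)
    df = Counter(party for seats in states.values() for party in set(seats))
    return {
        party: share * log(n_states / df[party])
        for party, share in seat_shares(states[target]).items()
    }

def anti_score(target, states):
    """Hypothetical 'Anti-TF-IDF': a party's average seat share across all
    states minus its share in target (positive = fewer seats than usual)."""
    all_shares = [seat_shares(seats) for seats in states.values()]
    parties = {party for shares in all_shares for party in shares}
    target_shares = seat_shares(states[target])
    scores = {}
    for party in parties:
        average = sum(s.get(party, 0.0) for s in all_shares) / len(all_shares)
        scores[party] = average - target_shares.get(party, 0.0)
    return scores

print(tf_idf("state_3", states))      # only party D gets a non-zero score
print(anti_score("state_3", states))  # party A has the largest positive score
```

One pitfall this toy example shows: because the IDF factor is log(N / df), any party holding seats in every state gets an IDF of zero, and therefore a TF-IDF score of zero however many seats it holds. If the major parties appear in every state of the corpus, plain TF-IDF would mostly highlight small parties, which is one reason a share-based comparison like the sketch above may be more informative.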

I'd like to hear whether this sounds at all interesting to you guys, and what pitfalls you might foresee. I am not an expert on Canadian politics, so any thoughts would be appreciated.

Until next time!
