In this past class I caught a glimpse of what the future of my PhD will look like, and even more so after reading chapter 5 of the textbook. I want to talk about what I see as the major hurdles for my research, given what we've learned about scraping and data crunching.
1. Bandwidth and Copyright
These two words are likely to haunt my dissertation work. Luckily I pay the extra cash for unlimited bandwidth, but I can already see that when it comes to scraping databases, I will have to work around varying degrees of grumpiness from system admins. Paywalls will also plague my work. Much of my material will also still be in copyright, meaning that following the law will be harder than I think. I should say these are better problems to have than the one most historians face, where vast amounts of data simply do not exist.
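One way to keep system admins less grumpy is to check a site's robots.txt before scraping and to pause between requests. Below is a minimal sketch of that idea, assuming Python and its standard-library `urllib.robotparser`. The robots.txt text, the `example.org` URLs, the agent name `my-dissertation-bot`, and the helper names `build_rules` and `polite_plan` are all made up for illustration. The sketch makes no network requests. It only decides which URLs are allowed and how long to wait between them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, standing in for what a real archive might publish.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_plan(parser, urls, agent="my-dissertation-bot", default_delay=2.0):
    """Return (url, delay_in_seconds) pairs for the URLs that robots.txt allows.

    The agent name and default delay are assumptions, not recommendations
    from any particular site.
    """
    # Use the site's Crawl-delay if it declares one, otherwise fall back.
    delay = parser.crawl_delay(agent) or default_delay
    return [(url, delay) for url in urls if parser.can_fetch(agent, url)]


rules = build_rules(EXAMPLE_ROBOTS)
plan = polite_plan(
    rules,
    ["https://example.org/archive/1", "https://example.org/private/x"],
)
```

In a real scraper, each URL in `plan` would be fetched in turn with a `time.sleep(delay)` between requests. None of this addresses paywalls or copyright, which are legal questions rather than technical ones.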
2. Cleaning data
This part I worry about. The data we worked with last night showed that things can get messy really quickly, and on the wild internet even more so. I have my work cut out for me, and I had better get good at writing strong string search patterns (regular expressions).
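To give a sense of what that cleaning might involve, here is a small sketch using Python's built-in `re` module. The sample input and the function names `clean_text` and `find_years` are invented for illustration and are not taken from the class data. Real scraped pages will need more careful handling than this, and a proper HTML parser is usually safer than regular expressions for stripping tags.

```python
import re


def clean_text(raw):
    """Strip simple HTML tags and tidy whitespace in a scraped string."""
    text = re.sub(r"<[^>]+>", " ", raw)          # replace anything that looks like a tag
    text = re.sub(r"&nbsp;|&#160;", " ", text)   # non-breaking space entities
    text = re.sub(r"\s+", " ", text)             # collapse runs of whitespace
    return text.strip()


def find_years(text):
    """Find standalone four-digit years from 1500 to 2099."""
    return re.findall(r"\b(?:1[5-9]\d{2}|20\d{2})\b", text)


cleaned = clean_text("<p>Ottawa,&nbsp; 1867 \n</p>")
years = find_years("Ottawa, 1867 and 12345 and 2001")
```

The `\b` word boundaries are what keep `12345` from being mistaken for a year. Small details like that are where most of the effort in pattern writing tends to go.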