The internet is a great resource for humanities and social science data, but much of that information appears chaotic and unstructured. In this course we will explore how to programmatically access information stored online, typically in HTML, to create neat, tabulated data ready for analysis. The uses of web scraping are diverse: previous versions of this course used the programming language R to access data directly from newspapers and to access live data streams through APIs (YouTube, Facebook, Google Maps, Wikipedia). The one-day course is structured as follows. In the morning, we will consider general principles of web scraping, illustrated through examples. This session is designed to build the toolkit needed to collect different types of online data effectively. In the afternoon, the session will take a workshop format, where students may choose to begin applying web scraping to their own research or to work through a structured set of exercises. If there are any particular data sources you are interested in accessing, do email me at dt444@cam.ac.uk, as I may be able to integrate an example directly relevant to your research into the session.

Unlike in past years, this course will be taught using Python, Jupyter Notebooks and the BeautifulSoup library. The course does not assume any prior knowledge of Python, but students are encouraged to familiarise themselves with these tools beforehand. Any introductory Python MOOC (for example on edX or Coursera) will provide an excellent introduction.
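
To give a flavour of what turning HTML into "neat, tabulated data" can look like, here is a minimal sketch in Python. It is not taken from the course materials. The HTML fragment, the headlines in it, and the names `SAMPLE_HTML`, `TableParser`, `html_table_to_rows` and `rows_to_csv` are all made up for illustration. The sketch uses only Python's standard library so that it runs without installing anything; BeautifulSoup, which the course uses, offers a more convenient interface for the same kind of task.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a page that has already
# been downloaded; the headlines and dates are invented.
SAMPLE_HTML = """
<table>
  <tr><th>Headline</th><th>Date</th></tr>
  <tr><td>Council approves new cycle lanes</td><td>2019-03-01</td></tr>
  <tr><td>Library extends opening hours</td><td>2019-03-02</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialise rows as CSV text, ready to save or load for analysis."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = html_table_to_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

In practice the HTML would come from a live page rather than a string in the script, and real pages are rarely this tidy, which is where the general principles covered in the morning session come in.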