Introduction to Webscraping: Digital Data Collection for the Humanities and Social Sciences (Beginners)
This module is part of the Social Science Research Methods Centre (SSRMC) training programme, a shared platform that provides research students with a broad range of quantitative and qualitative research methods skills relevant across the social sciences.
The internet is a rich resource for humanities and social science data, but much of the information it holds appears chaotic. In this course we will explore how to access information stored online, typically in HTML, programmatically, and turn it into neat, tabulated data ready for analysis. The course is made up of four tutorials, each exploring how to scrape a different type of data. The uses of web scraping are diverse: in this course we will use the programming language R to access data from newspapers, YouTube, Wikipedia, and Twitter. Together, these sessions will give students the skills needed to use web scraping in their own research.
Target audience:
- MPhil and PhD students from participating departments taking the Social Science Research Methods Course as part of their research degree
Prerequisites:
- Familiarity with R and an interest in online data collection; any further programming knowledge or understanding of HTML is a bonus
- You must have a University Information Services (Computing) Desktop Services password (http://www.ucs.cam.ac.uk/linkpages/newcomers)
- You must have access to CamTools
Number of sessions: 4
# | Date | Time | Venue | Trainer
---|---|---|---|---
1 | Tue 18 Feb 2014 | 16:00 - 18:00 | Titan Teaching Room 1, New Museums Site | Rolf Fredheim
2 | Tue 25 Feb 2014 | 16:00 - 18:00 | Titan Teaching Room 1, New Museums Site | Rolf Fredheim
3 | Tue 4 Mar 2014 | 16:00 - 18:00 | Titan Teaching Room 1, New Museums Site | Rolf Fredheim
4 | Tue 11 Mar 2014 | 16:00 - 18:00 | Titan Teaching Room 1, New Museums Site | Rolf Fredheim
Aims: To provide students with the skills necessary to use web scraping in their own research.
Format: Presentations, demonstrations and practicals.
No readings are assigned, but students should ensure they are comfortable with the basics of R, which are covered in the first ten videos of Roger Peng's course Computing for Data Analysis, available on YouTube.
- To gain the maximum benefit from the course, students should not see it in isolation from the other MPhil courses or research training they are taking.
- Each student is responsible for considering how methods common in seemingly remote fields of the social sciences might serve their own research. Ideally, this will be supported by integrating the SSRMC with discipline-specific courses in their departments, and through reading and discussion.
Duration: Four sessions of two hours each.
Frequency: Once a week for four weeks.