Instructor-led course

Provided by: Social Sciences Research Methods Centre

This course is not scheduled to run.

Register interest
Register your interest if you would like additional dates to be scheduled.

Events available

Digital Data Collection: Web scraping for the Humanities and Social Sciences


Bookings for this module open on Thursday, 11 December at 10:00 am.
For more information see:

This module is part of the Social Science Research Methods Centre training programme which is a shared platform for providing research students with a broad range of quantitative and qualitative research methods skills that are relevant across the social sciences.

The internet is a great resource for humanities and social science data, but much of the information it holds appears chaotic. In this course we will explore how to programmatically access information stored online, typically in HTML, and turn it into neat, tabulated data ready for analysis. The course is made up of four tutorials, designed to build the tools needed to collect different types of data effectively. Web scraping has diverse uses: in this course we will use the programming language R first to access data directly from newspapers, and then to access live data streams through APIs (YouTube, Facebook, Google Maps, Wikipedia). Together these sessions will give students the skills needed to use web scraping in their own research. Slides from last year's sessions may be consulted here:

Target audience
  • Familiarity with R and an interest in online data collection. Any further programming knowledge or understanding of HTML is a bonus
  • Students should be comfortable with the RStudio interface (R is covered in the first ten videos of Roger Peng’s course Computing for Data Analysis, available on YouTube)
  • You must have a University Information Services (Computing) Desktop Services password (
  • You must have access to CamTools

To provide students with the skills necessary to use web scraping in their own research.


Presentations, demonstrations and practicals


No readings are assigned, but students should ensure they are comfortable with the basics of R, which are covered in the first ten videos of Roger Peng's course Computing for Data Analysis, available on YouTube.

  • To gain the maximum benefit from the course, it is important that students do not see it in isolation from the other MPhil courses or research training they are taking.
  • Each student is responsible for considering how methods common in seemingly remote fields of the social sciences might benefit their own research. Ideally, this will be facilitated by integrating the SSRMC with discipline-specific courses in their departments, and through reading and discussion.

Three sessions of two hours each.


Three sessions over four weeks.
