Tue 18 Feb, Tue 25 Feb, ... Tue 11 Mar 2014
16:00 - 18:00

Venue: Titan Teaching Room 1, New Museums Site

Provided by: Social Sciences Research Methods Programme


Introduction to Webscraping: Digital Data Collection for the Humanities and Social Sciences
Beginners

Description

This module is part of the Social Science Research Methods Centre training programme, a shared platform that provides research students with a broad range of quantitative and qualitative research methods skills relevant across the social sciences.

The internet is a rich source of data for the humanities and social sciences, but much of that information appears unstructured. In this course we will explore how to programmatically access information stored online, typically in HTML, and turn it into neat, tabulated data ready for analysis. The course consists of four tutorials, each exploring how to scrape a different type of data. The uses of web scraping are diverse: in this course we will use the programming language R to access data from newspapers, YouTube, Wikipedia, and Twitter. Together, these sessions will give students the skills needed to use web scraping in their own research.
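As a rough illustration of what turning HTML into tabulated data can look like, here is a minimal sketch. It is not taken from the course materials: the course itself uses R, whereas this sketch uses only Python's standard library, and the sample HTML, the `TableParser` class, and the `html_table_to_records` function are all hypothetical names invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this text from a URL.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Views</th></tr>
  <tr><td>First video</td><td>1200</td></tr>
  <tr><td>Second video</td><td>345</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = html_table_to_records(SAMPLE_HTML)
for record in records:
    print(record)
```

Each record is a plain dictionary, so the result can be written to a CSV file or loaded into a data frame for analysis. Real pages are usually messier than this sample, which is why dedicated scraping libraries are normally preferred over a hand-written parser.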

Target audience
  • MPhil and PhD students from participating departments taking the Social Science Research Methods Course as part of their research degree
Prerequisites
  • Familiarity with R and an interest in online data collection. Any programming knowledge or understanding of HTML is a bonus
  • You must have a University Information Services (Computing) Desktop Services password (http://www.ucs.cam.ac.uk/linkpages/newcomers)
  • You must have access to CamTools
Sessions

Number of sessions: 4

#  Date             Time           Venue                                    Trainer
1  Tue 18 Feb 2014  16:00 - 18:00  Titan Teaching Room 1, New Museums Site  Rolf Fredheim
2  Tue 25 Feb 2014  16:00 - 18:00  Titan Teaching Room 1, New Museums Site  Rolf Fredheim
3  Tue 4 Mar 2014   16:00 - 18:00  Titan Teaching Room 1, New Museums Site  Rolf Fredheim
4  Tue 11 Mar 2014  16:00 - 18:00  Titan Teaching Room 1, New Museums Site  Rolf Fredheim
Aims

To provide students with the skillsets necessary to use web scraping in their own research.

Format

Presentations, demonstrations and practicals

Readings

No readings are assigned, but students should ensure they are comfortable with the basics of R. This is covered in the first ten videos of Roger Peng’s course Computing for Data Analysis, available on YouTube.

Notes
  • To gain maximum benefit from the course, students should not view it in isolation from the other MPhil courses or research training they are taking.
  • Each student is responsible for considering how methods common in seemingly remote fields of the social sciences might benefit their own research. Ideally this task will be facilitated by integration of the SSRMC with discipline-specific courses in their departments and through reading and discussion.
Duration

Four sessions of two hours each.

Frequency

Once a week for four weeks.

