Theme: Ethics of Big Data

Image big data are increasingly being used to understand the built and natural environment and to observe behaviours within it. Data sources include satellite and airborne imagery, 360 street views, and fixed video or time lapse traffic and CCTV cameras. While some of these sources are newer than others, what has been changing is the quality of the images, their geographical coverage, and the potential for assessing changes over time. At the same time, improvements in machine learning have made it possible to turn images into quantitative data at scale.

In this workshop we will explore the challenges that researchers face when using images at scale to understand environments and behaviours, building on work at Cambridge to estimate cycling levels, on the use of satellite data to estimate motor vehicle volume, and on planned data collection in Kenya using 360 cameras.

Data Wrangling (Workshop) new Mon 4 Feb 2019   14:00 Finished

Garbage in, garbage out! Your output is as good or as bad as your input. Data collected from online sources is often dirty and messy. Discover how to clean and organise your data. After transforming raw data into a structured dataset, you will be ready to perform data analysis.

Digital Research Design, Methods and Ethics (Workshop) new Mon 21 Jan 2019   14:00 Finished

Find out how to shape a digital research project from scratch. This session will introduce the building blocks of online research design, from the several methodologies available to conduct the research to the ethical guidelines that should underpin our projects.

Digital Data Collection (Workshop) new Mon 28 Jan 2019   14:00 Finished

This session is a primer on digital data collection. The goal is to become familiar with online data sources and practices of internet-mediated data collection, including retrieving data from social media platforms.

Analysing and Visualising Social Media Data (Workshop) new Mon 11 Feb 2019   14:00 Finished

This session introduces a variety of analytical strategies, with a focus on Social Network Analysis, the most widely used and abused method for analysing and visualising digital and social media data. At the end of this session, you will be familiar with the basic concepts, techniques and measures of social network analysis.

The shelf-life of your dataset dictates the longevity of your findings. Sharing your data and assuring its integrity are fundamental parts of a digital research project. In this session we will discuss the principles of open data, channels for data dissemination and the fundamentals of data preservation.

For centuries, letters have been the main form of communication between scientists. Correspondence collections are a unique window into the social networks of prominent historical figures. What can digital social sciences and humanities reveal about the correspondence networks of 19th century scientists? This two-session intensive workshop will give participants the opportunity to explore possible answers to this question.

With the digitisation and encoding of personal letters, researchers have at their disposal a wealth of relational data, which we propose to study through social network analysis (SNA). The workshop will be divided into two sessions during which participants will “learn by doing” how to apply SNA to personal correspondence datasets. Following a guided project framework, participants will work on the correspondence collections of John Herschel and Charles Darwin. After a contextual introduction to the datasets, the sessions will focus on the basic concepts of SNA, data transformation and preparation, data visualisation and data analysis, with particular emphasis on “ego network” measures.

The two demonstration datasets used during the workshop will be provided by the Epsilon project, a research consortium between Cambridge Digital Library, The Royal Institution and The Royal Society of London aimed at building a collaborative digital framework for 19th century letters of science. The first dataset, the “Calendar of the Correspondence of Sir John Herschel Database at the Adler Planetarium”, is a collection of the personal correspondence of John Frederick William Herschel (1792-1871), a polymath celebrated for his contributions to the field of astronomy. Its curation began in the 1950s at the Royal Society, and it currently comprises 14,815 digitised letters encoded in Extensible Markup Language (.xml) format. The second dataset comes from the “Darwin Correspondence Project”, which has been locating, researching, editing and publishing Charles Darwin’s letters since 1974. In addition to a 30-volume print edition, the project has also made letters available in .xml format.

The workshop will provide a step-by-step guide to analysing correspondence networks from these collections, which will cover:

- Explanation of the encoding procedures and rationale following the Text Encoding Initiative guidelines;
- Preparation and transformation of .xml files for analysis with an open source data wrangler;
- Rendering of network visualisations using an open source SNA tool;
- Analysis of the Ego Networks of John Herschel and Charles Darwin (requires UCINET).

About the speakers and course facilitators:

Anne Alexander is Director of Learning at Cambridge Digital Humanities.

Hugo Leal is Methods Fellow at Cambridge Digital Humanities and Co-ordinator of the Cambridge Data School.

Louisiane Ferlier is Digital Resources Manager at the Centre for the History of Science at the Royal Society. In her current role she facilitates research collaborations with the Royal Society collections, curates digital and physical exhibitions, and augments the Society's portfolio of digital assets. A historian of ideas by training, her research investigates the material and intellectual circulation of ideas in the 17th and 18th centuries.

Elizabeth Smith is the Associate Editor for Digital Development at the Darwin Correspondence Project, where she contributed to the conversion of the Project’s work into TEI several years ago, and has since been collaborating with the technical director on enhancing the Darwin Project’s data. She is one of the co-ordinators of Epsilon, a TEI-based portal for nineteenth-century science letters.

No prior knowledge of programming is required; instructions on software to install will be sent out before the workshop. Some exercises and preparation for the second session will be set during the first, and participants should allow 2-3 hours for this. Please note that priority will be given to staff and students at the University of Cambridge when booking onto this workshop.

CDH Learning gratefully acknowledges the support of the Isaac Newton Trust and the Faculty of History for this workshop.

Digital Research Design and Data Ethics new Tue 24 Nov 2020   10:00

This CDHBasics session explores the lifecycle of a digital research project across the stages of:

  • design
  • data capture
  • transformation
  • analysis
  • presentation and preservation

It also introduces tactics for embedding ethical research principles and practices at each stage of the research process.

Qualitative Research in Online Environments new Tue 21 Jan 2020   11:30 Finished

What happens to the practice of qualitative research when interactions between researcher and research subject are largely mediated? This session will explore a wide range of topics including the challenge of consent, researcher presence and ‘lurking’ in mediated settings, how to engage with digital gatekeepers, information security for researchers, and understanding the impact of digital platform architecture on qualitative research design.

Digital Data Collection and Wrangling new Tue 14 Jan 2020   11:30 Finished

This session addresses the technical and ethical aspects of digital data collection and wrangling – two fundamental stages in the lifecycle of a digital research project. Participants will be introduced to online data sources and practices of internet-mediated data collection, including retrieving data from social media platforms. As data collected from online sources is often dirty and messy, we will also provide a short practical introduction to the process of transforming raw data into a clean and structured dataset using free and open-source software.

Data Presentation and Preservation new Tue 28 Jan 2020   11:30 Finished

The afterlife of your research data forms a vitally important part of your research project. Research funders and academic journal publishers are often strongly committed to the re-use of data and are reluctant to fund or publish research where datasets are not accessible for the purposes of peer review or further use. Yet the push for open data exists in tension with the expectations of data protection law, which requires transparency from researchers about how long they will retain personal data. This session will explore good practice in data sharing and archiving, and will introduce sources of further information and advice within the University on this topic.

Social Network Analysis with Digital Data new Tue 4 Feb 2020   11:00 Finished

This course will provide a hands-on introduction to the field of Social Network Analysis, giving participants the opportunity to “learn by doing” the process of network data collection and analysis. After being introduced to the basic concepts, participants will have the opportunity to explore all stages of a social network analysis project, including research design, essential measures, data collection and data analysis. The focus will be on the retrieval of electronic archival data (e.g. websites, digital archives and social media platforms) for non-programmers and on the production of network analysis with specialised software (e.g. Gephi). By the end, participants will be equipped with the basic tools to produce meaningful visualisations and analyses of network data.

