Cambridge Digital Humanities courses

The Library as Data

Mon 15 Oct 2018 13:30 Finished

Discover the rich digital collections of Cambridge University Library and explore the methods and tools that researchers are using to analyse and visualise data.

The Library as Data: An overview

Wed 16 Oct 2019 11:00 Finished

Is the "digital library" more than a virtual rendering of the bookshelf or filing cabinet? Does the transformation of books into bytes and manuscripts into pixels change the way we create and share knowledge? This session introduces a conceptual toolkit for understanding the library collection in the digital age, and provides a guide to key methods for accessing, transforming and analysing the contents as data. Using the rich collections of Cambridge University Library as a starting point, we will explore:

Relations between digital and material texts and artefacts
Definitions of data and metadata
Methods for accessing data in bulk from digital collections
Understanding file formats and standards

The session will also provide an overview of the content in the rest of the term’s Library as Data programme, and introduce our annual call for applications to the Machine Reading the Archive Projects mentoring scheme.

The Library as Data: Digital Text Markup and TEI

Wed 23 Oct 2019 11:00 Finished

Text encoding, or the addition of semantic meaning to text, is a core activity in digital humanities, covering everything from linguistic analysis of novels to quantitative research on manuscript collections. In this session we will take a look at the fundamentals of text encoding – why we might want to do it, and why we need to think carefully about our approaches. We will also introduce the TEI (Text Encoding Initiative), the most commonly used standard for markup in the digital humanities, and look at some common research applications through examples.
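
As a hedged illustration of what "adding semantic meaning to text" can look like in practice, the Python sketch below parses a short, invented TEI fragment in which people and places are tagged with the TEI elements `<persName>` and `<placeName>`, then extracts the tagged entities in document order. The sample text and the function name `tagged_entities` are hypothetical. Only the element names and the TEI namespace URI come from the TEI guidelines.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by TEI P5 documents.
TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical, invented TEI fragment: the markup records which words name
# people and which name places, information that plain text does not carry.
SAMPLE = f"""<p xmlns="{TEI}">
  A letter from <persName ref="#a">Correspondent A</persName> written at
  <placeName>Cambridge</placeName> to <persName ref="#b">Correspondent B</persName>.
</p>"""

# Map fully namespaced tag names to the short element names we report.
WANTED = {f"{{{TEI}}}persName": "persName", f"{{{TEI}}}placeName": "placeName"}


def tagged_entities(xml_text):
    """Return (element name, text) pairs for tagged people and places, in document order."""
    root = ET.fromstring(xml_text)
    return [(WANTED[el.tag], el.text.strip()) for el in root.iter() if el.tag in WANTED]


if __name__ == "__main__":
    for kind, text in tagged_entities(SAMPLE):
        print(kind, text)
```

Stripping the tags (for example with `"".join(root.itertext())`) would leave only the running text, which is exactly the information loss that encoding is meant to avoid.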

The Library as Data: Exploring Digital Collections through Machine Learning

Wed 13 Nov 2019 11:00 Finished

Recent advances in machine learning are allowing computer vision and humanities researchers to develop new tools and methods for exploring digital image collections. Neural network models are now able to match, differentiate and classify images at scale in ways which would have been impossible a few years ago. This session introduces the IIIF image data framework, which has been developed by a consortium of the world’s leading research libraries and image repositories, and demonstrates a range of different machine-learning-based methods for exploring digital image collections. We will also discuss some of the ethical challenges of applying computer vision algorithms to cultural and historical image collections. Topics covered will include:

Unlocking image collections with the IIIF image data framework
Machine Learning: a very short introduction
Working with images at scale: ethical and methodological challenges
Applying computer vision methods to digital collections

The Library as Data: Social Network Analysis in the Correspondence Collection Archive

Wed 30 Oct 2019 11:00 Finished

Correspondence collections are a unique window into the social networks of prominent historical figures. With the digitisation and encoding of personal letters, researchers have at their disposal a wealth of relational data, which can be studied using social network analysis.

This session will introduce and demonstrate foundational concepts, methods and tools in social network analysis using datasets prepared from the Darwin Correspondence collection. Topics covered will include:

Explanation of the encoding procedures and rationale following the Text Encoding Initiative guidelines
Preparation and transformation of .xml files for analysis with an open-source data wrangler
Rendering of network visualisations using an open-source SNA tool
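
The pipeline in this list (encoded letters in, network out) can be sketched in a few lines of Python. This is a minimal sketch, not the session's actual workflow: it assumes each letter is a TEI document whose `<correspDesc>` records the sender and recipient in `<correspAction type="sent">` and `<correspAction type="received">` elements, and it does the wrangling in code rather than in the open-source tools used in the session. The helper names and the correspondents are hypothetical. The sketch counts letters per sender-recipient pair and writes a directed, weighted edge list as CSV with `Source`, `Target` and `Weight` columns, a layout that many network tools (Gephi, for example) can import.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}


def make_letter(sender, recipient):
    """Build a minimal, hypothetical TEI document for one letter."""
    return (
        f'<TEI xmlns="{TEI}"><teiHeader><profileDesc><correspDesc>'
        f'<correspAction type="sent"><persName>{sender}</persName></correspAction>'
        f'<correspAction type="received"><persName>{recipient}</persName></correspAction>'
        "</correspDesc></profileDesc></teiHeader></TEI>"
    )


def sender_recipient(xml_text):
    """Return (sender, recipient) from a letter's correspDesc, or None if either is missing."""
    root = ET.fromstring(xml_text)
    sent = root.find(".//tei:correspAction[@type='sent']/tei:persName", NS)
    received = root.find(".//tei:correspAction[@type='received']/tei:persName", NS)
    if sent is None or received is None:
        return None
    return sent.text.strip(), received.text.strip()


def edge_counts(letters):
    """Count letters per directed (sender, recipient) pair."""
    pairs = (sender_recipient(x) for x in letters)
    return Counter(p for p in pairs if p is not None)


def edges_to_csv(counts):
    """Serialise the weighted edge list as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(counts.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()


# Invented toy records, not drawn from the Darwin Correspondence data.
LETTERS = [
    make_letter("Correspondent A", "Correspondent B"),
    make_letter("Correspondent B", "Correspondent A"),
    make_letter("Correspondent A", "Correspondent B"),
    make_letter("Correspondent A", "Correspondent C"),
]

if __name__ == "__main__":
    print(edges_to_csv(edge_counts(LETTERS)), end="")
```

Keeping edges directed preserves who wrote to whom; summing both directions would instead give an undirected "acquaintance" network, which answers a different question.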

No prior knowledge of programming is required. Instructions on the software to install will be sent out before the session.

The Transkribus Guided Project

Wed 29 Jul 2020 16:00 Finished

We introduce Transkribus, a software system that can be taught to read handwriting from images of documents and rapidly convert it into useful digital formats. This guided course provides basic training through practical immersion in the software, which requires only basic IT skills. Transkribus was developed by READ under the Horizon 2020 funding framework and is now a co-operative. It had more than 20,000 users in 2019 and is becoming a standard research tool for the mass transcription of archival sources. Participants will transcribe anonymised data from pre-loaded scans of forms filled out for the French national census of 1999, using Transkribus's downloadable software interface. These manual transcriptions will help train a handwritten text recognition (HTR) model to transcribe many more of these forms automatically later, eventually allowing the creation of one of the largest datasets ever attempted from manuscript sources. This course is a collaboration between Transkribus and Cambridge Digital Humanities and is funded by a Cambridge Humanities Research Grant.

Using Images at Scale to Understand Environments and Behaviours

Wed 21 Nov 2018 11:30 Finished

Image big data are increasingly being used to understand the built and natural environment and to observe behaviours within it. Data sources include satellite and airborne imagery, 360-degree street views, and fixed video or time-lapse traffic and CCTV cameras. While some of these sources are newer than others, what has been changing is the quality of the images, the geographical coverage, and the potential for assessing changes over time. At the same time, improvements in machine learning have made it possible to turn images into quantitative data at scale.

In this workshop we will explore the challenges that researchers face when using images at scale to understand environments and behaviours, building on work at Cambridge that estimates cycling levels, uses satellite data to estimate motor vehicle volumes, and plans data collection in Kenya using 360-degree cameras.

What can histories of artificial intelligence teach us? On the development of large models and 'data-driven' research in AI

Mon 4 Mar 2024 13:00 Finished

Join our Methods Fellow, Amira Moeding, in a workshop that introduces methods of historical enquiry into the development of digital technologies and digital data. How can we do the history of technology today? What are the limits of historical enquiry, and what are its strengths? Moreover, what can we learn from historical narratives about technologies? More concretely, what can the history of “Big Data” tell us about artificial intelligence today? What, for example, were seen as the pitfalls and problems of bias early in the development of data-driven applications?

Together with you, Amira will think through and employ methods of historical enquiry and critical theory to gain a better understanding of the origins of ‘data-driven’ digital technologies. In doing so, the workshop aims to build an understanding of statistical or data-driven methods by asking how they came about, and why and to whom they became attractive. The workshop thus links technologies back to the interests and contexts that rendered them viable. This line of enquiry will allow us to ask what ‘technological progress’ currently is, how stories of ‘progress’ are narrated by industry actors, and what ‘risks’ become apparent from their perspective. By providing this contextualisation and recovering the early interests that drove developments in artificial intelligence research and ‘Big Tech’, we will also see that progress, and the promises for the future that it holds, are not ‘objective’ or ‘necessary’ but localised in time and space. Finally, we will ask to what degree digital humanities can not only use digital methods to aid the humanities, but also employ historical and philosophical methods to provide a basis for criticising and theorising ‘the digital’ and for putting into perspective the methods on which so-called ‘artificial intelligences’ are based.

Working with image collections at scale: an introduction to IIIF

Tue 11 May 2021 10:00 Finished

This CDH Basics session introduces the IIIF image data framework, developed by a consortium of the world’s leading research libraries and image repositories, together with methods for accessing image collections, including the collections of Cambridge University Digital Library. We will also discuss a range of methods for using IIIF image data in humanities research.
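
Much of IIIF's usefulness for work at scale comes from its Image API, in which any region, size, rotation and quality of an image is addressed by a predictable URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, with technical metadata at `{base}/{identifier}/info.json`. The Python sketch below builds such URLs, assuming version 3.0 of the Image API (servers implementing version 2.x may expect `full` instead of `max` as the size value). The server address and identifier are hypothetical placeholders, not real Cambridge University Digital Library endpoints; a real service's base URL is normally found in its IIIF manifests.

```python
from urllib.parse import quote


def _image_root(base, identifier):
    """Join the service base URL with the URL-encoded image identifier."""
    # Identifiers may contain characters such as '/', which must be percent-encoded.
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}"


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API 3.0 image request URL.

    region: 'full', 'square', 'x,y,w,h' in pixels, or 'pct:x,y,w,h'
    size:   'max', 'w,', ',h', 'w,h', '!w,h' (fit within) or 'pct:n'
    """
    root = _image_root(base, identifier)
    return f"{root}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Build the URL of the image's info.json (technical metadata)."""
    return f"{_image_root(base, identifier)}/info.json"


if __name__ == "__main__":
    BASE = "https://iiif.example.org/image"  # hypothetical server
    # A 250-pixel-wide rendering of the 500x400 region whose top-left corner is (100, 200).
    print(iiif_image_url(BASE, "MS-1/page 1", region="100,200,500,400", size="250,"))
    print(iiif_info_url(BASE, "MS-1/page 1"))
```

Because every request is just a URL, a script can fetch thumbnails or crops of thousands of pages with any HTTP client, which is what makes the framework suited to working with image collections at scale.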
