Cambridge Digital Humanities - The Library as Data

The Library as Data: Digital Text Markup and TEI

Wed 23 Oct 2019 11:00 Finished

Text encoding, or the addition of semantic meaning to text, is a core activity in digital humanities, covering everything from linguistic analysis of novels to quantitative research on manuscript collections. In this session we will take a look at the fundamentals of text encoding – why we might want to do it, and why we need to think carefully about our approaches. We will also introduce the TEI (Text Encoding Initiative), the most commonly used standard for markup in the digital humanities, and look at some common research applications through examples.
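As a rough illustration of what encoded text makes possible, the sketch below reads a tiny TEI-style fragment and lists the people and places that have been marked up. The sample sentence, the names in it and the helper `tagged_names` are invented for this example; only the TEI namespace and the `persName` and `placeName` elements come from the TEI guidelines.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the "tei" prefix is just a local label for queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample text, not taken from any real collection.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p><persName>Ada Example</persName> wrote from
         <placeName>Cambridge</placeName> to
         <persName>Ben Sample</persName>.</p>
    </body>
  </text>
</TEI>"""


def tagged_names(tei_xml):
    """Return the people and places explicitly marked up in a TEI string."""
    root = ET.fromstring(tei_xml)
    people = [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]
    places = [el.text for el in root.iterfind(".//tei:placeName", TEI_NS)]
    return {"people": people, "places": places}


if __name__ == "__main__":
    print(tagged_names(SAMPLE))
```

Because the names are tagged rather than guessed from the running text, the same query works across thousands of documents without any language processing.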

The Library as Data: Social Network Analysis in the Correspondence Collection Archive

Wed 30 Oct 2019 11:00 Finished

Correspondence collections are a unique window into the social networks of prominent historical figures. With the digitisation and encoding of personal letters, researchers have at their disposal a wealth of relational data, which can be studied using social network analysis.

This session will introduce and demonstrate foundational concepts, methods and tools in social network analysis using datasets prepared from the Darwin Correspondence collection. Topics covered will include:

Explanation of the encoding procedures and rationale following the Text Encoding Initiative guidelines
Preparation and transformation of .xml files for analysis with an open source data wrangler
Rendering of network visualisations using an open source SNA tool

No prior knowledge of programming is required. Instructions on software to install will be sent out before the session.
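For readers curious about what the preparation step involves, here is a minimal sketch of turning encoded letters into a weighted edge list of the kind a network tool can import. It assumes each letter records sender and recipient in TEI `correspAction` elements with `type="sent"` and `type="received"`; the sample letters, the correspondent names and the helper functions are invented, and the session's own datasets and tools may be organised differently.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical template for a letter header; real files hold far more detail.
TEMPLATE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><correspDesc>
    <correspAction type="sent"><persName>{sender}</persName></correspAction>
    <correspAction type="received"><persName>{recipient}</persName></correspAction>
  </correspDesc></profileDesc></teiHeader>
</TEI>"""

# Invented correspondents, for illustration only.
LETTERS = [
    TEMPLATE.format(sender="Correspondent A", recipient="Correspondent B"),
    TEMPLATE.format(sender="Correspondent A", recipient="Correspondent B"),
    TEMPLATE.format(sender="Correspondent B", recipient="Correspondent C"),
]


def letter_edge(tei_xml):
    """Return (sender, recipient) for one letter; either may be None."""
    root = ET.fromstring(tei_xml)

    def name(action_type):
        path = f".//tei:correspAction[@type='{action_type}']/tei:persName"
        el = root.find(path, NS)
        return el.text.strip() if el is not None and el.text else None

    return name("sent"), name("received")


def edge_list(letters):
    """Count letters per (sender, recipient) pair, skipping incomplete ones."""
    edges = Counter()
    for xml_text in letters:
        sender, recipient = letter_edge(xml_text)
        if sender and recipient:
            edges[(sender, recipient)] += 1
    return edges


def to_csv(edges):
    """Serialise the edge list as CSV text with source, target, weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(edge_list(LETTERS)))
```

The weight column counts how many letters passed in each direction, which is one simple way to express the strength of a tie between two correspondents.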

The Library as Data: An overview

Wed 16 Oct 2019 11:00 Finished

Is the "digital library" more than a virtual rendering of the bookshelf or filing cabinet? Does the transformation of books into bytes and manuscripts into pixels change the way we create and share knowledge? This session introduces a conceptual toolkit for understanding the library collection in the digital age, and provides a guide to key methods for accessing, transforming and analysing the contents as data. Using the rich collections of Cambridge University Library as a starting point, we will explore:

Relations between digital and material texts and artefacts
Definitions of data and metadata
Methods for accessing data in bulk from digital collections
Understanding file formats and standards

The session will also provide an overview of the content in the rest of the term’s Library as Data programme, and introduce our annual call for applications to the Machine Reading the Archive Projects mentoring scheme.

Introduction to Archival Photography workshop [cancelled re Covid-19]

Wed 10 Jun 2020 11:00 CANCELLED

We are currently reformatting our Learning programme for remote teaching. This will require some rescheduling, so bookings will reopen and new sessions will be created for online courses as soon as possible. In the interim, we encourage you to register your interest so that you are notified of the new schedule. We hope to run many of our courses online, but this depends on staff availability and resources, so we may have to postpone or cancel some sessions.

This session focusses on providing photography skills for those undertaking archival research. Dr Oliver Dunn has experience spanning more than 10 years digitising written and printed historical sources for major university research projects in the humanities and social sciences. The focus is very much on low-tech approaches and small budgets. We’ll consider best uses of smartphones, digital cameras and tripods.

The Library as Data: Exploring Digital Collections through Machine Learning

Wed 13 Nov 2019 11:00 Finished

Recent advances in machine learning are allowing computer vision and humanities researchers to develop new tools and methods for exploring digital image collections. Neural network models are now able to match, differentiate and classify images at scale in ways which would have been impossible a few years ago. This session introduces the IIIF image data framework, which has been developed by a consortium of the world’s leading research libraries and image repositories, and demonstrates a range of different machine learning-based methods for exploring digital image collections. We will also discuss some of the ethical challenges of applying computer vision algorithms to cultural and historical image collections. Topics covered will include:

Unlocking image collections with the IIIF image data framework
Machine Learning: a very short introduction
Working with images at scale: ethical and methodological challenges
Applying computer vision methods to digital collections
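To give a flavour of how IIIF opens up image collections, the sketch below builds request URLs following the IIIF Image API pattern of identifier, region, size, rotation, quality and format. The server address and image identifier are placeholders rather than real endpoints, and `iiif_image_url` is a hypothetical helper. The default size keyword `max` is an assumption about the API version the server supports; some older servers expect `full` instead.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that any slashes inside it are not
    mistaken for path separators.
    """
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


if __name__ == "__main__":
    # Placeholder server and identifier, for illustration only.
    base = "https://iiif.example.org/image"
    # A 512-pixel-wide version of the whole image.
    print(iiif_image_url(base, "MS-EXAMPLE-001", size="512,"))
    # A 200 x 200 pixel crop from the top-left corner at the default size.
    print(iiif_image_url(base, "MS-EXAMPLE-001", region="0,0,200,200"))
```

Because every image in a compliant collection can be requested at a chosen size or crop through the same URL pattern, a script can gather consistently sized inputs for a computer vision model without downloading full-resolution files.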

Theme: The Library as Data
