skip to navigation skip to content
Filter by

Course type

Show only:

Dates available

Places available


Filter search

Browse or search for courses

6 matching courses
Courses per page: 10 | 25 | 50 | 100

Automated writing in the age of Machine Learning new Mon 7 Dec 2020   11:30 [Full]

Computer programmes which predict the likely next words in sentences are a familiar part of everyday life for billions of people who encounter them in auto-complete tools for search engines and the predictive keyboards used by mobile phones and word processing software. These tools rely on “language models” developed by researchers in fields such as natural language processing (NLP) and information retrieval which assign probabilities to words in a sequence based on a specific set of “training data” (in this case a collection of texts where the frequencies of word pairings or three-word phrases have been calculated in advance).

Recent developments in machine learning have led to the creation of general language models trained on extremely large datasets which can now produce ‘synthetic’ texts, answer questions, summarise information without the need for lengthy or costly processes of training for each new task. The difficulties in distinguishing the outputs of these language models from texts written by humans has provoked widespread interest in the media. Researchers have experimented with prompting GPT-3, a language model developed by OpenAI to write short stories, answer philosophical questions and apparently propose potential medical treatments -although GPT-3 did have some difficulty with the question “how many eyes does a horse have?”. Meanwhile, The Guardian ‘commissioned’ an op-ed from GPT-3.

This Methods Workshop will explore the generation of ‘synthetic’ texts through presentations, discussion and demonstrations of text generation techniques which participants will be encouraged to try out for themselves during the sessions. We will also report back from the Ghost Fictions Guided Project, organised by Cambridge Digital Humanities Learning Programme in October and November this year. The project looks at how ideas about the distinction between ‘fact’, ‘fiction’ and ‘nonfiction’ are shaping the reception of text generation methods and aims to stimulate deeper critical engagement with machine learning by humanities researchers.

Prior knowledge of programming, computer science or Machine Learning is not required. In order to try out the text generation techniques demonstrated during the course you will need access to Google Drive (accessible via Raven login for University of Cambridge users).

Bulk Data Capture: an overview new Tue 23 Feb 2021   10:00 [Places]

This CDH Basics session provides a brief introduction to different methods for capturing bulk data from online sources or via agreement with data collection holders, including Application Programme Interfaces (APIs). We will address issues of data provenance, exceptions to copyright for text and data-mining, and discuss good practice in managing and working with data that others have created.

This CDH Basics session explores how data which you have captured rather than created yourself, is likely to need cleaning up before you can use it effectively. This short session will introduce you to the basic principles of creating structured datasets and walk through some case studies in data cleaning with OpenRefine, a powerful open source tool for working with messy data.

First steps in coding with Jupyter Notebooks new Tue 9 Feb 2021   10:00 [Places]

This CDH Basics session is aimed at researchers who have never done any coding before. We will explore basic principles and approaches to writing and adapting code, using the popular programming language Python as a case study. Participants will also gain familiarity with using Jupyter Notebooks, an open-source web application which allows users to create and share documents containing live code alongside visualisations and narrative text.

Methods Workshop: TEI workshop new Mon 18 Jan 2021   10:00 [Places]

The TEI (Text Encoding Initiative is a standard for the transcription and description of text bearing objects, and is very widely used in the digital humanities – from digital editions and manuscript catalogues to text mining and linguistic analysis. This course will take you through the basics of the TEI – what it is and what it can be used for – with a particular focus on uses in research, paths to publication (both web and print) and the use of TEI documents as a dataset for analysis. There will be a chance to create some TEI yourself as well as looking at existing projects and examples. The course will take place over two sessions a week apart – with an introductory taught session, then a chance to work on TEI records yourself, followed by a review and discussion session.

This CDH Basics session will see discussion on how to assess the impact of relevant legal frameworks, including data protection, intellectual property and media law, on your digital research project and consider what approach researchers should take to the terms of service of third-party digital platforms. We will explore the challenge of informed consent in a highly-networked world and look at a range of strategies for dealing with this problem.