Tue 18 Apr - Fri 21 Apr 2023
09:30 - 17:30

Venue: Bioinformatics Training Room

Provided by: Bioinformatics

Variant Discovery with GATK4 (IN PERSON)

This workshop will focus on the core steps involved in calling variants from Illumina next-generation sequencing data using the Genome Analysis Toolkit (GATK). You will learn about best practices for calling germline and somatic variants: single nucleotide variants (SNVs), short insertions/deletions (indels) and copy number variants (CNVs). We will also cover the considerations specific to calling variants on the mitochondrial genome, as well as variant calling from bulk and single-cell RNA-seq data, and show how the data structures provided by GATK can help you process large datasets in parallel and at scale. Although this workshop focuses on human data, most of the concepts and approaches also apply to non-human data, and we will cover some of the adaptations needed in those situations.

The training room is located on the first floor and there is currently no wheelchair or level access available to this level.

Please note that if you are not eligible for a University of Cambridge Raven account, you will need to book or register interest by following the link here.

Target audience
  • The course is aimed at life scientists interested in genomic variant analysis and its applications. We don’t assume prior experience in this topic, but it is essential to have command line experience and a basic understanding of sequencing technologies. This course is also suitable for participants with prior experience in other types of ‘omics’ data analysis (e.g. RNA-seq), who would like to learn about this topic.
  • Graduate students, postdocs and staff members from the University of Cambridge, Affiliated Institutions and other external institutions, or individuals.
  • Please be aware that these courses are only free for registered University of Cambridge students. All other participants will be charged a registration fee in some form. Registration fees and further details regarding the charging policy are available here.
  • After you have booked a place, if you are unable to attend any of the live sessions and would like to work in your own time, please email the Bioinfo Team, as attendance will be taken on all courses. A charge is applied for non-attendance, including for registered university students.
  • Further details regarding eligibility criteria are available here.

Prerequisites
  • Familiarity with the basic terms and concepts of genetics and short-read sequencing technologies.
  • Familiarity with the Unix command line environment is essential. We provide a self-assessment quiz that you can use to check your suitability for this course. If you are not yet comfortable with the command line, please attend our Unix course before this one.
  • [Optional] Familiarity with high performance computing (HPC) clusters is an advantage, as it will make it easier to run your own analyses after the course. We also offer a course on this topic.

Number of sessions: 4

# | Date | Time | Venue | Trainers
1 | Tue 18 Apr | 09:30 - 17:30 | Bioinformatics Training Room | Derek Caetano-Anolles, Mehrtash Babadi, James Emery, Jonn Smith
2 | Wed 19 Apr | 09:30 - 17:30 | Bioinformatics Training Room | Derek Caetano-Anolles, Mehrtash Babadi, James Emery, Jonn Smith
3 | Thu 20 Apr | 09:30 - 17:30 | Bioinformatics Training Room | Derek Caetano-Anolles, Mehrtash Babadi, James Emery, Jonn Smith
4 | Fri 21 Apr | 09:30 - 17:30 | Bioinformatics Training Room | Derek Caetano-Anolles, Mehrtash Babadi, James Emery, Jonn Smith

Topics covered

Bioinformatics, Data handling, Data mining, Data visualisation, Genomics, Sequence Variants


After this course you should be able to:

  • Summarise the “best practices” variant calling workflow developed by the GATK team and recognise the uses and importance of its different steps.
  • Describe the basic data structures used by GATK and how they can help you scale your work to large genomes and sample sizes.
  • Describe the key differences between germline and somatic variant discovery approaches.
  • Recognise the key considerations when working with mitochondrial variants.
  • Adjust pipelines for variant calling from RNA-seq data.
  • Apply GATK workflows to a range of real-world datasets.
  • Interpret analysis results and troubleshoot common problems.

During this course you will learn about:

  • Pre-processing and quality control of high-throughput sequencing data for variant calling.
  • Applying GATK’s set of tools to a range of variant discovery applications: HaplotypeCaller for germline variants, Mutect2 for somatic variants and GermlineCNVCaller for copy number variants.
  • Key data structures used to store variant data, such as VCF and GenomicsDB, and how to query and manipulate them.
  • Applying the new, faster DRAGEN-GATK implementation for germline variant calling.
  • Filtering, refining and evaluating your variants.
  • Adjusting variant calling pipelines to work with mitochondrial variants and to identify variants from RNA-seq data (bulk and single-cell).

Day 1 Topics
9:30 - 12:30
  • Welcome
  • Overview of GATK suite of tools and variant discovery pipelines
  • Introduction to sequencing data
  • Data pre-processing and quality control
12:30 - 13:30 Lunch (not provided)
13:30 - 17:30
  • Germline variant calling, joint calling and key data structures
  • Case study: GWAS
Day 2 Topics
9:30 - 12:30
  • Germline variant calling with HaplotypeCaller
  • Introduction to DRAGEN-GATK
12:30 - 13:30 Lunch (not provided)
13:30 - 17:30
  • Variant filtering and quality recalibration
  • Genotype refinement and callset evaluation
Day 3 Topics
9:30 - 12:30
  • GATK “Best Practices” workflow
  • Introduction to copy number variants (CNVs)
  • Germline CNV calling
12:30 - 13:30 Lunch (not provided)
13:30 - 17:30
  • Introduction to somatic variant discovery
  • Somatic variant calling with Mutect2
  • Somatic CNV calling
Day 4 Topics
9:30 - 12:30
  • SNP/indel variant calling in mitochondria
  • Introduction to variant calling from RNA-seq data
  • Variant calling from bulk RNA-seq data
12:30 - 13:30 Lunch (not provided)
13:30 - 17:30
  • Variant calling from single-cell RNA-seq data
  • Future directions and wrap-up

Tea/coffee breaks each day: mid-morning and mid-afternoon.

Registration Fees
  • Free for registered University of Cambridge students.
  • £50/day for all University of Cambridge staff, including postdocs, temporary visitors (students and researchers) and participants from Affiliated Institutions. Please note that these charges are recovered by us at the institutional level.
  • It remains the participant's responsibility to obtain prior approval from the relevant group leader, line manager or budget holder to attend the course. Please book only with the agreement of the relevant party, as costs will be charged back to your lab head or group supervisor.
  • £50/day for all other academic participants from external institutions and charitable organisations. These charges must be paid at registration.
  • £100/day for all industry participants. These charges must be paid at registration.
  • Further details regarding the charging policy are available here.



