Mitochondria
This case study considers Mitochondria variant calling. We will call and benchmark variants in the mixture sample described in Fazzini et al..
This tutorial is included as a Snakemake workflow. Run it with
#
PrerequisitesThis tutorial assumes a directory structure like
We'll go ahead and create this upfront:
#
Download dataFirst, download PCR-amplified mitochondria Illumina MiSeq reads:
Next, we need a reference genome. For this tutorial we'll use GRCh38 with ALT and decoys contigs (hs38DH), which includes the revised Cambridge Reference Sequence (rCRS). The simplest way to get this is with bwakit
:
We'll also need the truth variants to evaluate our calls, Fazzini et al. provide these in Supplementary Table S7. However, since we'll need a VCF for benchmarking, we have manually translated these variants into VCF, which can be downloaded using:
#
Map readsFirst, we need to index the reference genome for alignment:
Next, we map the raw reads to the reference genome with bwa mem
:
This should complete in less than a minute.
#
Call variantsNow we can call variants with octopus
. Since we're aiming to call homoplasmic and heteroplasmic mitochondria variants, we'll be using the polyclone calling model. We'll use the provided mitochondria config which sets several options. We'll also set the sequence error model to reflect the PCR amplified library design of this sample. Finally, we restrict calling to the mitochondria reference genome and use built-in multithreading:
This should complete in ~30 minutes.
#
Evaluate variantsFinally, we will evaluate our calls with RTG Tools vcfeval
. This tool requires the reference sequence to be preprocessed:
Then evaluate the calls with vcfeval
:
We should see the following results: