Germline WGS
This case study considers whole-genome germline variant calling in an individual. We will use the syntheic-diploid sample CHM1-CHM13.
This tutorial is included as a Snakemake workflow. Run it with
#
PrerequisitesThis tutorial assumes a directory structure like
We'll go ahead and create this upfront:
#
Download dataFirst, download WGS CHM1-CHM13 PCR-free Illumina HiSeq-X10 reads (PRJEB13208) from EBI:
Next, we need a reference genome. In this example we'll use GRCh38 with ALT and decoys contigs (hs38DH). The simplest way to get this is with bwakit
:
We'll also need the truth variants to evaluate our calls, which we can get from the CHM-eval GitHub:
#
Map readsFirst, we need to index the reference genome for alignment:
Next, we map the raw reads to the reference genome with bwa mem
:
This should take a few hours.
#
Call variantsNow we can call variants with octopus
. Since this is single-sample germline calling, we'll be using the individual calling model. We'll also set the sequence error model to reflect the PCR-free library design of this sample and Illlumina X10 sequencer, and use random forest filtering. Finally, we restrict calling to the primary chromosomes and use the built-in multithreading:
This should complete in a few hours.
#
Evaluate variantsFinally, we will evaluate our calls with RTG Tools vcfeval
. This tool requires the reference sequence to be preprocessed:
Then evaluate the calls with vcfeval
:
We should see the following results: