
Realigned BAMs

Octopus can generate realigned BAMs that provide visual evidence for why a call was made. Realigned BAMs are particularly helpful for confirming complex variation where the mapper's alignments are incorrect, as can be seen by comparing the original and realigned pileups in IGV.



Realigned (evidence) BAMs are requested using the --bamout option. The argument to --bamout depends on whether you're calling one sample or several. If you're calling a single sample, the argument to --bamout is the file path to write the BAM to, e.g.:

$ octopus -R hs37d5.fa -I NA12878.bam -o octopus.vcf --bamout octopus.bam

For multiple samples the argument to --bamout is a directory path, e.g.:

$ octopus -R hs37d5.fa -I NA12878.bam NA12891.bam NA12892.bam -o octopus.vcf --bamout minibams

Realigned BAMs with the same names as the input BAMs will be written to this directory, so this cannot be a directory where any of the input BAMs are located.
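The rules above can be checked before launching a run. The following is a minimal sketch, not part of Octopus: `bamout_command` is a hypothetical helper that assembles an argument list from the options shown on this page (-R, -I, -o, --bamout) and, for multiple inputs, refuses a --bamout directory that already holds one of the input BAMs.

```python
from pathlib import Path


def bamout_command(reference, bams, vcf, bamout):
    """Build an octopus argument list (hypothetical helper, not an Octopus API)."""
    bams = [Path(b) for b in bams]
    bamout = Path(bamout)
    if not bams:
        raise ValueError("at least one input BAM is required")
    if len(bams) > 1:
        # Multiple samples: --bamout names a directory. Realigned BAMs reuse
        # the input file names, so the directory must not hold any input BAM.
        out_dir = bamout.resolve()
        clashes = [b.name for b in bams if b.resolve().parent == out_dir]
        if clashes:
            raise ValueError(f"--bamout directory contains input BAMs: {clashes}")
    # Single sample: --bamout is simply the path of the BAM file to write.
    return ["octopus", "-R", str(reference), "-I", *map(str, bams),
            "-o", str(vcf), "--bamout", str(bamout)]
```

The returned list can be passed to `subprocess.run` as-is; building a list rather than a shell string avoids quoting problems with unusual file names.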

important

Realigned BAMs are only available when each input BAM contains a single sample and when --output is specified (i.e. calls are not written to stdout).

Octopus adds several useful annotations to realigned reads:

| Name | Description |
|------|-------------|
| HP | A comma-separated list of haplotype IDs that the read is inferred to originate from. Haplotype IDs are zero-indexed and correspond to the columns of the GT field in the associated phased VCF. A single haplotype ID indicates that the read was unambiguously assigned to that haplotype, while multiple values indicate that the read could equally well be assigned to any of the listed haplotypes. |
| MD | Reference-free alignment string, as defined in the SAM specification. |
| md | Like MD, but the alignment is relative to the inferred haplotype rather than the reference (i.e. mismatches are inferred sequencing errors). |
| hc | The CIGAR alignment to the inferred haplotype. |
| PS | The phase set the read was assigned to. |
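To illustrate how these annotations might be consumed, here is a minimal sketch that works on SAM text records (for example, the output of `samtools view`). It assumes the HP value is stored as comma-separated integers and that md uses the same syntax as MD; the function names are hypothetical and not part of Octopus.

```python
import re


def parse_tags(sam_line):
    """Return {tag: value} for the optional fields of one SAM text record."""
    # The first 11 tab-separated columns are mandatory; the rest are TAG:TYPE:VALUE.
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def haplotype_ids(hp_value):
    """Split an HP value such as "0" or "0,1" into haplotype IDs (assumed format)."""
    return [int(x) for x in hp_value.split(",") if x]


def is_unambiguous(hp_value):
    """True when the read was assigned to exactly one haplotype."""
    return len(haplotype_ids(hp_value)) == 1


def count_mismatches(md_value):
    """Count mismatched bases in an MD-style string, ignoring deletions."""
    without_deletions = re.sub(r"\^[A-Za-z]+", "", md_value)
    return len(re.findall(r"[A-Za-z]", without_deletions))
```

Applied to the md tag, `count_mismatches` gives the number of bases that differ from the inferred haplotype, which under the description above are the inferred sequencing errors.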
tip

The HP tag is useful for colouring and grouping alignments in IGV.

By default, only reads overlapping regions that contain called variation are realigned. However, Octopus can also copy reads overlapping regions where no variation was called if you specify the --bamout-type FULL option. Only primary alignments are used for BAM realignment.
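When comparing read counts between an input BAM and its realigned counterpart, it can help to restrict the input to primary records first. The sketch below assumes "primary" means that neither the secondary (0x100) nor the supplementary (0x800) SAM flag bit is set; whether Octopus applies exactly this definition is an assumption, and the function names are hypothetical.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def is_primary(flag):
    """True when neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


def primary_records(sam_lines):
    """Yield SAM text records (not header lines) that are primary alignments."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if is_primary(flag):
            yield line
```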

caution

Reads are assigned and realigned to haplotypes called in the --output VCF. This means that read pairs in different phase sets can appear discordant, and reads that are not completely spanned by a phase set (or that overlap multiple phase sets) may have poor alignments. Consider increasing haplotype lengths if this occurs.
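One way to flag reads affected by this caveat is to compare each read's interval with the phase set intervals from the VCF. This is a minimal sketch under stated assumptions: phase sets are supplied as half-open (start, end) intervals on a single contig, and both function names are hypothetical.

```python
def classify_read(read_start, read_end, phase_sets):
    """Classify a read as 'spanned', 'partial', 'multiple', or 'none'.

    read_start/read_end and each (start, end) in phase_sets are half-open
    coordinates on the same contig.
    """
    overlapping = [(s, e) for s, e in phase_sets if s < read_end and read_start < e]
    if not overlapping:
        return "none"      # the read lies outside every phase set
    if len(overlapping) > 1:
        return "multiple"  # the read overlaps more than one phase set
    s, e = overlapping[0]
    if s <= read_start and read_end <= e:
        return "spanned"   # the read is completely inside one phase set
    return "partial"       # the read hangs off the edge of its phase set


def mates_in_different_phase_sets(ps_first, ps_second):
    """True when the two mates of a pair carry different PS values."""
    return ps_first != ps_second
```

Reads classified as 'partial' or 'multiple', and pairs whose mates carry different PS values, are the ones most likely to look poorly aligned or discordant in the realigned BAM.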