# Cell

The `cell` calling model is used to call germline and somatic variants in single-cell and minibatch cell sequencing data. The model attempts to infer local phylogenies for the cells and accounts for the allelic biases and dropout often observed in single-cell sequencing data.
## Usage: basic

If all of the samples are single cells and none are control cells:
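A minimal sketch of such an invocation, assuming Octopus's usual `-R` (reference), `-I` (reads) and `-C` (caller) options. The file names are placeholders, and the command is printed rather than executed so that it can be checked first:

```shell
# Placeholder inputs (hypothetical file names): substitute your own.
REF=ref.fa
CELLS="cellA.bam cellB.bam cellC.bam"

# Assumed invocation: -C cell selects the cell calling model.
CMD="octopus -C cell -R $REF -I $CELLS"
echo "$CMD"   # to run it: eval "$CMD"
```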
## Usage: with controls

If the experiment includes control cells (e.g. for tumour-normals) then provide the control cell sample names (`--normal-samples`; `-N`):
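A sketch of an invocation with one control sample. `NORMAL` is a hypothetical sample name and the file names are placeholders; the `-C`, `-R` and `-I` options are assumed from Octopus's general command line:

```shell
# Placeholder inputs (hypothetical): one control file plus three single cells.
REF=ref.fa
READS="normal.bam cellA.bam cellB.bam cellC.bam"

# NORMAL must be the sample name of the control cells, not a file name.
CMD="octopus -C cell -R $REF -I $READS --normal-samples NORMAL"
echo "$CMD"   # to run it: eval "$CMD"
```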
All normal cells are assumed to originate from the root (i.e. founder) node of the phylogeny relating cells, and are therefore assumed to all have the same genotype.
## Usage: with minibatches

If any of the samples are derived from minibatches of cells then specify high dropout concentrations (`--sample-dropout-concentration`) for these samples:
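A sketch of how this might look for a single minibatch sample. The `SAMPLE=VALUE` argument format, the sample name `MINIBATCH` and the value `100` are all assumptions made for illustration; check the option's help text for the exact syntax and choose a value suited to the batch size:

```shell
# Placeholder inputs (hypothetical): one minibatch file plus three single cells.
REF=ref.fa
READS="minibatch.bam cellA.bam cellB.bam cellC.bam"

# Assumed format: SAMPLE=CONCENTRATION; 100 is an illustrative value only.
DROPOUT="MINIBATCH=100"

CMD="octopus -C cell -R $REF -I $READS --sample-dropout-concentration $DROPOUT"
echo "$CMD"   # to run it: eval "$CMD"
```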
The argument for each minibatch sample may reflect the number of cells contained in the minibatch; the larger the number of cells, the larger the argument value.
The usual use for minibatch samples is for better controls, in which case the minibatches will be normal samples:
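A sketch combining the two options for a minibatch of normal cells. The sample name `NORMAL_BATCH`, the `SAMPLE=VALUE` format and the value `100` are assumptions for illustration, and the file names are placeholders:

```shell
# Placeholder inputs (hypothetical): a normal minibatch plus three single cells.
REF=ref.fa
READS="normal_batch.bam cellA.bam cellB.bam cellC.bam"

# The same sample is declared as a control and given a high dropout
# concentration (assumed SAMPLE=CONCENTRATION format, illustrative value).
CMD="octopus -C cell -R $REF -I $READS --normal-samples NORMAL_BATCH --sample-dropout-concentration NORMAL_BATCH=100"
echo "$CMD"   # to run it: eval "$CMD"
```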
## VCF output

There are several annotations included in the VCF output:
Name | INFO/FORMAT | Description |
---|---|---|
SOMATIC | INFO | Indicates that a somatic mutation was inferred (i.e. the phylogeny contains more than one node). |
PY | INFO | The MAP phylogeny inferred for the variant loci. This annotation is only added for SOMATIC calls. |
PPP | INFO | Posterior probability (Phred) for the MAP phylogeny. |
PSPP | INFO | Posterior probabilities (Phred) that the local phylogeny contains 0, 1, ... nodes. |
PNAP | FORMAT | Posterior probabilities (Phred) that this sample is assigned to node ID 0, 1, ... in the MAP phylogeny (PY). |
## PY notation
The phylogeny is serialised using the following algorithm:
The algorithm is called with the root node of the phylogeny: `serialise("", ROOT)`. Examples:
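As a sketch of what such a serialisation could look like, assume a parenthesised pre-order encoding in which each node is written as `(`, its ID, its serialised children, then `)`. This format is an assumption and is not specified in this text. The tree, the `children_of` helper and the one-argument `serialise` below are hypothetical:

```shell
# Hypothetical three-node phylogeny: root 0 with children 1 and 2.
children_of() {
  case "$1" in
    0) echo "1 2" ;;
    *) echo "" ;;
  esac
}

# Write "(" + node ID + serialised children + ")".
# Each recursive call runs in a subshell, so variables do not clash.
serialise() {
  out="($1"
  for child in $(children_of "$1"); do
    out="$out$(serialise "$child")"
  done
  printf '%s)' "$out"
}

serialise 0   # prints (0(1)(2))
echo
```

Under the same assumption, a single-node phylogeny (no somatic mutation) would serialise as `(0)`.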
## CNV calling

The model can try to identify local copy-number changes (i.e. deletions or gains of haplotypes). This will result in some samples having called genotypes with different ploidies to the default ploidy. The maximum number of gains and losses is specified with the `--max-copy-gain` and `--max-copy-loss` options, respectively. For example, to identify up to one copy gain or loss:
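A sketch of such an invocation, assuming Octopus's usual `-R`, `-I` and `-C` options; the file names are placeholders:

```shell
# Placeholder inputs (hypothetical file names): substitute your own.
REF=ref.fa
CELLS="cellA.bam cellB.bam cellC.bam"

# Allow at most one copy gain and one copy loss.
CMD="octopus -C cell -R $REF -I $CELLS --max-copy-gain 1 --max-copy-loss 1"
echo "$CMD"   # to run it: eval "$CMD"
```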
Warning: calling copy gains is currently computationally very expensive.
## Performance considerations

A critical parameter for the performance of this calling model is the maximum size of the phylogeny (`--max-clones`). Copy loss and gain calling are also computationally expensive.

It is recommended to allow automatic thread usage with this calling model (use the `--threads` option without an argument).
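A sketch of an invocation with automatic thread usage, assuming Octopus's usual `-R`, `-I` and `-C` options; the file names are placeholders:

```shell
# Placeholder inputs (hypothetical file names): substitute your own.
REF=ref.fa
CELLS="cellA.bam cellB.bam cellC.bam"

# --threads is given last and without a value, so the thread count is automatic.
CMD="octopus -C cell -R $REF -I $CELLS --threads"
echo "$CMD"   # to run it: eval "$CMD"
```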