Cell
The cell calling model is used to call germline and somatic variants in single cell and minibatch cell sequencing data. The model attempts to infer local phylogenies for the cells and accounts for allelic biases and dropout often observed in single cell sequencing data.
Usage: basic#
If all of the samples are single cells and none are control cells:
Usage: with controls#
If the experiment includes control cells (e.g. for tumour-normals) then provide the control cell sample names (--normal-samples; -N):
All normal cells are assumed to originate from the root (i.e. founder) node of the phylogeny relating cells, and are therefore assumed to all have the same genotype.
Usage: with minibatchs#
If any of the samples are derived from minibatches of cells then specify high dropout concentrations (--sample-dropout-concentration) for these samples:
The argument for each minibatch sample may reflect the number of cells contained in the minibatch; the larger the number of cells, the larger the argument value.
The usual use for minibatch samples is for better controls, in which case the minibatches will be normal samples:
VCF output#
There are several annotations included in the VCF output:
| Name | INFO/FORMAT | Description |
|---|---|---|
| SOMATIC | INFO | Indicates that a somatic mutation was inferred (i.e. the phylogeny contains more than one node). |
| PY | INFO | The MAP phylogeny inferred for the variant loci. This annotation is only added for SOMATIC calls. |
| PPP | INFO | Posterior probability (Phred) for the MAP phylogeny. |
| PSPP | INFO | Posterior probabilities (Phred) that the local phylogeny contains 0,1,... nodes |
| PNAP | FORMAT | Posterior probabilities (Phred) that this sample is assigned to node ID 0,1,.. in the MAP phylogeny (PY). |
PY notation#
The phylogeny is serialised using the following algorithm:
The algorithm is called with the root node of the phylogeny serialise("", ROOT). Examples:
CNV calling#
The model can try to identify local copy changes (i.e. deletions or gains of haplotypes). This will result in some samples having called genotypes with different ploidies to the default ploidy. The maximum number of gains and losses is specified with the --max-copy-gain and --max-copy-loss options, respectively. For example, to identify up to one copy gain or loss:
Warning calling copy gains is currently computationally very expensive.
Performance considerations#
A critical parameter for this calling model is the maximum size of the phylogeny (--max-clones). Copy loss and gain calling are also computationally expensive.
It is recommended to allow automatic thread usage with this calling model (use --threads option without an argument).