Methods

Alignment

MtDNA sequences are traditionally reported as differences relative to the human reference sequence, the revised Cambridge Reference Sequence (rCRS). This format is short and convenient; however, a nucleotide sequence string can be translated into more than one rCRS-coded haplotype, so the notation is ambiguous. As a consequence, database searches may return biased results when query and database haplotypes are aligned differently. In the forensic context, this could lead to an underestimation of absolute and relative frequencies and thus to an overestimation of the statistical power of the evidence.

EMPOP uses SAM, a string-based search algorithm that converts query and database sequences into alignment-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query.
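The idea of comparing alignment-free nucleotide strings can be illustrated with a minimal sketch. It is not the SAM implementation: the reference is a short toy string (not the rCRS), the function names are hypothetical, and only three notations are assumed (substitution such as 3T, deletion such as 6del, insertion such as 5.1C).

```python
import re

# Toy reference for illustration only; this is NOT the rCRS.
TOY_REFERENCE = "GACCCTA"

# Assumed notation: "3T" (substitution), "6del" (deletion), "5.1C" (insertion).
_VARIANT = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]|del)$")


def to_nucleotide_string(variants, reference=TOY_REFERENCE):
    """Apply reference-coded differences and return the plain sequence."""
    bases = list(reference)   # bases[i] is the base at 1-based position i + 1
    insertions = {}           # position -> {insertion index: inserted base}
    for variant in variants:
        match = _VARIANT.match(variant)
        if match is None:
            raise ValueError(f"unsupported variant: {variant!r}")
        pos = int(match.group(1))
        ins_index, change = match.group(2), match.group(3)
        if not 1 <= pos <= len(reference):
            raise ValueError(f"position out of range: {variant!r}")
        if ins_index is not None:
            if change == "del":
                raise ValueError(f"unsupported variant: {variant!r}")
            insertions.setdefault(pos, {})[int(ins_index)] = change
        elif change == "del":
            bases[pos - 1] = ""        # drop the reference base
        else:
            bases[pos - 1] = change    # replace the reference base
    out = []
    for pos, base in enumerate(bases, start=1):
        out.append(base)
        # Inserted bases follow the reference position they are anchored to.
        for index in sorted(insertions.get(pos, {})):
            out.append(insertions[pos][index])
    return "".join(out)


def same_sequence(haplotype_a, haplotype_b, reference=TOY_REFERENCE):
    """Compare two haplotypes by their sequence, not by their notation."""
    return (to_nucleotide_string(haplotype_a, reference)
            == to_nucleotide_string(haplotype_b, reference))
```

In this toy setting, an extra C in the C-stretch can be written as 2.1C or as 5.1C. The two notations differ as text, yet same_sequence(["2.1C"], ["5.1C"]) is true because both expand to the same nucleotide string.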

Length variant hot spots can be ignored during a search (default setting). The length variant regions were reviewed and adapted in EMPOP 3 and now include coding region length variants (see below). EMPOP 3 also introduces an updated query engine that treats block insertions and block deletions (indels) as a single phylogenetic event. In the CA-repeat of the control region (between positions 513 and 524), only tandem indels are observed, e.g. 523del 524del. While this tandem deletion nominally constitutes two individual differences to the rCRS, the new version of SAM-E counts it as a single event. This better reflects the phylogenetic nature of the mitochondrial molecule.
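As a rough illustration of counting a block indel as one event, the following sketch collapses deletions at consecutive positions, and insertions anchored at the same position, into a single event each. It is a simplification under stated assumptions, not SAM-E itself: the function name is hypothetical, the notation is limited to forms such as 73G, 523del and 524.1A, and no table of recognised blocks is consulted.

```python
import re

# Assumed notation: "73G" (substitution), "523del" (deletion), "524.1A" (insertion).
_DIFF = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]|del)$")


def count_events(differences):
    """Count differences to the reference, treating block indels as one event."""
    events = 0
    deleted_positions = set()
    insertion_sites = set()
    for diff in differences:
        match = _DIFF.match(diff)
        if match is None:
            raise ValueError(f"unsupported difference: {diff!r}")
        pos = int(match.group(1))
        ins_index, change = match.group(2), match.group(3)
        if ins_index is not None:
            if change == "del":
                raise ValueError(f"unsupported difference: {diff!r}")
            insertion_sites.add(pos)      # 524.1A and 524.2C share one site
        elif change == "del":
            deleted_positions.add(pos)
        else:
            events += 1                   # each substitution is its own event
    events += len(insertion_sites)
    previous = None
    for pos in sorted(deleted_positions):
        # A new deletion event starts whenever the run of positions is broken.
        if previous is None or pos != previous + 1:
            events += 1
        previous = pos
    return events
```

With this sketch, count_events(["523del", "524del"]) yields 1 although the haplotype lists two nominal differences, and count_events(["73G", "523del", "524del"]) yields 2.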

The following length variant regions are considered in EMPOP 3:

Position  Type              Length of Ins/Del      Inserted/Deleted Block  Since SAM Version
16193     Length Variation  16193del - 16193+CCC   Individual bases        19
309       Length Variation  309-CCCC - 309+CCTCC   Individual bases        19
455       Length Variation  455-T - 455+CC         Individual bases        19
463       Length Variation  463-CC - 463+CCCCC     Individual bases        20
573       Length Variation  573-C - 573+8C         Individual bases        19
960       Length Variation  960-C - 960+CCCCC      Individual bases        20
5899      Length Variation  5899-C - 5899+10C      Individual bases        20
8276      Length Variation  8276-CCCC - 8276+CCCC  Individual bases        20
8285      Length Variation  8285-CCCC - 8285+CCCC  Individual bases        20

The following events are considered by SAM-E:

Position        Type                   Length of Ins/Del  Inserted/Deleted Block  Since SAM Version
16032           Exceptional Insertion  15                 TCTCTGTTCTTTCAT         20
104             Exceptional Insertion  6                  CGGAGC                  20
105             Exceptional Insertion  6                  GGAGCA                  20
209             Exceptional Insertion  7                  GTGTGTT                 20
241             Exceptional Insertion  3                  TAA                     20
286             Exceptional Insertion  5                  TAACA                   20
291             Exceptional Insertion  16                 ACATCATAACAAAAAA        20
398             Exceptional Insertion  14                 ACCAGATTTCAAAT          20
470             Exceptional Insertion  8                  TACTACTA                20
514             Exceptional Insertion  2                  AC                      20
516             Exceptional Insertion  2                  AC                      20
518             Exceptional Insertion  2                  AC                      20
520             Exceptional Insertion  2                  AC                      20
522             Exceptional Insertion  2                  AC                      20
524.2 – 524.8   Exceptional Insertion  2-8                AC                      20
563             Exceptional Insertion  204                AACAAAGAAC...AAA        20
8271            Exceptional Insertion  9                  CCCCCTCTA               20
8280            Exceptional Insertion  9                  CCCCCTCTA               20
8289            Exceptional Insertion  9                  CCCCCTCTA               20


Position        Type                   Length of Ins/Del  Inserted/Deleted Block  Since SAM Version
16032           Exceptional Deletion   15                 TCTCTGTTCTTTCAT         20
110             Exceptional Deletion   6                  CGGAGC                  20
111             Exceptional Deletion   6                  GGAGCA                  20
209.7           Exceptional Deletion   7                  GTGTGTT                 20
241.3           Exceptional Deletion   3                  TAA                     20
286.5           Exceptional Deletion   5                  TAACA                   20
291.16          Exceptional Deletion   16                 ACATCATAACAAAAAA        20
398.14          Exceptional Deletion   14                 ACCAGATTTCAAAT          20
478             Exceptional Deletion   8                  TACTACTA                20
516             Exceptional Deletion   2                  AC                      20
518             Exceptional Deletion   2                  AC                      20
520             Exceptional Deletion   2                  AC                      20
522             Exceptional Deletion   2                  AC                      20
524.2 – 524.10  Exceptional Deletion   2-10               AC                      20
563.204         Exceptional Deletion   204                AACAAAGAAC...AAA        20
8280            Exceptional Deletion   9                  CCCCCTCTA               20
8289            Exceptional Deletion   9                  CCCCCTCTA               20
8289.9          Exceptional Deletion   9                  CCCCCTCTA               20

Haplogroup Estimation

The assignment of haplogroups to mitochondrial DNA haplotypes adds substantial value for quality control, not only in forensic genetics but also in population and medical genetics. The availability of Phylotree, a widely accepted phylogenetic tree of human mitochondrial DNA lineages, led to the development of several (semi-)automated software solutions for haplogrouping. However, the currently existing tools only make use of haplogroup-defining mutations, whereas private mutations (beyond the haplogroup level) can provide additional information and allow for enhanced haplogroup assignment.

EMPOP uses EMMA, an algorithm for estimating the haplogroup of an mtDNA sequence based on 14,990 full mtGenomes from GenBank and 3,925 virtual haplotypes from Phylotree. In addition, 19,171 full control region haplotypes are used to perform a maximum likelihood estimation of the stability of mutations, which is expressed as fluctuation rates.

Assuming independent positions, fluctuation rates are estimated by

where α, β are elements of the set {A, C, G, T, –} with α ≠ β, γ runs over all CR-HGs in which α or β is dominant, n(x, γ) denotes the number of samples in CR-HG γ with symbol x, and n(γ) denotes the total number of samples in CR-HG γ.

The algorithm compares a test profile to every database profile with an appropriate reading frame. The resulting differences are determined and assigned appropriate costs. By ranking the total costs of the compared profiles, the algorithm is able to cluster optimal and suboptimal profiles. Note that the output of the algorithm displays only the base profiles with the lowest and second lowest costs.
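The ranking step can be sketched as follows. This is only an illustration of cost-based ranking, not EMMA: the function names, the profile representation (lists of differences to the reference), the default cost and any cost values passed in are assumptions, and reading-frame handling is omitted. The actual cost model is described in Röck et al. 2013.

```python
from collections import defaultdict

# Placeholder cost for differences without an entry in the cost table (assumption).
DEFAULT_COST = 1.0


def total_cost(test_profile, database_profile, costs):
    """Sum the costs of all differences present in only one of the two profiles."""
    mismatches = set(test_profile) ^ set(database_profile)
    # Rounding keeps equal totals equal despite floating-point summation order.
    return round(sum(costs.get(m, DEFAULT_COST) for m in mismatches), 6)


def best_two_cost_groups(test_profile, database, costs):
    """Return the profile names with the lowest and second lowest total cost."""
    by_cost = defaultdict(list)
    for name, profile in database.items():
        by_cost[total_cost(test_profile, profile, costs)].append(name)
    ranked = sorted(by_cost.items())          # ascending by total cost
    return [(cost, sorted(names)) for cost, names in ranked[:2]]
```

Profiles that share a total cost end up in the same group, so the first group holds the optimal candidates and the second group the suboptimal ones. A difference with a low cost (for example one at an unstable position) then penalises a candidate less than a difference with a high cost.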

Further information and details can be found in Röck et al. 2013.