New block deletion at 495-503 impacts SAM2 alignment rules

We came across a novel block deletion of 9 nucleotides preceding the AC-repeat region in HVS-III. This changes the earlier SAM2 alignment in that region (see figure below) and leads to a phylogenetically more plausible alignment.

Release 12 is online, N = 42,839

We have updated the database with 8,227 new haplotypes from 26 countries (including 5 updates from control region (CR) to full mitogenome).

Welcome to EMPOP 4

EMPOP now uses the newly developed query engine SAM2 (Huber et al. 2018), which is based on SAM (Röck et al. 2011).


Overview of EMPOP 4 and SAM 2

  • phylogenetic alignment of a mitotype, in contrast to the input alignment. The phylogenetic principle follows the rules developed in Bandelt and Parson 2008, which are now generally accepted in the forensic community (SWGDAM 2013, ISFG 2014). Thus the alignment and notation of mitotypes can be standardized within the forensic genetic community and in other fields of research.
  • estimated haplogroup status based on Phylotree nomenclature (van Oven and Kayser 2009). Details are provided in Huber et al. 2018.
  • updated alignment/nomenclature conventions for unstable (length variant) regions 50-70, 310-316, 455-460, 961-966, 8276-8279, 16180-16193, and 16258-16262
  • catalogue of 28 block indels that are each treated as a single variant/difference to comply with the phylogenetic interpretation of mitotypes
  • updated catalogue of length variant regions that can be excluded from the database search
  • search function for neighbours by phylogenetic costs
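
The idea of counting a block indel as a single difference can be illustrated with a minimal Python sketch. It is not the SAM2 implementation: the catalogue below is hypothetical and holds only one illustrative entry (the 495-503 range mentioned in the news above), and the function names are assumptions made for this example.

```python
# Hypothetical catalogue of block deletions as inclusive (start, end) ranges.
# The single entry is illustrative only, not the actual EMPOP catalogue.
BLOCK_INDELS = [(495, 503)]


def collapse_block_deletions(deleted_positions, catalogue=BLOCK_INDELS):
    """Group deleted positions into events.

    A catalogued block that is deleted completely becomes one (start, end)
    event; every other deleted position becomes its own (pos, pos) event.
    """
    remaining = set(deleted_positions)
    events = set()
    for start, end in catalogue:
        block = set(range(start, end + 1))
        if block <= remaining:  # the whole block is deleted
            events.add((start, end))
            remaining -= block
    events.update((pos, pos) for pos in remaining)
    return events


def count_differences(deletions_a, deletions_b):
    """Count deletion events present in one haplotype but not the other."""
    events_a = collapse_block_deletions(deletions_a)
    events_b = collapse_block_deletions(deletions_b)
    return len(events_a ^ events_b)


# A haplotype lacking positions 495-503 differs from the reference by one
# event, not by nine single-nucleotide deletions.
block_deletion = range(495, 504)
print(count_differences(block_deletion, []))
```

A naive per-position comparison would report nine differences for the same pair, which would overstate the distance between the two mitotypes.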

Refinement of Haplogroup Assignment

The tolerance level of EMMA was reduced from 0.3 to 0.1 to allow for a finer haplogroup estimation of mtDNA haplotypes. Extensive testing on a data set comprising 86,048 haplotypes suggested that this reduction would be beneficial for forensically relevant haplogrouping. In particular, haplotypes previously assigned to haplogroups H, L3 and M benefit from a more accurate classification.

Welcome to EMPOP 3

EMPOP 3 represents a new mtDNA database version based on a novel programming concept and website layout. New features are included to accommodate Massively Parallel (Next Generation) Sequencing data.

Good things that remain

  • EMPOP only holds high-quality mtDNA sequence data that underwent stringent quality control
  • EMPOP uses SAM, an alignment-free search engine, to guarantee that matches are found regardless of the alignment and nomenclature of query and database haplotypes
  • MtDNA haplotypes are sorted by geographic and metapopulation categories that are relevant to forensics and informative for other scientific fields
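
Why an alignment-free comparison finds matches regardless of notation can be shown with a small Python sketch. It rests on the assumption that haplotypes are converted back into plain nucleotide strings before comparison; the toy reference, the variant tuple format and the function name are all hypothetical and are not taken from SAM.

```python
# Toy reference sequence with a homopolymeric C-stretch (positions 2-6, 1-based).
TOY_REFERENCE = "ACCCCCTG"


def apply_variants(reference, variants):
    """Rebuild a nucleotide string from a reference and a list of variants.

    Each variant is a (kind, position, value) tuple with a 1-based position:
      ("sub", pos, base)    replace the base at pos
      ("ins", pos, bases)   insert bases after pos
      ("del", pos, length)  delete length bases starting at pos
    """
    seq = list(reference)
    # Apply from the highest position down so lower coordinates stay valid.
    for kind, pos, value in sorted(variants, key=lambda v: v[1], reverse=True):
        if kind == "sub":
            seq[pos - 1] = value
        elif kind == "ins":
            seq[pos:pos] = list(value)
        elif kind == "del":
            del seq[pos - 1:pos - 1 + value]
        else:
            raise ValueError(f"unknown variant kind: {kind}")
    return "".join(seq)


# The same extra C can be noted at the start or at the end of the C-stretch.
notation_a = [("ins", 3, "C")]
notation_b = [("ins", 6, "C")]

string_a = apply_variants(TOY_REFERENCE, notation_a)
string_b = apply_variants(TOY_REFERENCE, notation_b)
print(string_a == string_b)  # the two notations describe one sequence
```

Comparing the variant lists directly would miss this match because the two notations differ, whereas the reconstructed strings are identical.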

New developments in EMPOP 3

  • The earlier distinction between forensic and literature data is no longer required, as the QC tools applied to every uploaded dataset proved to be effective and reliable
  • Tabular summaries of query results are presented in a more convenient format
  • Statistical evaluation of matching haplotypes was improved by offering correction for sampling bias
  • Neighbors are presented by Hamming distance and by costs (fluctuation rate)
  • Haplogroup estimates of mtDNA sequences are provided based on maximum likelihood/minimum cost functions
  • Geographical maps are provided to present the distribution of matching haplotypes and the distribution of haplogroups
  • Haplogroup Browser is a new tool to search for and display the distribution of mitochondrial haplogroups
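
A neighbour search by Hamming distance, as mentioned in the list above, can be sketched in Python as follows. This is a simplified illustration, not the EMPOP implementation: it assumes haplotypes are stored as dictionaries of substitutions relative to a reference (position mapped to observed base, reference bases never listed), and the sample names and data are invented for the example.

```python
def hamming_distance(hap_a, hap_b):
    """Count positions at which two haplotypes carry different bases.

    A position missing from a dictionary is taken to match the reference.
    """
    positions = set(hap_a) | set(hap_b)
    return sum(1 for pos in positions if hap_a.get(pos) != hap_b.get(pos))


def neighbours(query, database, max_distance=1):
    """Return (distance, name) pairs within max_distance, nearest first."""
    hits = []
    for name, haplotype in database.items():
        distance = hamming_distance(query, haplotype)
        if distance <= max_distance:
            hits.append((distance, name))
    return sorted(hits)


# Invented example data: position -> observed base.
database = {
    "sample_1": {73: "G", 263: "G"},
    "sample_2": {73: "G", 263: "G", 16519: "C"},
    "sample_3": {152: "C"},
}
query = {73: "G", 263: "G"}

for distance, name in neighbours(query, database, max_distance=1):
    print(name, distance)
```

Ranking by costs would replace the unit count per difference with a position-specific weight; those weights are not given here, so the sketch covers only the unweighted case.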