Mitochondrial DNA (mtDNA) haplogrouping is widely used in genetics, anthropology, forensics, and medical research. However, DNA in forensic samples and human remains is often degraded and is generally recovered as tiny fragments in limited amounts, which makes haplogrouping infeasible. Current haplogrouping servers do not provide optimal approaches for tracking haplogroups (HGs) from degraded DNA samples. Here, we developed Haplotracker as a highly accurate, efficient system for tracking HGs from degraded DNA samples. Haplotracker uses short sequence fragments in the control region (CR) and searches for the closest-ranked HGs using algorithms based on Phylotree Build 17 (Phylotree) definitions and our large haplotype database (n=118,869). It narrows the HGs and identifies their differential variants to verify the top-ranked HG or to track sub-haplogroups. It provides a conserved region mapping tool for PCR primer design for targeted HGs, which is essential for further tracking using degraded DNA samples. Haplotracker showed high HG prediction accuracy using 8,216 CR sequences of mtDNA genomes (mtGenomes) from Phylotree. Using Phylotree-defined HGs, Haplotracker predicted the top-ranking HGs with the highest concordance rate (56.6%, p<0.0001), compared with MitoTool (29.4%) and HaploGrep 2 (33.9%). Further evaluation using 46,322 CR sequences from GenBank mtGenomes also resulted in significantly higher accuracy with Haplotracker. Free access: https://haplotracker.cau.ac.kr
CentOS 7, Ubuntu 18
1. HG tracking by fragment sequences (Track by Sequence)
2. HG tracking by fragment variant profiles (Track by Variant)
3. Conserved region search for primer design (Conserved Region Map)
4. HG database
5. Differentiation between HGs (HG differentiation)
6. Analysis of phantom mutants in a data set (Phantom Mutants)
7. Conversion of variant formats
8. Haplogrouping by complete mtGenome sequence (MtGenome Haplogrouping)
This is the representative tool of Haplotracker and is for HG tracking with multiple small mtDNA sequence fragments from degraded DNA samples, from which PCR amplification of large fragments is difficult. Sequence fragments obtained from the same sample DNA can be entered simultaneously.
Input of sequences
See the screen capture (Fig. 1a) as an example.
Sample name field: sample name is a necessary option to avoid confusion during multiple analyses. The sample name is shown on result windows as well as on the browser tab.
Rank group level (selector, 1-4): The rank group level restricts the lowest level of the HG rank group to be displayed. The rank group is determined by the variant identity obtained by comparing the sample variants with Phylotree-defined HGs (Phylotree-HGs) and their variant profiles. Users can choose a numeric value from 1 to 4. The default value is 2. In most cases, this field doesn't need to be changed in the beginning.
DNA sequence field(s): The sequence(s) can be copied from other sources and pasted into the field(s). DNA sequences should be in capital letters. For multiple sequence fragments, click the button + or +3 below the field for addition, or - or -3 for removal. All the sequences should originate from the same sample. Each field should contain one sequence fragment. The IUPAC nucleotide codes can be used, including 'N'. 'D' is interpreted as 'A, G, or T', not as a deletion. The button Clear or Clear all empties the sequence contents only. The Reset all button initializes this window. The Submit button starts the HG tracking processes.
Accept 'N' in sequences (selector, 'N' accept?): The default is 'No', in which case 'N' is interpreted as 'not sequenced' (no matching and no missing variants during HG tracking processes). If the DNA sequences carry 'N' and users want it to be considered as 'A, C, G, or T', select 'Yes'. Runs of more than four consecutive 'N's are interpreted as 'not sequenced'.
Control / coding region (buttons): This is important. If the sequence fragment is mainly of control region even with partly carrying sequences of the coding region, the button "Control" should be ticked (default). Likewise, "Coding" should be ticked if the sequence is (mainly) of the coding region.
Sequence examples (hyperlink): Example sample sequences are provided.
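The 'N' handling described for the "Accept 'N' in sequences" selector can be illustrated with a minimal Python sketch. The function name `mask_unsequenced` and the use of '?' as a placeholder for 'not sequenced' positions are assumptions made for illustration only; this is not the server's actual code.

```python
import re

def mask_unsequenced(seq, accept_n=False):
    """Replace positions treated as 'not sequenced' with '?' (hypothetical marker).

    Runs of more than four consecutive 'N's are treated as 'not sequenced'.
    Shorter runs are kept (to be read as A, C, G, or T) only when
    accept_n is True, mirroring the 'Yes' setting of the selector.
    """
    def replace(match):
        run = match.group(0)
        if len(run) > 4 or not accept_n:
            return "?" * len(run)
        return run

    return re.sub(r"N+", replace, seq)
```

With the default setting every 'N' is masked; with `accept_n=True` only runs longer than four are masked.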
Try Example run (under menu Guide at homepage).
- Click the menu Track by Sequence
See the captured figure below for an example (Fig. 1c).
The sample name is shown on the window and the browser tab.
The results window shows variant information and tracked HGs.
The variant information includes fragment No., length(s), range(s), and variants obtained from (an) individual fragment(s) compared with rCRS, and all variants re-aligned based on PhyloTree Build 17 policy.
The tracked HGs (boldface) are first divided by rank group, each shown with its most recent common ancestor (MRCA) in the first column. The rank group is determined by variant identity (the third column), as mentioned above. Within each rank group, HGs are listed (the second column) in order of rank (shown in parentheses to the right of each HG), which is determined by their scores (shown in brackets to the left of each HG). In the figure, for example, there are twelve tracked HGs in rank group 1 and twenty-one in rank group 2. HG W9 is top-ranked with a score of 0.192 and is therefore the best prediction for the given sample sequences.
Tracking for verification of top-ranked HG 'W9' above.
(Continued with the example above)
- Click the button Group 1 at the bottom of the current window (Tab S1●JQ245759).
The window with "Narrowed Rank group 1" is opened in a new tab (Fig.
Interpretation of results (Fig. 1h)
Five fragments were tested, as shown in the upper table.
Quality control (QC)
This result window shows further detailed information concerning score details and sample quality control, including mean of the number of missing variants in the database, mean of the number of extra variants in the database, frequency of missing variant(s), frequency of extra variant(s), reported phantom variant hot spots, and number of variants found in other HGs that can suggest the potential artificial recombination.
Score values are produced by the sum of frequency rates of missing variants and extra variants of the HG found in the built-in haplotype database (n=118,869). Results show the frequencies of missing variants and extra variants.
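A literal reading of the score description can be sketched as follows. This is only an illustration under stated assumptions: the function name `hg_score`, the lookup tables, and the toy frequencies are hypothetical, and the text does not specify here how the frequency rates are derived from the database.

```python
def hg_score(missing, extra, missing_freq, extra_freq):
    """Sum the database frequency rates of the missing and extra variants.

    missing_freq and extra_freq are hypothetical lookup tables mapping a
    variant to its frequency rate for the HG in the haplotype database.
    Variants absent from a table contribute 0.0 (an assumption).
    """
    total = sum(missing_freq.get(v, 0.0) for v in missing)
    total += sum(extra_freq.get(v, 0.0) for v in extra)
    return total

# Toy values for illustration only, not taken from the database.
score = hg_score(["m1"], ["e1", "e2"], {"m1": 0.1}, {"e1": 0.05, "e2": 0.02})
```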
The server provides the following notices to evaluate sample quality control:
1) Variant identity
Variant identity (%) indicates the overall quality level of the sample: >= 90%, good (in black, boldface); >= 80% and < 90%, moderate (in blue); < 80%, poor (in red).
2) Comparison of the number of missing variants of the top-ranked HG with the mean number of missing variants found in the database.
The server flags too many missing variants (in red) when the count exceeds the mean + SD of the number of missing variants of the HG in the database.
3) Comparison of the total number of extra variants of the top-ranked HG with the mean number of extra variants found in the database.
The server flags too many extra variants (in red) when the count exceeds the mean + SD of the number of extra variants of the HG in the database.
4) Reported phantom variants are shown in lime (Yao et al., 2009; Bandelt et al., 2012) or in aqua (Brandstätter et al., 2005), and rare variants not found in PhyloTree Build 17 are shown in gray, in the missing and extra variant frequency columns of the table.
5) Artificial recombination can be suspected when variants found in other HGs are present in the rightmost column of the table. The server uses the same method as HaploGrep 2.
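The thresholds in notices 1) to 3) can be expressed as a short Python sketch. The function names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption; the guide only says "mean + SD".

```python
from statistics import mean, stdev

def identity_level(identity_pct):
    """Map variant identity (%) to the quality label used in the notices."""
    if identity_pct >= 90:
        return "good"
    if identity_pct >= 80:
        return "moderate"
    return "poor"

def too_many(count, database_counts):
    """True when count exceeds mean + SD of the per-haplotype counts.

    database_counts is a hypothetical list holding the number of missing
    (or extra) variants of each database haplotype assigned to the HG.
    """
    return count > mean(database_counts) + stdev(database_counts)
```

The same `too_many` check would apply to missing variants and to extra variants, each against its own list of database counts.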
Fig. 1a. Screen capture of sequence input window.
Fig. 1b. Screen capture of Sequence examples.
Fig. 1e. Screen capture of the selection of HGs with scores.
Fig. 1f. Screen capture of the differential variants of selected HGs.
Fig. 1g. Screen capture of the second round of sequence input window.
Fig. 1h. Screen capture of the confirmation of the top-ranked HG, W9, which resulted from the previous tracking with control region sequences.
Fig. 1i. Screen capture of the analysis (score details and QC).
This tool is for HG tracking with "variant profiles". The variants are originally acquired from the sequence (Guide No. 1). Once "variants" and their fragment "ranges" on rCRS are in hand, "variants" instead of sequences can be used for the same purpose. Multiple variant profiles of the same sample DNA can be entered simultaneously.
Input of variants
See the screen capture (Fig. 2) as an example.
Sample name: The same as above in Guide No. 1.
Rank group level: The same as above in Guide No. 1.
Variant profile field(s): Variants in any format of PhyloTree, MitoTool, HaploGrep 2, or EMPOP can be entered. The variant format is automatically converted to that of PhyloTree. The variant profiles can be copied from other sources and pasted into the field(s). For multiple fragments, click the button + or +3 below the field for addition, or - or -3 for removal. All the fragments should originate from the same sample. Each field should contain the "variants" of one sequence fragment. The IUPAC codes can be used, including 'N'. 'D' is interpreted as 'A, G, or T', not as a deletion. The button Clear or Clear all empties the variant contents only. The Reset all button initializes this window. The Submit button starts the HG tracking processes.
of variant profiles in different formats
Accept 'N' in sequences (checkbox): The same as above.
Enter the range of the fragment: This is important. The correct range (corresponding to positions of rCRS) of the fragment variants is important for accurate prediction.
Examples of variant profiles (hyperlink): Example variant profiles are provided. The link can be found just above the fragment 1 field.
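Why the fragment range matters can be seen in a small Python sketch: an HG-defining variant counts as "missing" only if its position lies inside the declared range, so a range that is too wide inflates the missing count. The function names and the toy variants are hypothetical, and the sketch assumes a simple range that does not wrap around the origin of the circular genome.

```python
import re

def position(variant):
    """Extract the rCRS position from a variant such as '16189C' or '315.1C'."""
    match = re.search(r"\d+", variant)
    if match is None:
        raise ValueError(f"no position in variant: {variant!r}")
    return int(match.group(0))

def missing_in_range(hg_variants, sample_variants, start, end):
    """HG variants located inside [start, end] that the sample does not carry."""
    observed = set(sample_variants)
    return [v for v in hg_variants
            if start <= position(v) <= end and v not in observed]

# Toy profile for illustration; the fragment actually covers 16100-16300.
hg_profile = ["16189C", "16223T", "73G"]
sample = ["16223T"]
correct = missing_in_range(hg_profile, sample, 16100, 16300)
too_wide = missing_in_range(hg_profile, sample, 1, 16569)
```

Here `too_wide` also reports the variant at position 73, which was never sequenced, as missing.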
Fig. 2. Screen capture of variants input window.
This tool helps users find conserved regions for primer design, so that the additional fragments required for further HG tracking with differential variants can be obtained by PCR. The DNA fragments spanning the variant positions are amplified by PCR, and well-designed primers are essential for success: primers should hybridize perfectly to the template DNA of the sample, particularly when the DNA is degraded and present in a small amount. Finding conserved regions for primer binding is therefore one of the essential factors to consider. Given a range, with or without HG restriction, this tool finds all variants present in the range and displays them graphically, together with the HG(s) carrying each variant. The conserved regions are displayed as dots. If no conserved region is found under the given conditions and there is no other option, users can still design primers with degenerate sequences based on the variants reported by this tool.
Enter a region (np) for PCR: A short range (preferably < 300 bp) around the variant of interest is necessary due to the degraded DNA.
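The idea of a conserved region search can be sketched in Python as finding the stretches of a range that contain no known variant position. This is a simplified illustration, not the server's algorithm; the function name, the `min_length` parameter, and the toy positions are assumptions.

```python
def conserved_regions(start, end, variant_positions, min_length=1):
    """Return (first, last) stretches in [start, end] without variant positions."""
    blocked = {p for p in variant_positions if start <= p <= end}
    regions = []
    run_start = None
    for pos in range(start, end + 1):
        if pos in blocked:
            # Close the current variant-free run, if it is long enough.
            if run_start is not None and pos - run_start >= min_length:
                regions.append((run_start, pos - 1))
            run_start = None
        elif run_start is None:
            run_start = pos
    # Close a run that extends to the end of the range.
    if run_start is not None and end - run_start + 1 >= min_length:
        regions.append((run_start, end))
    return regions

# Toy positions for illustration only.
regions = conserved_regions(100, 110, [103, 107])
```

In practice `min_length` would be set to a primer-sized value, and `variant_positions` would hold the variants of the targeted HGs only when HG restriction is used.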
(Optional) unfold all subgroups checkbox:
If this checkbox is ticked, the server returns all subgroup details.
HG target(s) for PCR: Enter HGs separated by spaces (e.g., C Z D E G Q). Case sensitive.
The beginning position and end position of the range are shown in brackets.
HG details at the variant positions are shown in the table.
An example of the use of this tool: (Fig. 3)
Click Conserved Region Map
in the field of 'Enter a region (np) for PCR'.
Dotted area can be used for primer design for both D and G because they are conserved.
HGs D and G have the same variant profile of the control region, as shown below (See Guide No. 5. Differentiation between HGs). Demonstrating the presence of the variant 5178A is additionally required to determine D, and of 5108, to determine G. The figure below shows the conserved regions for the candidate primers for the variants 5178A and 5108. The range was set small because the sample is degraded. Because the HG is not determined yet, both D and G were entered in the field 'Specify haplogroup(s)' so that the primers avoid the variant positions of both HGs.
Fig. 3. Screen capture of the result of conserved region search for primer design.
4. HG database
This tool helps users explore an HG, its subgroups, and their differential variants. The HG in question is entered in the HG field, and the lowest subgroup level to be listed is specified. The server then returns all the HGs from the entered HG down to the lowest level of its subgroups, with the HG definition by PhyloTree Build 17, HG-differential variants, HG-specific variants, highly specific variants, and non-conserved variants.
Input fields description
Subgroup level: enter the number of the lowest level of the subgroup to be seen.
Interpretation of results
See the screen capture below as an example.
each HG are shown in the column of variants.
of control region are shown in blue;
HG-specific, in maroon;
highly specific, in teal;
differentiable variants, in boldface.
HG-specific variants of an HG are found only in that HG and, if any, its subgroup(s).
Highly specific variants of an HG are found in that HG and also in a small number of other HGs.
These variants are listed in a separate column with the number of cross-presenting HGs in brackets (the smaller, the more specific).
Non-conserved variants of an HG are variants that one or more of its subgroups lack.
These variants are listed in a separate column with the number of non-conserved subgroups in brackets (the higher, the less conserved).
of an HG mean that the variants can be used to
differentiate between the listed
Haplogroup selection checkboxes
These checkboxes select the HGs to be differentiated from one another.
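The definitions of HG-specific, highly specific, and non-conserved variants can be made concrete with a small Python sketch over a toy tree. All HG names (X, X1, X1a, X2, Y), variant labels (v1 to v5), and function names below are hypothetical placeholders, and the data structures are assumptions rather than the server's internal representation.

```python
def descendants(hg, children):
    """All subgroups below hg in a parent -> children mapping."""
    found = []
    stack = list(children.get(hg, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found

def classify_variant(variant, hg, profiles, children):
    """Count cross-presenting HGs and non-conserved subgroups for a variant."""
    subgroups = set(descendants(hg, children))
    cross = [h for h, profile in profiles.items()
             if h != hg and h not in subgroups and variant in profile]
    lacking = [s for s in subgroups if variant not in profiles[s]]
    return {
        "hg_specific": not cross,            # found only in hg and its subgroups
        "cross_hgs": len(cross),             # smaller means more specific
        "non_conserved_subgroups": len(lacking),  # higher means less conserved
    }

# Toy tree and full variant profiles, for illustration only.
children = {"X": ["X1", "X2"], "X1": ["X1a"]}
profiles = {
    "X": {"v1"},
    "X1": {"v1", "v2", "v3"},
    "X1a": {"v1", "v2"},      # lacks v3, so v3 is non-conserved within X1
    "X2": {"v1", "v4"},
    "Y": {"v3", "v5"},        # also carries v3, so v3 is not X1-specific
}
v2_info = classify_variant("v2", "X1", profiles, children)
v3_info = classify_variant("v3", "X1", profiles, children)
```

In this toy example v2 would be the preferable variant for confirming X1, because it occurs in no other HG and is kept by every subgroup.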
*An example of the use of this tool
The goal of this tool is to see subgroups belonging to the given HG and HG-differential variants between the subgroups.
Click HG database
in the 'HG' field.
See Fig. 4 below.
The presence of one of the variants 12633A, 16163, and 16189 in the variants of the sample can be checked to confirm HG T1 among T, T1, T2, and T3, but 12633A is preferable because it is HG-specific.
differentiation between HGs are highly recommended
to secure highly
specific differential variants between them.
This tool is for differentiation between HGs entered by the user. It has an interface and functions similar to those of the HG database tool, but multiple independent HGs can be entered.
Input field description
Interpretation of results: the same as above.
An example of the use of this tool: the same as above.
Fig. 5. Screen capture of the result of differentiation between HGs D and G.
This tool helps users find possible phantom variants (systematic artifacts) in a data set among the extra variants, which are not used for HG definition by PhyloTree Build 17. It follows the rules proposed by HaploGrep 2. At least two samples are required. Before this analysis, the samples should first be haplogrouped using the HG tracking tools of this server. Variants that are re-aligned based on PhyloTree Build 17 should be used for this tool. The server keeps the sample name and variants re-aligned based on PhyloTree
At least two previously haplogrouped samples should be entered.
The sample name and variants (re-aligned based on PhyloTree Build 17) should be delimited with the 'Tab' key. Input copied from the data in Excel is an easy option.
Example: (Fig. 6a)
Possible phantom variants are listed following the rules proposed by HaploGrep 2: extra variants with a Soares score < 3 that occur in at least two different samples are considered.
Example: (Fig. 6b)
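The rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's code: the function names are hypothetical, each input line is assumed to hold a sample name, a Tab, and space-separated variants that are already known to be extra variants, the Soares scores are passed in as a hypothetical lookup table, and a variant absent from that table is assumed to score 0.

```python
from collections import defaultdict

def parse_samples(text):
    """Parse 'name<Tab>variants' lines into {name: [variant, ...]}."""
    samples = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, variants = line.partition("\t")
        samples[name.strip()] = variants.split()
    return samples

def possible_phantoms(extra_by_sample, soares_score, max_score=3, min_samples=2):
    """Extra variants with score < max_score seen in >= min_samples samples."""
    samples_with = defaultdict(set)
    for sample, variants in extra_by_sample.items():
        for variant in variants:
            samples_with[variant].add(sample)
    return sorted(
        v for v, names in samples_with.items()
        if len(names) >= min_samples and soares_score.get(v, 0) < max_score
    )

# Toy input with placeholder variant labels, for illustration only.
text = "S1\tv1 v2\nS2\tv1 v3\nS3\tv2\n"
scores = {"v2": 10}
phantoms = possible_phantoms(parse_samples(text), scores)
```

In the toy input, v1 is reported because it recurs in two samples with a low score, v2 is excluded by its high score, and v3 occurs in only one sample.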
Fig. 6a. Screen capture of input data for the analysis of possible phantom variants for an example.
Fig. 6b. Screen capture of the analysis of possible phantom variants for an example.
This tool converts a variant format between PhyloTree and haplogrouping servers (MitoTool, HaploGrep 2, and
The variants obtained by a certain web server are valid only for it and can be used for haplogrouping instead of sequences by the same server only.
Refer to Guide No. 2 (Variants input guide for 'HG tracking by fragment variant profiles') for details.
Convert to PhyloTree: this converts any variant format of HaploGrep 2, EMPOP, and MitoTool to PhyloTree
Haplogrouping using complete mtGenome sequence is much
more accurate and reachable to the sub-haplogroup than using the
This tool shows major HGs, and their specific variants of PhyloTree Build 17.
PhyloTree Build 17: https://www.phylotree.org/
HaploGrep 2: https://haplogrep.i-med.ac.at/
Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N. (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 23, 147.
van Oven,M. (2015) PhyloTree Build 17: growing the human mitochondrial DNA tree. Forensic Sci. Int. Genet. Suppl. Ser., 5, 392–394.
Weissensteiner H, Pacher D, Kloss-Brandstätter A, Forer L, Specht G, Bandelt HJ, Kronenberg F, Salas A, Schönherr S. (2016) HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44, W58-W63.
Parson W, Dür A. (2007) EMPOP—a forensic mtDNA database. Forensic Sci. Int. Genet. 1, 88–92.
Bandelt,H.-J., van Oven,M. and Salas,A. (2012) Haplogrouping mitochondrial DNA sequences in Legal Medicine/Forensic Genetics. Int. J. Legal Med., 126, 901–916.
Yao,Y., Salas,A., Logan,I. and Bandelt,H.-J. (2009) mtDNA Data Mining in GenBank Needs Surveying. Am. J. Hum. Genet., 85, 922–933.
Brandstätter A, Sanger T, Lutz-Bonengel S, Parson W, Beraud-Colomb E, Wen B, Kong Q-P, Bravi CM, Bandelt HJ. (2005) Phantom mutation hotspots in human mitochondrial DNA. Electrophoresis 26, 3414–3429.