License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2019.4
URN: urn:nbn:de:0030-drops-110347
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/11034/
Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy
TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees
Abstract
Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and in some cases available sequence data). It is an active area of research when dealing with gene tree heterogeneity due to duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves the accuracy of well-established methods from the GDL literature under conditions with HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1.
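As a concrete illustration of the Robinson-Foulds (RF) distance mentioned in the abstract, here is a minimal sketch. It is not the TRACTION implementation. It assumes the standard unnormalized definition (the number of non-trivial bipartitions found in exactly one of the two trees), two trees on the same leaf set, and a hypothetical nested-tuple encoding of trees chosen only to avoid Newick parsing. The helper names `leaf_set`, `bipartitions`, and `rf_distance` are illustrative, and the completion step (adding missing leaves) is not covered.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) induced by the tree's edges.

    Each split is stored as an unordered pair of leaf sets, so the result does
    not depend on where the nested-tuple encoding happens to be rooted.
    """
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if len(below) >= 2 and len(other) >= 2:
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def rf_distance(t1, t2):
    """Unnormalized RF distance: splits present in exactly one of the two trees."""
    return len(bipartitions(t1) ^ bipartitions(t2))


# Assumed example data: a binary reference tree T, a conflicting binary tree,
# and an unresolved (star) tree t on the same five leaves.
T = ((("A", "B"), "C"), ("D", "E"))
conflicting = ((("A", "C"), "B"), ("D", "E"))
star = ("A", "B", "C", "D", "E")

print(rf_distance(T, conflicting))  # AB|CDE and AC|BDE differ; ABC|DE is shared
print(rf_distance(T, star))         # the star tree has no non-trivial splits
```

In this toy case the star tree can be refined all the way to `T`, which brings the RF distance to zero. When the input tree already contains splits that conflict with the reference tree, as `conflicting` does, no refinement can remove them, so the minimum achievable distance is greater than zero.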
BibTeX - Entry
@InProceedings{christensen_et_al:LIPIcs:2019:11034,
author = {Sarah Christensen and Erin K. Molloy and Pranjal Vachaspati and Tandy Warnow},
title = {{TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees}},
booktitle = {19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
pages = {4:1--4:16},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-123-8},
ISSN = {1868-8969},
year = {2019},
volume = {143},
editor = {Katharina T. Huber and Dan Gusfield},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2019/11034},
URN = {urn:nbn:de:0030-drops-110347},
doi = {10.4230/LIPIcs.WABI.2019.4},
annote = {Keywords: Gene tree correction, horizontal gene transfer, incomplete lineage sorting}
}
Keywords: Gene tree correction, horizontal gene transfer, incomplete lineage sorting
Collection: 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Issue Date: 2019
Date of publication: 03.09.2019