License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2023.5
URN: urn:nbn:de:0030-drops-186312
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18631/
Go to the corresponding LIPIcs Volume Portal


Dai, Junyan ; Rubel, Tobias ; Han, Yunheng ; Molloy, Erin K.

Leveraging Constraints Plus Dynamic Programming for the Large Dollo Parsimony Problem

pdf-format:
LIPIcs-WABI-2023-5.pdf (1.0 MB)


Abstract

The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for classic parsimony problems where the input are binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem in polynomial time for the Dollo criterion score. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. Thus, we implement our algorithm in a software package, called Dollo-CDP, and evaluate its utility in the context of species phylogenetics using both simulated and real data sets. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not Fitch parsimony.

BibTeX - Entry

@InProceedings{dai_et_al:LIPIcs.WABI.2023.5,
  author =	{Dai, Junyan and Rubel, Tobias and Han, Yunheng and Molloy, Erin K.},
  title =	{{Leveraging Constraints Plus Dynamic Programming for the Large Dollo Parsimony Problem}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{5:1--5:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18631},
  URN =		{urn:nbn:de:0030-drops-186312},
  doi =		{10.4230/LIPIcs.WABI.2023.5},
  annote =	{Keywords: phylogenetics, parsimony, Dollo, Camin-Sokal, dynamic programming, constraints}
}

Keywords: phylogenetics, parsimony, Dollo, Camin-Sokal, dynamic programming, constraints
Collection: 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Issue Date: 2023
Date of publication: 29.08.2023
Supplementary Material: Software (Source Code): https://github.com/molloy-lab/Dollo-CDP archived at: https://archive.softwareheritage.org/swh:1:dir:72c3b66d3db8c6efd1a391e4bd5f2d0acd4b438b
Software (Source Code and Data): https://github.com/molloy-lab/dollo-study archived at: https://archive.softwareheritage.org/swh:1:dir:a7379005ab0e3f94a3f7105f2d6fb8f2f51844c8


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI