License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.STACS.2020.15
URN: urn:nbn:de:0030-drops-118769
Go to the corresponding LIPIcs Volume Portal

Lafond, Manuel ; Zhu, Binhai ; Zou, Peng

The Tandem Duplication Distance Is NP-Hard

LIPIcs-STACS-2020-15.pdf (0.5 MB)


In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment - this can be represented as the string operation AXB ⇒ AXXB. Tandem exon duplications have been found in many species such as human, fly or worm, and have been largely studied in computational biology.
The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the smallest sequence of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard, settling the 16-year old open problem. We further show that this hardness holds even if all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.

BibTeX - Entry

  author =	{Manuel Lafond and Binhai Zhu and Peng Zou},
  title =	{{The Tandem Duplication Distance Is NP-Hard}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Christophe Paul and Markus Bl{\"a}ser},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-118769},
  doi =		{10.4230/LIPIcs.STACS.2020.15},
  annote =	{Keywords: Tandem duplication, Text processing, Formal languages, Computational genomics, FPT algorithms}

Keywords: Tandem duplication, Text processing, Formal languages, Computational genomics, FPT algorithms
Collection: 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)
Issue Date: 2020
Date of publication: 04.03.2020

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI