License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2020.3
URN: urn:nbn:de:0030-drops-127926
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12792/


Rubert, Diego P. ; Martinez, Fábio V. ; Braga, Marília D. V.

Natural Family-Free Genomic Distance

pdf-format:
LIPIcs-WABI-2020-3.pdf (0.9 MB)


Abstract

A classical problem in comparative genomics is to compute the rearrangement distance, that is, the minimum number of large-scale rearrangements required to transform one given genome into another. The most traditional approaches in this area are family-based, i.e., they require the classification of DNA fragments of both genomes into families. More recently, an alternative model was proposed which, instead of family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. This model represents structural rearrangements by the generic double cut and join (DCJ) operation, and the resulting distance is called the family-free DCJ distance. It computes the DCJ distance between the two genomes by searching for a matching of their genes based on the given pairwise similarities, thereby helping to find gene homologies. The drawback is that its computation is NP-hard. Another point is that the family-free DCJ distance must correspond to a maximal matching of the genes, because unmatched genes are simply ignored: maximizing the matching prevents the free lunch artifact, in which empty or almost empty matchings give the smallest distances.
In this paper, besides DCJ operations, we allow the content-modifying operations of insertion and deletion of DNA segments and propose a new and more general family-free genomic distance. In our model we use the pairwise similarities to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model thus results in a natural family-free genomic distance, which takes into consideration all given genes and has a search space composed of matchings of any size. We provide an efficient ILP formulation to solve it, by extending the previous formulations for computing family-based genomic distances from Shao et al. (J. Comput. Biol., 2015) and Bohnenkämper et al. (Proc. of RECOMB, 2020). Our experiments show that the ILP can handle not only bacterial genomes, but also genomes of fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.
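
To illustrate the free lunch artifact mentioned above, the following minimal sketch computes the plain DCJ distance for a fixed gene matching. It is not the ILP of the paper: it assumes a single circular chromosome per genome, genes given as signed integers, and equal gene content, in which case the DCJ distance is the number of genes minus the number of cycles in the adjacency graph. The function names `adjacencies` and `dcj_distance_circular` are hypothetical and chosen only for this example.

```python
def adjacencies(chromosome):
    """Map each gene extremity to its neighbour in one circular chromosome.

    Genes are signed integers; (g, 't') and (g, 'h') are the tail and head
    of gene g. Assumption for this sketch: a single circular chromosome.
    """
    def left(g):
        return (abs(g), 't' if g > 0 else 'h')

    def right(g):
        return (abs(g), 'h' if g > 0 else 't')

    partner = {}
    n = len(chromosome)
    for i, g in enumerate(chromosome):
        nxt = chromosome[(i + 1) % n]
        a, b = right(g), left(nxt)
        partner[a] = b
        partner[b] = a
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance n - c for two circular genomes with the same genes."""
    pa, pb = adjacencies(genome_a), adjacencies(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must have the same gene content")
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B
        # until the walk returns to its starting extremity.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(genome_a) - cycles


# All three genes matched: one inversion separates the genomes.
print(dcj_distance_circular([1, -2, 3], [1, 2, 3]))  # 1
# Gene 2 left unmatched and ignored: the distance drops to zero.
print(dcj_distance_circular([1, 3], [1, 3]))         # 0
```

In this toy case, dropping the inverted gene from the matching makes the two genomes look identical, which is why a distance that ignores unmatched genes has to insist on maximal matchings, and why assigning weights to unmatched genes removes that requirement.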

BibTeX - Entry

@InProceedings{rubert_et_al:LIPIcs:2020:12792,
  author =	{Diego P. Rubert and F{\'a}bio V. Martinez and Mar{\'\i}lia D. V. Braga},
  title =	{{Natural Family-Free Genomic Distance}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{3:1--3:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Carl Kingsford and Nadia Pisanti},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12792},
  URN =		{urn:nbn:de:0030-drops-127926},
  doi =		{10.4230/LIPIcs.WABI.2020.3},
  annote =	{Keywords: Comparative genomics, Genome rearrangement, DCJ-indel distance}
}

Keywords: Comparative genomics, Genome rearrangement, DCJ-indel distance
Collection: 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)
Issue Date: 2020
Date of publication: 25.08.2020
Supplementary Material: The ILP for computing the natural family-free DCJ-indel distance can be downloaded from our GitLab server at https://gitlab.ub.uni-bielefeld.de/gi/gen-diff.

