License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2023.11
URN: urn:nbn:de:0030-drops-179650
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/17965/
Gabory, Esteban ;
Mwaniki, Moses Njagi ;
Pisanti, Nadia ;
Pissis, Solon P. ;
Radoszewski, Jakub ;
Sweering, Michelle ;
Zuba, Wiktor
Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications
Abstract
An elastic-degenerate (ED) string T is a sequence of n sets T[1],…,T[n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T, respectively. The language of T is defined as ℒ(T) = {S_1 ⋯ S_n : S_i ∈ T[i] for all i ∈ [1,n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T₁ and T₂ of lengths n₁ and n₂, cardinalities m₁ and m₂, and sizes N₁ and N₂, respectively, we show the following:
- There is no ?((N₁N₂)^{1-ε})-time algorithm, thus no ?((N₁m₂+N₂m₁)^{1-ε})-time algorithm and no ?((N₁n₂+N₂n₁)^{1-ε})-time algorithm, for any constant ε > 0, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false.
- There is no combinatorial ?((N₁+N₂)^{1.2-ε}f(n₁,n₂))-time algorithm, for any constant ε > 0 and any function f, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false.
- An ?(N₁log N₁log n₁+N₂log N₂log n₂)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T₁ and T₂ are given in a compact representation, we show that the problem is NP-complete.
- An ?(N₁m₂+N₂m₁)-time algorithm for EDSI.
- An Õ(N₁^{ω-1}n₂+N₂^{ω-1}n₁)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size.
We also show that the techniques we develop have applications outside of ED string comparison.
BibTeX - Entry
@InProceedings{gabory_et_al:LIPIcs.CPM.2023.11,
author = {Gabory, Esteban and Mwaniki, Moses Njagi and Pisanti, Nadia and Pissis, Solon P. and Radoszewski, Jakub and Sweering, Michelle and Zuba, Wiktor},
title = {{Comparing Elastic-Degenerate Strings: Algorithms, Lower Bounds, and Applications}},
booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
pages = {11:1--11:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-276-1},
ISSN = {1868-8969},
year = {2023},
volume = {259},
editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/17965},
URN = {urn:nbn:de:0030-drops-179650},
doi = {10.4230/LIPIcs.CPM.2023.11},
annote = {Keywords: elastic-degenerate string, sequence comparison, languages intersection, pangenome, acronym identification}
}
Keywords: |
|
elastic-degenerate string, sequence comparison, languages intersection, pangenome, acronym identification |
Collection: |
|
34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) |
Issue Date: |
|
2023 |
Date of publication: |
|
21.06.2023 |