License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2022.7
URN: urn:nbn:de:0030-drops-161348
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/16134/
Lai, Wenfeng ;
Liyanage, Adiesha ;
Zhu, Binhai ;
Zou, Peng
Beyond the Longest Letter-Duplicated Subsequence Problem
Abstract
Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest letter-duplicated subsequence (LLDS) is proposed. Given a sequence S of length n, a letter-duplicated subsequence is a subsequence of S in the form of x₁^{d₁}x₂^{d₂}⋯ x_k^{d_k} with x_i ∈ Σ, x_j≠ x_{j+1} and d_i ≥ 2 for all i in [k] and j in [k-1]. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of S can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version when Σ is unbounded, each letter appears in S at least 6 times and all the letters in Σ must appear in the solution. We show that the problem is NP-hard (a further twist indicates that the problem does not admit any polynomial time approximation). The reduction is from possibly the simplest version of SAT that is NP-complete, (≤ 2,1, ≤ 3)-SAT, where each variable appears at most twice positively and exact once negatively, and each clause contains at most three literals and some clauses must contain exactly two literals. (We hope that this technique will serve as a general tool to help us proving the NP-hardness for some more tricky sequence problems involving only one sequence - much harder than with at least two input sequences, which we apply successfully at the end of the paper on some extra variations of the LLDS problem.) We then show that when each letter appears in S at most 3 times, then the problem admits a factor 1.5-O(1/n) approximation. Finally, we consider the weighted version, where the weight of a block x_i^{d_i} (d_i ≥ 2) could be any positive function which might not grow with d_i. We give a non-trivial O(n²) time dynamic programming algorithm for this version, i.e., computing an LD-subsequence of S whose weight is maximized.
BibTeX - Entry
@InProceedings{lai_et_al:LIPIcs.CPM.2022.7,
author = {Lai, Wenfeng and Liyanage, Adiesha and Zhu, Binhai and Zou, Peng},
title = {{Beyond the Longest Letter-Duplicated Subsequence Problem}},
booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
pages = {7:1--7:12},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-234-1},
ISSN = {1868-8969},
year = {2022},
volume = {223},
editor = {Bannai, Hideo and Holub, Jan},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16134},
URN = {urn:nbn:de:0030-drops-161348},
doi = {10.4230/LIPIcs.CPM.2022.7},
annote = {Keywords: Segmental duplications, Tandem duplications, Longest common subsequence, NP-completeness, Dynamic programming}
}
Keywords: |
|
Segmental duplications, Tandem duplications, Longest common subsequence, NP-completeness, Dynamic programming |
Collection: |
|
33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022) |
Issue Date: |
|
2022 |
Date of publication: |
|
22.06.2022 |