License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2023.21
URN: urn:nbn:de:0030-drops-179757
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/17975/
Loukides, Grigorios ;
Pissis, Solon P. ;
Thankachan, Sharma V. ;
Zuba, Wiktor
Suffix-Prefix Queries on a Dictionary
Abstract
In the all-pairs suffix-prefix (APSP) problem, we are given a dictionary R of k strings, S_1,…,S_k, of total length n, and we are asked to find the length SPL_{i,j} of the longest string that is both a suffix of S_i and a prefix of S_j, for all i,j ∈ [1,k]. APSP is a classic problem in string algorithms with many applications in bioinformatics. When all strings of the dictionary are over an integer alphabet of size σ ≤ n^?(1), APSP can be solved in the optimal ?(n+k²) time with the use of the generalized suffix tree of the dictionary [Gusfield et al., Inf. Process. Lett. 1992].
In many bioinformatics applications, such as in sequence assembly, the size k of dictionary R is very large. In particular, k² usually dominates n, and thus the k² factor is the bottleneck both in the time and in the space complexity of such applications. We thus initiate a holistic study on several data structure variants of APSP. In particular, we consider the following types of queries:
- One-to-One(i,j): output SPL_{i,j}.
- One-to-All(i): output SPL_{i,j} for every j ∈ [1,k].
- Report(i,?): output all distinct j ∈ [1,k] such that SPL_{i,j} ≥ ?, where ? ≥ 0 is an integer.
- Count(i,?): output the number of distinct j ∈ [1,k] such that SPL_{i,j} ≥ ?, where ? ≥ 0 is an integer.
- Top(i,K): output K distinct j ∈ [1,k] with the highest values of SPL_{i,j} breaking ties arbitrarily.
We assume the standard word RAM model of computation with word size w = Ω(log n) and an integer alphabet of size σ ≤ n^?(1). We show the following upper bounds:
Query | Space (words) | Query time | Note
One-to-One(i,j) | ?(n) | ?(log log k) | Theorem 11
One-to-All(i) | ?(n) | ?(k) | Theorem 14
Report(i,?) | ?(n) | ?(log n/log log n+output) | Theorem 19(i)
Count(i,?) | ?(n) | ?(log n/log log n) | Theorem 19(ii)
Top(i,K) | ?(n) | ?(log² n/log log n+K) | Theorem 22
We also present efficient algorithms for constructing these data structures.
BibTeX - Entry
@InProceedings{loukides_et_al:LIPIcs.CPM.2023.21,
author = {Loukides, Grigorios and Pissis, Solon P. and Thankachan, Sharma V. and Zuba, Wiktor},
title = {{Suffix-Prefix Queries on a Dictionary}},
booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
pages = {21:1--21:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-276-1},
ISSN = {1868-8969},
year = {2023},
volume = {259},
editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/17975},
URN = {urn:nbn:de:0030-drops-179757},
doi = {10.4230/LIPIcs.CPM.2023.21},
annote = {Keywords: all-pairs suffix-prefix, suffix-prefix queries, internal pattern matching}
}
Keywords: |
|
all-pairs suffix-prefix, suffix-prefix queries, internal pattern matching |
Collection: |
|
34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) |
Issue Date: |
|
2023 |
Date of publication: |
|
21.06.2023 |