Abstract
In the allpairs suffixprefix (APSP) problem, we are given a dictionary R of k strings, S_1,…,S_k, of total length n, and we are asked to find the length SPL_{i,j} of the longest string that is both a suffix of S_i and a prefix of S_j, for all i,j ∈ [1,k]. APSP is a classic problem in string algorithms with many applications in bioinformatics. When all strings of the dictionary are over an integer alphabet of size σ ≤ n^?(1), APSP can be solved in the optimal ?(n+k²) time with the use of the generalized suffix tree of the dictionary [Gusfield et al., Inf. Process. Lett. 1992].
In many bioinformatics applications, such as in sequence assembly, the size k of dictionary R is very large. In particular, k² usually dominates n, and thus the k² factor is the bottleneck both in the time and in the space complexity of such applications. We thus initiate a holistic study on several data structure variants of APSP. In particular, we consider the following types of queries:
 OnetoOne(i,j): output SPL_{i,j}.
 OnetoAll(i): output SPL_{i,j} for every j ∈ [1,k].
 Report(i,?): output all distinct j ∈ [1,k] such that SPL_{i,j} ≥ ?, where ? ≥ 0 is an integer.
 Count(i,?): output the number of distinct j ∈ [1,k] such that SPL_{i,j} ≥ ?, where ? ≥ 0 is an integer.
 Top(i,K): output K distinct j ∈ [1,k] with the highest values of SPL_{i,j} breaking ties arbitrarily.
We assume the standard word RAM model of computation with word size w = Ω(log n) and an integer alphabet of size σ ≤ n^?(1). We show the following upper bounds:
Query  Space (words)  Query time  Note
OnetoOne(i,j)  ?(n)  ?(log log k)  Theorem 11
OnetoAll(i)  ?(n)  ?(k)  Theorem 14
Report(i,?)  ?(n)  ?(log n/log log n+output)  Theorem 19(i)
Count(i,?)  ?(n)  ?(log n/log log n)  Theorem 19(ii)
Top(i,K)  ?(n)  ?(log² n/log log n+K)  Theorem 22
We also present efficient algorithms for constructing these data structures.
BibTeX  Entry
@InProceedings{loukides_et_al:LIPIcs.CPM.2023.21,
author = {Loukides, Grigorios and Pissis, Solon P. and Thankachan, Sharma V. and Zuba, Wiktor},
title = {{SuffixPrefix Queries on a Dictionary}},
booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
pages = {21:121:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772761},
ISSN = {18688969},
year = {2023},
volume = {259},
editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/17975},
URN = {urn:nbn:de:0030drops179757},
doi = {10.4230/LIPIcs.CPM.2023.21},
annote = {Keywords: allpairs suffixprefix, suffixprefix queries, internal pattern matching}
}
Keywords: 

allpairs suffixprefix, suffixprefix queries, internal pattern matching 
Collection: 

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) 
Issue Date: 

2023 
Date of publication: 

21.06.2023 