Abstract
Let Substr_k(X) denote the set of lengthk substrings of a given string X for a given integer k > 0. We study the following basic string problem, called zShortest ?_kEquivalent Strings: Given a set ?_k of n lengthk strings and an integer z > 0, list z shortest distinct strings T₁,…,T_z such that Substr_k(T_i) = ?_k, for all i ∈ [1,z]. The zShortest ?_kEquivalent Strings problem arises naturally as an encoding problem in many realworld applications; e.g., in data privacy, in data compression, and in bioinformatics. The 1Shortest ?_kEquivalent Strings, referred to as Shortest ?_kEquivalent String, asks for a shortest string X such that Substr_k(X) = ?_k.
Our main contributions are summarized below:
 Given a directed graph G(V,E), the Directed Chinese Postman (DCP) problem asks for a shortest closed walk that visits every edge of G at least once. DCP can be solved in ?̃(EV) time using an algorithm for mincost flow. We show, via a nontrivial reduction, that if Shortest ?_kEquivalent String over a binary alphabet has a nearlineartime solution then so does DCP.
 We show that the length of a shortest string output by Shortest ?_kEquivalent String is in ?(k+n²). We generalize this bound by showing that the total length of z shortest strings is in ?(zk+zn²+z²n). We derive these upper bounds by showing (asymptotically tight) bounds on the total length of z shortest Eulerian walks in general directed graphs.
 We present an algorithm for solving zShortest ?_kEquivalent Strings in ?(nk+n²log²n+zn²log n+output) time. If z = 1, the time becomes ?(nk+n²log²n) by the fact that the size of the input is Θ(nk) and the size of the output is ?(k+n²).
BibTeX  Entry
@InProceedings{bernardini_et_al:LIPIcs.CPM.2022.16,
author = {Bernardini, Giulia and Conte, Alessio and Gabory, Esteban and Grossi, Roberto and Loukides, Grigorios and Pissis, Solon P. and Punzi, Giulia and Sweering, Michelle},
title = {{On Strings Having the Same Length k Substrings}},
booktitle = {33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
pages = {16:116:17},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772341},
ISSN = {18688969},
year = {2022},
volume = {223},
editor = {Bannai, Hideo and Holub, Jan},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16143},
URN = {urn:nbn:de:0030drops161439},
doi = {10.4230/LIPIcs.CPM.2022.16},
annote = {Keywords: string algorithms, combinatorics on words, de Bruijn graph, Chinese Postman}
}
Keywords: 

string algorithms, combinatorics on words, de Bruijn graph, Chinese Postman 
Collection: 

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022) 
Issue Date: 

2022 
Date of publication: 

22.06.2022 