License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2021.101
URN: urn:nbn:de:0030-drops-141702
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/14170/


Nishimoto, Takaaki ; Tabei, Yasuo

Optimal-Time Queries on BWT-Runs Compressed Indexes

pdf-format:
LIPIcs-ICALP-2021-101.pdf (0.8 MB)


Abstract

Indexing highly repetitive strings (i.e., strings with many repetitions) for fast queries has become a central research topic in string processing because of its wide variety of applications in bioinformatics and natural language processing. Although a substantial number of indexes for highly repetitive strings have been proposed thus far, developing compressed indexes that support various queries remains a challenge. The run-length Burrows-Wheeler transform (RLBWT) is a lossless data compression method that applies a reversible permutation to an input string and then run-length encodes the result, and it has received interest for indexing highly repetitive strings. LF and ϕ^{-1} are two key functions for building indexes on RLBWT, and the best previous result computes LF and ϕ^{-1} in O(log log n) time with O(r) words of space, where n is the string length and r is the number of runs in RLBWT. In this paper, we improve LF and ϕ^{-1} so that they can be computed in constant time with O(r) words of space. Subsequently, we present OptBWTR (optimal-time queries on BWT-runs compressed indexes), the first string index that supports various queries, including locate, count, and extract queries, in optimal time and O(r) words of space.
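To make the terms in the abstract concrete, the following is a minimal Python sketch of the BWT, its run-length encoding, and an LF function that stores only one value per run. It is an illustration under stated assumptions, not the data structure from the paper: the BWT is built naively from sorted rotations, the input is assumed to end with a unique sentinel `$` that sorts before every other character, and LF locates the run containing a position by binary search, which takes O(log r) time rather than the constant time the paper achieves. All function names (`bwt`, `run_length_encode`, `lf_from_runs`, `invert_bwt`) are hypothetical and chosen only for this example.

```python
from bisect import bisect_right


def bwt(text):
    """Naive BWT: last column of the sorted rotations of text.

    Assumes text ends with a unique sentinel '$' smaller than all other characters.
    """
    n = len(text)
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse s into a list of (character, run length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def lf_from_runs(runs):
    """Build an LF function from the runs alone, storing O(r) values.

    Within a run, consecutive positions map to consecutive positions under LF,
    so it is enough to store LF for the first position of every run.
    Illustration only: the run is found by binary search, not in constant time.
    """
    # Starting position of each run in the BWT.
    starts, pos = [], 0
    for _, length in runs:
        starts.append(pos)
        pos += length

    # C[ch] = number of characters in the BWT that are smaller than ch.
    counts = {}
    for ch, length in runs:
        counts[ch] = counts.get(ch, 0) + length
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # LF value of the first position of each run.
    seen, head_lf = {}, []
    for ch, length in runs:
        head_lf.append(C[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + length

    def lf(i):
        k = bisect_right(starts, i) - 1  # index of the run containing position i
        return head_lf[k] + (i - starts[k])

    return lf


def invert_bwt(L):
    """Recover the original text (including the sentinel) by following LF."""
    lf = lf_from_runs(run_length_encode(L))
    i, out = 0, []  # row 0 is the rotation that starts with the sentinel
    for _ in range(len(L)):
        out.append(L[i])
        i = lf(i)
    # out holds the text read backwards, with the sentinel collected last.
    return "".join(reversed(out))[1:] + "$"


if __name__ == "__main__":
    L = bwt("banana$")
    print(L)                     # BWT of the example string
    print(run_length_encode(L))  # its runs; r is the length of this list
    print(invert_bwt(L))         # the original string, recovered via LF
```

The point of the sketch is the space argument: `lf_from_runs` keeps two arrays of length r and never touches the full BWT, which is why indexes built on the RLBWT can fit in O(r) words. The paper's contribution concerns removing the search step so that each LF (and ϕ^{-1}) evaluation takes constant time.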

BibTeX - Entry

@InProceedings{nishimoto_et_al:LIPIcs.ICALP.2021.101,
  author =	{Nishimoto, Takaaki and Tabei, Yasuo},
  title =	{{Optimal-Time Queries on BWT-Runs Compressed Indexes}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{101:1--101:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/14170},
  URN =		{urn:nbn:de:0030-drops-141702},
  doi =		{10.4230/LIPIcs.ICALP.2021.101},
  annote =	{Keywords: Compressed text indexes, Burrows-Wheeler transform, highly repetitive text collections}
}

Keywords: Compressed text indexes, Burrows-Wheeler transform, highly repetitive text collections
Collection: 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Issue Date: 2021
Date of publication: 02.07.2021

