DROPS - Document

License:

Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2020.13
URN: urn:nbn:de:0030-drops-121388
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12138/

Go to the corresponding LIPIcs Volume Portal

Ganguly, Arnab ; Gibney, Daniel ; Hooshmand, Sahar ; Külekci, M. Oğuzhan ; Thankachan, Sharma V.

FM-Index Reveals the Reverse Suffix Array

pdf-format:

LIPIcs-CPM-2020-13.pdf (0.9 MB)

Abstract

Given a text T[1,n] over an alphabet Σ of size σ, the suffix array of T stores the lexicographic order of the suffixes of T. The suffix array needs Θ(nlog n) bits of space compared to the n log σ bits needed to store T itself. A major breakthrough [FM - Index, FOCS'00] in the last two decades has been encoding the suffix array in near-optimal number of bits (≈ log σ bits per character). One can decode a suffix array value using the FM-Index in log^{O(1)} n time.
We study an extension of the problem in which we have to also decode the suffix array values of the reverse text. This problem has numerous applications such as in approximate pattern matching [Lam et al., BIBM' 09]. Known approaches maintain the FM - Index of both the forward and the reverse text which drives up the space occupancy to 2nlog σ bits (plus lower order terms). This brings in the natural question of whether we can decode the suffix array values of both the forward and the reverse text, but by using nlog σ bits (plus lower order terms). We answer this question positively, and show that given the FM - Index of the forward text, we can decode the suffix array value of the reverse text in near logarithmic average time. Additionally, our experimental results are competitive when compared to the standard approach of maintaining the FM - Index for both the forward and the reverse text. We believe that applications that require both the forward and reverse text will benefit from our approach.

BibTeX - Entry

@InProceedings{ganguly_et_al:LIPIcs:2020:12138,
  author =	{Arnab Ganguly and Daniel Gibney and Sahar Hooshmand and M. Oğuzhan K{\"u}lekci and Sharma V. Thankachan},
  title =	{{FM-Index Reveals the Reverse Suffix Array}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{13:1--13:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{Inge Li G{\o}rtz and Oren Weimann},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/12138},
  URN =		{urn:nbn:de:0030-drops-121388},
  doi =		{10.4230/LIPIcs.CPM.2020.13},
  annote =	{Keywords: Data Structures, Suffix Trees, String Algorithms, Compression, Burrows - Wheeler transform, FM-Index}
}

Keywords: Data Structures, Suffix Trees, String Algorithms, Compression, Burrows - Wheeler transform, FM-Index

Collection: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Issue Date: 2020

Date of publication: 09.06.2020

Supplementary Material: https://github.com/oguzhankulekci/reverseSA

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI

Keywords:		Data Structures, Suffix Trees, String Algorithms, Compression, Burrows - Wheeler transform, FM-Index
Collection:		31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Issue Date:		2020
Date of publication:		09.06.2020
Supplementary Material:		https://github.com/oguzhankulekci/reverseSA