DROPS - Document

License:

Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2016.4
URN: urn:nbn:de:0030-drops-60807
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/6080/

Go to the corresponding LIPIcs Volume Portal

Barton, Carl ; Kociumaka, Tomasz ; Pissis, Solon P. ; Radoszewski, Jakub

Efficient Index for Weighted Sequences

pdf-format:

LIPIcs-CPM-2016-4.pdf (0.5 MB)

Abstract

The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. Given a probability threshold 1/z, we say that a pattern string P matches a weighted text at position i if the product of probabilities of the letters of P at positions i,...,i+|P|-1 in the text is at least 1/z. In this article, we present an O(nz)-time construction of an O(nz)-sized index that can answer pattern matching queries in a weighted text in optimal time improving upon the state of the art by a factor of z log z. Other applications of this data structure include an O(nz)-time construction of the weighted prefix table and an O(nz)-time computation of all covers of a weighted sequence, which improve upon the state of the art by the same factor.

BibTeX - Entry

@InProceedings{barton_et_al:LIPIcs:2016:6080,
  author =	{Carl Barton and Tomasz Kociumaka and Solon P. Pissis and Jakub Radoszewski},
  title =	{{Efficient Index for Weighted Sequences}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{4:1--4:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Roberto Grossi and Moshe Lewenstein},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2016/6080},
  URN =		{urn:nbn:de:0030-drops-60807},
  doi =		{10.4230/LIPIcs.CPM.2016.4},
  annote =	{Keywords: weighted sequence, position weight matrix, indexing, weighted suffix tree}
}

Keywords: weighted sequence, position weight matrix, indexing, weighted suffix tree

Collection: 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)

Issue Date: 2016

Date of publication: 27.06.2016

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI

Keywords:		weighted sequence, position weight matrix, indexing, weighted suffix tree
Collection:		27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Issue Date:		2016
Date of publication:		27.06.2016