License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2017.13
URN: urn:nbn:de:0030-drops-70493
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2017/7049/
Go to the corresponding LIPIcs Volume Portal


Freydenberger, Dominik D.

A Logic for Document Spanners

pdf-format:
LIPIcs-ICDT-2017-13.pdf (0.6 MB)


Abstract

Document spanners are a formal framework for information extraction that was introduced by [Fagin, Kimelfeld, Reiss, and Vansummeren, J.ACM, 2015]. One of the central models in this framework are core spanners, which are based on regular expressions with variables that are then extended with an algebra. As shown by [Freydenberger and Holldack, ICDT, 2016], there is a connection between core spanners and EC^{reg}, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of EC^{reg} that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between this fragment and core spanners. This even holds for variants of core spanners that are based on automata instead of regular expressions. Applications of this approach include an alternative way of defining relations for spanners, insights into the relative succinctness of various classes of spanner representations, and a pumping lemma for core spanners.

BibTeX - Entry

@InProceedings{freydenberger:LIPIcs:2017:7049,
  author =	{Dominik D. Freydenberger},
  title =	{{A Logic for Document Spanners}},
  booktitle =	{20th International Conference on Database Theory (ICDT 2017)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-024-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{68},
  editor =	{Michael Benedikt and Giorgio Orsi},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/7049},
  URN =		{urn:nbn:de:0030-drops-70493},
  doi =		{10.4230/LIPIcs.ICDT.2017.13},
  annote =	{Keywords: information extraction, document spanners, word equations, regex, descriptional complexity}
}

Keywords: information extraction, document spanners, word equations, regex, descriptional complexity
Collection: 20th International Conference on Database Theory (ICDT 2017)
Issue Date: 2017
Date of publication: 17.03.2017


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI