License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ITP.2023.27
URN: urn:nbn:de:0030-drops-184027
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18402/
Go to the corresponding LIPIcs Volume Portal


Tan, Chengsong ; Urban, Christian

POSIX Lexing with Bitcoded Derivatives

pdf-format:
LIPIcs-ITP-2023-27.pdf (0.7 MB)


Abstract

Sulzmann and Lu describe a lexing algorithm that calculates Brzozowski derivatives using bitcodes annotated to regular expressions. Their algorithm generates POSIX values which encode the information of how a regular expression matches a string - that is, which part of the string is matched by which part of the regular expression. This information is needed in the context of lexing in order to extract and to classify tokens. The purpose of the bitcodes is to generate POSIX values incrementally while derivatives are calculated. They also help with designing an "aggressive" simplification function that keeps the size of derivatives finitely bounded. Without simplification the size of some derivatives can grow arbitrarily big, resulting in an extremely slow lexing algorithm. In this paper we describe a variant of Sulzmann and Lu’s algorithm: Our variant is a recursive functional program, whereas Sulzmann and Lu’s version involves a fixpoint construction. We (i) prove in Isabelle/HOL that our variant is correct and generates unique POSIX values (no such proof has been given for the original algorithm by Sulzmann and Lu); we also (ii) establish finite bounds for the size of our derivatives.

BibTeX - Entry

@InProceedings{tan_et_al:LIPIcs.ITP.2023.27,
  author =	{Tan, Chengsong and Urban, Christian},
  title =	{{POSIX Lexing with Bitcoded Derivatives}},
  booktitle =	{14th International Conference on Interactive Theorem Proving (ITP 2023)},
  pages =	{27:1--27:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-284-6},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{268},
  editor =	{Naumowicz, Adam and Thiemann, Ren\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18402},
  URN =		{urn:nbn:de:0030-drops-184027},
  doi =		{10.4230/LIPIcs.ITP.2023.27},
  annote =	{Keywords: POSIX matching and lexing, derivatives of regular expressions, Isabelle/HOL}
}

Keywords: POSIX matching and lexing, derivatives of regular expressions, Isabelle/HOL
Collection: 14th International Conference on Interactive Theorem Proving (ITP 2023)
Issue Date: 2023
Date of publication: 26.07.2023
Supplementary Material: Software (Source Code): https://github.com/urbanchr/posix


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI