License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2023.6
URN: urn:nbn:de:0030-drops-179602
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/17960/
Go to the corresponding LIPIcs Volume Portal


Bourhis, Pierre ; Boussidan, Aaron ; Gambette, Philippe

On Distances Between Words with Parameters

pdf-format:
LIPIcs-CPM-2023-6.pdf (0.9 MB)


Abstract

The edit distance between parameterized words is a generalization of the classical edit distance where it is allowed to map particular letters of the first word, called parameters, to parameters of the second word before computing the distance. This problem has been introduced in particular for detection of code duplication, and the notion of words with parameters has also been used with different semantics in other fields. The complexity of several variants of edit distances between parameterized words has been studied, however, the complexity of the most natural one, the Levenshtein distance, remained open.
In this paper, we solve this open question and close the exhaustive analysis of all cases of parameterized word matching and function matching, showing that these problems are np-complete. To this aim, we also provide a comparison of the different problems, exhibiting several equivalences between them. We also provide and implement a MaxSAT encoding of the problem, as well as a simple FPT algorithm in the alphabet size, and study their efficiency on real data in the context of theater play structure comparison.

BibTeX - Entry

@InProceedings{bourhis_et_al:LIPIcs.CPM.2023.6,
  author =	{Bourhis, Pierre and Boussidan, Aaron and Gambette, Philippe},
  title =	{{On Distances Between Words with Parameters}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{6:1--6:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/17960},
  URN =		{urn:nbn:de:0030-drops-179602},
  doi =		{10.4230/LIPIcs.CPM.2023.6},
  annote =	{Keywords: String matching, edit distance, Levenshtein, parameterized matching, parameterized words, parameter words, instantiable words, NP-completeness, MAX-SAT}
}

Keywords: String matching, edit distance, Levenshtein, parameterized matching, parameterized words, parameter words, instantiable words, NP-completeness, MAX-SAT
Collection: 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Issue Date: 2023
Date of publication: 21.06.2023
Supplementary Material: Software (Source Code): https://github.com/AaronFive/paramatch archived at: https://archive.softwareheritage.org/swh:1:dir:3a72b0d85a4a2be9126900473b8f3e6d03c12a52


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI