License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2020.11
URN: urn:nbn:de:0030-drops-121367

Commins, Patty ; Liben-Nowell, David ; Liu, Tina ; Tomlinson, Kiran

Summarizing Diverging String Sequences, with Applications to Chain-Letter Petitions


Algorithms to find optimal alignments among strings, or to find a parsimonious summary of a collection of strings, are well studied in a variety of contexts, addressing a wide range of interesting applications. In this paper, we consider chain letters, which contain a growing sequence of signatories added as the letter propagates. The unusual constellation of features exhibited by chain letters (one-ended growth, divergence, and mutation) makes their propagation, and thus the corresponding reconstruction problem, both distinctive and rich. Here, inspired by these chain letters, we formally define the problem of computing an optimal summary of a set of diverging string sequences. From a collection of these sequences of names, with each sequence noisily corresponding to a branch of the unknown tree T representing the letter’s true dissemination, can we efficiently and accurately reconstruct a tree T' ≈ T? We give efficient exact algorithms for this summarization problem when the number of sequences is small; for larger sets of sequences, we prove hardness and provide an efficient heuristic algorithm. We evaluate this heuristic on synthetic datasets chosen to emulate real chain letters, showing that our algorithm is competitive with or better than previous approaches, and that it also comes close to finding the true trees in these synthetic datasets.

BibTeX - Entry

  author =	{Patty Commins and David Liben-Nowell and Tina Liu and Kiran Tomlinson},
  title =	{{Summarizing Diverging String Sequences, with Applications to Chain-Letter Petitions}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{11:1--11:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{Inge Li G{\o}rtz and Oren Weimann},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-121367},
  doi =		{10.4230/LIPIcs.CPM.2020.11},
  annote =	{Keywords: edit distance, tree reconstruction, information propagation, chain letters}

Keywords: edit distance, tree reconstruction, information propagation, chain letters
Collection: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Issue Date: 2020
Date of publication: 09.06.2020
Supplementary Material: Related research data and source code hosted at
