License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2020.7
URN: urn:nbn:de:0030-drops-121324
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12132/
Bernardini, Giulia ;
Chen, Huiping ;
Loukides, Grigorios ;
Pisanti, Nadia ;
Pissis, Solon P. ;
Stougie, Leen ;
Sweering, Michelle
String Sanitization Under Edit Distance
Abstract
Let W be a string of length n over an alphabet Σ, k be a positive integer, and ? be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_{ED} such that: (i) no string of ? occurs in X_{ED}; (ii) the order of all other length-k substrings over Σ is the same in W and in X_{ED}; and (iii) X_{ED} has minimal edit distance to W. When W represents an individual’s data and ? represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in ?(kn²) time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of |Σ|. Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in ?(n^{2-δ}) time, for any δ>0, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS.
BibTeX - Entry
@InProceedings{bernardini_et_al:LIPIcs:2020:12132,
author = {Giulia Bernardini and Huiping Chen and Grigorios Loukides and Nadia Pisanti and Solon P. Pissis and Leen Stougie and Michelle Sweering},
title = {{String Sanitization Under Edit Distance}},
booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
pages = {7:1--7:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-149-8},
ISSN = {1868-8969},
year = {2020},
volume = {161},
editor = {Inge Li G{\o}rtz and Oren Weimann},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/12132},
URN = {urn:nbn:de:0030-drops-121324},
doi = {10.4230/LIPIcs.CPM.2020.7},
annote = {Keywords: String algorithms, data sanitization, edit distance, dynamic programming, conditional lower bound}
}
Keywords: |
|
String algorithms, data sanitization, edit distance, dynamic programming, conditional lower bound |
Collection: |
|
31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020) |
Issue Date: |
|
2020 |
Date of publication: |
|
09.06.2020 |