License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ECOOP.2021.8
URN: urn:nbn:de:0030-drops-140513
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/14051/
Hao, Yu; Latif, Sufian; Zhang, Hailong; Bassily, Raef; Rountev, Atanas
Differential Privacy for Coverage Analysis of Software Traces
Abstract
This work considers software execution traces, where a trace is a sequence of run-time events. Each user of a software system collects the set of traces covered by her execution of the software, and reports this set to an analysis server. Our goal is to report the local data of each user in a privacy-preserving manner by employing local differential privacy, a powerful theoretical framework for designing privacy-preserving data analysis. A significant advantage of such analysis is that it offers principled "built-in" privacy with clearly-defined and quantifiable privacy protections. In local differential privacy, the data of an individual user is modified using a local randomizer before being sent to the untrusted analysis server. Based on the randomized information from all users, the analysis server computes, for each trace, an estimate of how many users have covered it.
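To make the local-randomizer idea concrete, here is a minimal sketch of classic per-bit randomized response over a small, fixed trace domain. It is a textbook mechanism, not the randomizer developed in the paper; the function names and the bit-vector encoding are assumptions made for illustration. Note that `epsilon` here is a per-bit parameter: the overall privacy guarantee also depends on how many bits can differ between two users' coverage sets.

```python
import math
import random

def keep_probability(epsilon):
    """Probability of reporting a bit truthfully (requires epsilon > 0)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def randomize(bits, epsilon, rng):
    """Local randomizer: keep each coverage bit with probability p, else flip it."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_counts(reports, epsilon):
    """Server side: debias the per-trace sums of the randomized reports.

    If c users truly covered a trace, the expected observed sum is
    c * p + (n - c) * (1 - p), so c is estimated by inverting that formula.
    """
    n = len(reports)
    p = keep_probability(epsilon)
    sums = [sum(column) for column in zip(*reports)]
    return [(s - n * (1.0 - p)) / (2.0 * p - 1.0) for s in sums]
```

Each user would call `randomize` on her own coverage vector and send only the result; the server never sees the original bits. This encoding needs one bit per possible trace, which is exactly what becomes infeasible for the very large domains discussed next.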
Such analysis requires that the domain of possible traces be defined ahead of time. Unlike in prior related work, here the domain is either infinite or, at best, restricted to many billions of elements. Further, the traces in this domain typically have structure defined by the static properties of the software. To capture these novel aspects, we define the trace domain with the help of context-free grammars. We illustrate this approach with two exemplars: a call chain analysis in which traces are described through a regular language, and an enter/exit trace analysis in which traces are described by a balanced-parentheses context-free language. Randomization over such domains is challenging due to their large size, which makes it impossible to use prior randomization techniques. To solve this problem, we propose to use count sketch, a fixed-size hashing data structure for summarizing frequent items. We develop a version of count sketch for trace analysis and demonstrate its suitability for software execution data. In addition, instead of randomizing each contribution to the sketch separately, we develop a much faster one-shot randomization of the accumulated sketch data.
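For readers unfamiliar with the data structure, the following is a minimal sketch of a standard count sketch, with traces encoded as strings. It does not reproduce the paper's trace-specific variant or its one-shot randomization; the class name, the SHA-256-based hashing, and the `merge` helper are assumptions chosen for illustration.

```python
import hashlib
import statistics

class CountSketch:
    """Fixed-size table of signed counters: depth rows, width columns."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, item):
        # Derive a per-row column index and a +1/-1 sign from one digest.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % self.width
        sign = 1 if digest[8] & 1 else -1
        return bucket, sign

    def add(self, item, count=1):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        # Each row gives one noisy estimate; the median damps collisions.
        values = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            values.append(sign * self.table[row][bucket])
        return statistics.median(values)

    def merge(self, other):
        # Sketches of equal shape combine by element-wise addition.
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]

# Hypothetical call-chain traces, written as '>'-separated strings.
user_a = CountSketch(depth=5, width=64)
user_a.add("main>parse>lex")
user_b = CountSketch(depth=5, width=64)
user_b.add("main>parse>lex")
user_a.merge(user_b)
```

The memory cost is `depth * width` counters regardless of how many distinct traces exist, which is why a sketch remains usable when the trace domain has billions of elements or is infinite.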
One important client of the collected information is the identification of high-frequency ("hot") traces. We develop a novel approach to identify hot traces from the collected randomized sketches. A key insight is that the very large domain of possible traces can be efficiently explored for hot traces by using the frequency estimates of a visited trace and its prefixes and suffixes. Our experimental study of both call chain analysis and enter/exit trace analysis indicates that the frequency estimates, as well as the identification of hot traces, achieve high accuracy and high privacy.
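One way to picture prefix-guided exploration is the breadth-first search below, which extends a trace only when its estimated frequency stays above a threshold. This is an illustrative sketch, not the paper's algorithm: it uses prefixes only (not suffixes), and it assumes that a trace is never more frequent than its prefixes, that the domain is given by a hypothetical `successors` function, and that `estimate` stands in for the server's frequency estimator.

```python
from collections import deque

def find_hot_traces(start_events, successors, estimate, threshold, max_length):
    """Return traces whose estimated frequency is at least `threshold`.

    A trace is extended only if it is itself hot, so cold prefixes prune
    every longer trace that starts with them.
    """
    hot = []
    queue = deque((event,) for event in start_events)
    while queue:
        trace = queue.popleft()
        if estimate(trace) < threshold:
            continue  # cold prefix: skip all of its extensions
        hot.append(trace)
        if len(trace) < max_length:
            for nxt in successors(trace[-1]):
                queue.append(trace + (nxt,))
    return hot

# Hypothetical call graph and exact frequencies standing in for estimates.
call_graph = {"main": ["parse", "run"], "parse": ["lex"], "run": [], "lex": []}
frequencies = {
    ("main",): 10,
    ("main", "parse"): 8,
    ("main", "run"): 2,
    ("main", "parse", "lex"): 7,
}
hot_traces = find_hot_traces(
    ["main"],
    lambda event: call_graph[event],
    lambda trace: frequencies.get(trace, 0),
    threshold=5,
    max_length=4,
)
```

With real randomized estimates the threshold would need slack for estimation error, and `max_length` keeps the search finite when the call graph contains cycles.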
BibTeX - Entry
@InProceedings{hao_et_al:LIPIcs.ECOOP.2021.8,
author = {Hao, Yu and Latif, Sufian and Zhang, Hailong and Bassily, Raef and Rountev, Atanas},
title = {{Differential Privacy for Coverage Analysis of Software Traces}},
booktitle = {35th European Conference on Object-Oriented Programming (ECOOP 2021)},
pages = {8:1--8:25},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-190-0},
ISSN = {1868-8969},
year = {2021},
volume = {194},
editor = {M{\o}ller, Anders and Sridharan, Manu},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/14051},
URN = {urn:nbn:de:0030-drops-140513},
doi = {10.4230/LIPIcs.ECOOP.2021.8},
annote = {Keywords: Trace Profiling, Differential Privacy, Program Analysis}
}
Keywords: Trace Profiling, Differential Privacy, Program Analysis
Collection: 35th European Conference on Object-Oriented Programming (ECOOP 2021)
Issue Date: 2021
Date of publication: 06.07.2021
Supplementary Material: Software (ECOOP 2021 Artifact Evaluation approved artifact): https://doi.org/10.4230/DARTS.7.2.7