License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2018.41
URN: urn:nbn:de:0030-drops-95042
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/9504/
Goswami, Mayank ;
Medjedovic, Dzejla ;
Mekic, Emina ;
Pandey, Prashant
Buffered Count-Min Sketch on SSD: Theory and Experiments
Abstract
Frequency estimation data structures such as the count-min sketch (CMS) have found numerous applications in databases, networking, computational biology and other domains. Many applications that use the count-min sketch process massive and rapidly evolving data sets. For data-intensive applications that aim to keep the overestimate error low, the count-min sketch becomes too large to store in available RAM and may have to migrate to external storage (e.g., SSD.) Due to the random-read/write nature of hash operations of the count-min sketch, simply placing it on SSD stifles the performance of time-critical applications, requiring about 4-6 random reads/writes to SSD per estimate (lookup) and update (insert) operation.
In this paper, we expand on the preliminary idea of the buffered count-min sketch (BCMS) {[Eydi et al., 2017]}, an SSD variant of the count-min sketch, that uses hash localization to scale efficiently out of RAM while keeping the total error bounded. We describe the design and implementation of the buffered count-min sketch, and empirically show that our implementation achieves 3.7 x-4.7 x speedup on update and 4.3 x speedup on estimate operations compared to the traditional count-min sketch on SSD.
Our design also offers an asymptotic improvement in the external-memory model over the original data structure: r random I/Os are reduced to 1 I/O for the estimate operation. For a data structure that uses k blocks on SSD, w as the word/counter size, r as the number of rows, M as the number of bits in the main memory, our data structure uses kwr/M amortized I/Os for updates, or, if kwr/M > 1, 1 I/O in the worst case. In typical scenarios, kwr/M is much smaller than 1. This is in contrast to O(r) I/Os incurred for each update in the original data structure.
Lastly, we mathematically show that for the buffered count-min sketch, the error rate does not substantially degrade over the traditional count-min sketch. Specifically, we prove that for any query q, our data structure provides the guarantee: Pr[Error(q) >= n epsilon (1+o(1))] <= delta + o(1), which, up to o(1) terms, is the same guarantee as that of a traditional count-min sketch.
BibTeX - Entry
@InProceedings{goswami_et_al:LIPIcs:2018:9504,
author = {Mayank Goswami and Dzejla Medjedovic and Emina Mekic and Prashant Pandey},
title = {{Buffered Count-Min Sketch on SSD: Theory and Experiments}},
booktitle = {26th Annual European Symposium on Algorithms (ESA 2018)},
pages = {41:1--41:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-081-1},
ISSN = {1868-8969},
year = {2018},
volume = {112},
editor = {Yossi Azar and Hannah Bast and Grzegorz Herman},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2018/9504},
URN = {urn:nbn:de:0030-drops-95042},
doi = {10.4230/LIPIcs.ESA.2018.41},
annote = {Keywords: Streaming model, Count-min sketch, Counting, Frequency, External memory, I/O efficiency, Bloom filter, Counting filter, Quotient filter}
}
Keywords: |
|
Streaming model, Count-min sketch, Counting, Frequency, External memory, I/O efficiency, Bloom filter, Counting filter, Quotient filter |
Collection: |
|
26th Annual European Symposium on Algorithms (ESA 2018) |
Issue Date: |
|
2018 |
Date of publication: |
|
14.08.2018 |