License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.STACS.2019.24
URN: urn:nbn:de:0030-drops-102639
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10263/
Dietzfelbinger, Martin ;
Walzer, Stefan
Constant-Time Retrieval with O(log m) Extra Bits
Abstract
For a set U (the universe), retrieval is the following problem. Given a finite subset S subseteq U of size m and f : S -> {0,1}^r for a small constant r, build a data structure D_f with the property that for a suitable query algorithm query we have query(D_f,x) = f(x) for all x in S. For x in U setminus S the value query(D_f,x) is arbitrary in {0,1}^r. The number of bits needed for D_f should be (1+epsilon)r m with overhead epsilon = epsilon(m) >= 0 as small as possible, while the query time should be small. Of course, the time for constructing D_f is relevant as well.
We assume fully random hash functions on U with constant evaluation time are available. It is known that with epsilon ~= 0.09 one can achieve linear construction time and constant query time, and with overhead epsilon_k ~= e^{-k} it is possible to have O(k) query time and O(m^{1+alpha}) construction time, for arbitrary alpha>0. Furthermore, a theoretical construction with epsilon =O((log log m)/sqrt{log m}) gives constant query time and linear construction time. Known constructions avoiding all overhead, except for a seed value of size O(log log m), require logarithmic query time.
In this paper, we present a method for treating the retrieval problem with overhead epsilon = O((log m)/m), which corresponds to O(1) extra memory words (O(log m) bits), and an extremely simple, constant-time query operation. The price to pay is a construction time of O(m^2). We employ the usual framework for retrieval data structures, where construction is effected by solving a sparse linear system of equations over the 2-element field F_2 and a query is effected by a dot product calculation. Our main technical contribution is the design and analysis of a new and natural family of sparse random linear systems with m equations and (1+epsilon)m variables, which combines good locality properties with high probability of having full rank.
Paying a larger overhead of epsilon = O((log m)/m^alpha), the construction time can be reduced to O(m^{1+alpha}) for arbitrary constant 0 < alpha < 1. In combination with an adaptation of known techniques for solving sparse linear systems of equations, our approach leads to a highly practical algorithm for retrieval. In a particular benchmark with m = 10^7 we achieve an order-of-magnitude improvement over previous techniques with epsilon = 0.24% instead of the previously best result of epsilon ~= 3%, with better query time and no significant sacrifices in construction time.
BibTeX - Entry
@InProceedings{dietzfelbinger_et_al:LIPIcs:2019:10263,
author = {Martin Dietzfelbinger and Stefan Walzer},
title = {{Constant-Time Retrieval with O(log m) Extra Bits}},
booktitle = {36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019)},
pages = {24:1--24:16},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-100-9},
ISSN = {1868-8969},
year = {2019},
volume = {126},
editor = {Rolf Niedermeier and Christophe Paul},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2019/10263},
doi = {10.4230/LIPIcs.STACS.2019.24},
annote = {Keywords: Retrieval, Hashing, Succinct Data Structure, Randomised Data Structure, Structured Gaussian Elimination, Method of Four Russians}
}
Keywords: |
|
Retrieval, Hashing, Succinct Data Structure, Randomised Data Structure, Structured Gaussian Elimination, Method of Four Russians |
Collection: |
|
36th International Symposium on Theoretical Aspects of Computer Science (STACS 2019) |
Issue Date: |
|
2019 |
Date of publication: |
|
12.03.2019 |