License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.FSTTCS.2016.1
URN: urn:nbn:de:0030-drops-68869
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/6886/
Go to the corresponding LIPIcs Volume Portal


Thorup, Mikkel

Fast and Powerful Hashing Using Tabulation (Invited Talk)

pdf-format:
LIPIcs-FSTTCS-2016-1.pdf (0.3 MB)


Abstract

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees.

Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of c characters and we have precomputed character tables h_1, ... , h_q mapping characters to random hash values. A key x = (x_1 , ..., x_c) is hashed to h_1[x_1] + h_2[x_2] ... + h_c[x_c]. This scheme is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does
provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing.

Next we consider twisted tabulation where one character is "twisted" with some simple operations. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing.

Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size
proportional to that of the space consumed, double tabulation gives fully-random hashing.

While these tabulation schemes are all easy to implement and use, their analysis is not.

BibTeX - Entry

@InProceedings{thorup:LIPIcs:2016:6886,
  author =	{Mikkel Thorup},
  title =	{{Fast and Powerful Hashing Using Tabulation (Invited Talk)}},
  booktitle =	{36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)},
  pages =	{1:1--1:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-027-9},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{65},
  editor =	{Akash Lal and S. Akshay and Saket Saurabh and Sandeep Sen},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2016/6886},
  URN =		{urn:nbn:de:0030-drops-68869},
  doi =		{10.4230/LIPIcs.FSTTCS.2016.1},
  annote =	{Keywords: Hashing, Randomized Algorithms}
}

Keywords: Hashing, Randomized Algorithms
Collection: 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2016)
Issue Date: 2016
Date of publication: 10.12.2016


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI