License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2023.19
URN: urn:nbn:de:0030-drops-186728
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18672/


Bez, Dominik ; Kurpicz, Florian ; Lehmann, Hans-Peter ; Sanders, Peter

High Performance Construction of RecSplit Based Minimal Perfect Hash Functions

pdf-format:
LIPIcs-ESA-2023-19.pdf (0.9 MB)


Abstract

A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space-efficient practical minimal perfect hash function. It relies heavily on trying out hash functions by brute force.
We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism on the level of bits, vectors, cores, and GPUs.
In combination, the resulting improvements yield speedups of up to 239 on an 8-core CPU and up to 5438 using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 million objects with 1.56 bits per object. On the GPU, we achieve the same space usage in just 5 seconds. Given that the speedups are larger than the increase in energy consumption, our implementation is more energy efficient than the original implementation.
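
As a rough illustration of the brute-force search mentioned in the abstract, the following Python sketch tries seeds one after another until a seeded hash function maps a small key set bijectively onto the integers 0 to |S|-1. It is a toy example under stated assumptions, not the authors' implementation: the names hash_to_range and find_bijection_seed are hypothetical, BLAKE2b from Python's hashlib merely stands in for whatever hash function a real implementation would use, and the sketch covers neither RecSplit's recursive splitting nor rotation fitting.

```python
import hashlib


def hash_to_range(key: str, seed: int, m: int) -> int:
    """Hash `key` under `seed` into the range [0, m).

    Assumption: BLAKE2b with the seed as salt is used only as a
    convenient seeded hash; it is not the hash function from the paper.
    """
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % m


def find_bijection_seed(keys, max_tries=1_000_000):
    """Return the first seed under which the distinct `keys` map
    bijectively onto [0, len(keys)).

    Hypothetical helper: each candidate seed is tested by marking the
    hit positions in a bitmask and comparing it to the all-ones mask.
    """
    m = len(keys)
    full_mask = (1 << m) - 1
    for seed in range(max_tries):
        occupied = 0
        for key in keys:
            occupied |= 1 << hash_to_range(key, seed, m)
        if occupied == full_mask:  # every position hit exactly once
            return seed
    raise RuntimeError("no bijective seed found within max_tries")


keys = ["apple", "banana", "cherry", "date", "elderberry"]
seed = find_bijection_seed(keys)
positions = {key: hash_to_range(key, seed, len(keys)) for key in keys}
print(seed, positions)
```

For a set of m keys, a uniformly random function into m positions is a bijection with probability m!/m^m, so the expected number of seeds to try grows quickly with m. This is why a plain search of this kind is only feasible for small sets.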

BibTeX - Entry

@InProceedings{bez_et_al:LIPIcs.ESA.2023.19,
  author =	{Bez, Dominik and Kurpicz, Florian and Lehmann, Hans-Peter and Sanders, Peter},
  title =	{{High Performance Construction of RecSplit Based Minimal Perfect Hash Functions}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{19:1--19:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18672},
  URN =		{urn:nbn:de:0030-drops-186728},
  doi =		{10.4230/LIPIcs.ESA.2023.19},
  annote =	{Keywords: compressed data structure, parallel perfect hashing, bit parallelism, GPU, SIMD, parallel computing, vector instructions}
}

Keywords: compressed data structure, parallel perfect hashing, bit parallelism, GPU, SIMD, parallel computing, vector instructions
Collection: 31st Annual European Symposium on Algorithms (ESA 2023)
Issue Date: 2023
Date of publication: 30.08.2023
Supplementary Material:
Software (Source Code): https://github.com/ByteHamster/GpuRecSplit archived at: https://archive.softwareheritage.org/swh:1:dir:1245e6eaef109ce4eb9f24080a2e9bdad7baf6d1
Software (Comparison with Competitors): https://github.com/ByteHamster/MPHF-Experiments archived at: https://archive.softwareheritage.org/swh:1:dir:890e76e03dd70e63eb57f3f62e466a4ee825cee4

