License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ECOOP.2023.27
URN: urn:nbn:de:0030-drops-182203
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18220/
Go to the corresponding LIPIcs Volume Portal


Shahrokhi, Hesam ; Shaikhha, Amir

An Efficient Vectorized Hash Table for Batch Computations

pdf-format:
LIPIcs-ECOOP-2023-27.pdf (1 MB)


Abstract

In recent years, the increasing demand for high-performance analytics on big data has led the research on batch hash tables. It is shown that this type of hash table can benefit from the cache locality and multi-threading more than ordinary hash tables. Moreover, the batch design for hash tables is amenable to using advanced features of modern processors such as prefetching and SIMD vectorization. While state-of-the-art research and open-source projects on batch hash tables made efforts to propose improved designs by better usage of mentioned hardware features, their approaches still do not fully exploit the existing opportunities for performance improvements. Furthermore, there is a gap for a high-level batch API of such hash tables for wider adoption of these high-performance data structures. In this paper, we present Vec-HT, a parallel, SIMD-vectorized, and prefetching-enabled hash table for fast batch processing. To allow developers to fully take advantage of its performance, we recommend a high-level batch API design. Our experimental results show the superiority and competitiveness of this approach in comparison with the alternative implementations and state-of-the-art for the data-intensive workloads of relational join processing, set operations, and sparse vector processing.

BibTeX - Entry

@InProceedings{shahrokhi_et_al:LIPIcs.ECOOP.2023.27,
  author =	{Shahrokhi, Hesam and Shaikhha, Amir},
  title =	{{An Efficient Vectorized Hash Table for Batch Computations}},
  booktitle =	{37th European Conference on Object-Oriented Programming (ECOOP 2023)},
  pages =	{27:1--27:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-281-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{263},
  editor =	{Ali, Karim and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18220},
  URN =		{urn:nbn:de:0030-drops-182203},
  doi =		{10.4230/LIPIcs.ECOOP.2023.27},
  annote =	{Keywords: Hash tables, Vectorization, Parallelization, Prefetching}
}

Keywords: Hash tables, Vectorization, Parallelization, Prefetching
Collection: 37th European Conference on Object-Oriented Programming (ECOOP 2023)
Issue Date: 2023
Date of publication: 11.07.2023


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI