License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ECOOP.2023.27
URN: urn:nbn:de:0030-drops-182203
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18220/
Shahrokhi, Hesam; Shaikhha, Amir
An Efficient Vectorized Hash Table for Batch Computations
Abstract
In recent years, the increasing demand for high-performance analytics on big data has driven research on batch hash tables. It has been shown that this type of hash table can benefit from cache locality and multi-threading more than ordinary hash tables. Moreover, the batch design is amenable to advanced features of modern processors such as prefetching and SIMD vectorization. While state-of-the-art research and open-source projects on batch hash tables have proposed improved designs that make better use of these hardware features, their approaches still do not fully exploit the existing opportunities for performance improvement. Furthermore, a high-level batch API for such hash tables, needed for wider adoption of these high-performance data structures, is still missing. In this paper, we present Vec-HT, a parallel, SIMD-vectorized, and prefetching-enabled hash table for fast batch processing. To allow developers to take full advantage of its performance, we recommend a high-level batch API design. Our experimental results show that this approach is superior to or competitive with alternative implementations and the state of the art on the data-intensive workloads of relational join processing, set operations, and sparse vector processing.
BibTeX - Entry
@InProceedings{shahrokhi_et_al:LIPIcs.ECOOP.2023.27,
author = {Shahrokhi, Hesam and Shaikhha, Amir},
title = {{An Efficient Vectorized Hash Table for Batch Computations}},
booktitle = {37th European Conference on Object-Oriented Programming (ECOOP 2023)},
pages = {27:1--27:27},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-281-5},
ISSN = {1868-8969},
year = {2023},
volume = {263},
editor = {Ali, Karim and Salvaneschi, Guido},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/18220},
URN = {urn:nbn:de:0030-drops-182203},
doi = {10.4230/LIPIcs.ECOOP.2023.27},
annote = {Keywords: Hash tables, Vectorization, Parallelization, Prefetching}
}
Keywords: Hash tables, Vectorization, Parallelization, Prefetching
Collection: 37th European Conference on Object-Oriented Programming (ECOOP 2023)
Issue Date: 2023
Date of publication: 11.07.2023