License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2021.22
URN: urn:nbn:de:0030-drops-137302
Go to the corresponding LIPIcs Volume Portal

Sundarmurthy, Bruhathi ; Koutris, Paraschos ; Naughton, Jeffrey

Locality-Aware Distribution Schemes

LIPIcs-ICDT-2021-22.pdf (0.9 MB)


One of the bottlenecks in parallel query processing is the cost of shuffling data across nodes in a cluster. Ideally, given a distribution of the data across the nodes and a query, we want to execute the query by performing only local computation and no communication: in this case, the query is called parallel-correct with respect to the data distribution. Previous work studied this problem for Conjunctive Queries in the case where the distribution scheme is oblivious, i.e., the location of each tuple depends only on the tuple and is independent of the instance. In this work, we show that oblivious schemes have a fundamental theoretical limitation, and initiate the formal study of distribution schemes that are locality-aware. In particular, we focus on a class of distribution schemes called co-hash distribution schemes, which are widely used in parallel systems. In co-hash partitioning, some tables are initially hashed, and the remaining tables are co-located so that a join condition is always satisfied. Given a co-hash distribution scheme, we formally study the complexity of deciding various desirable properties, including obliviousness and redundancy. Then, for a given Conjunctive Query and co-hash scheme, we determine the computational complexity of deciding whether the query is parallel-correct. We also explore a stronger notion of correctness, called parallel disjoint correctness, which guarantees that the query result will be disjointly partitioned across nodes, i.e., there is no duplication of results.

BibTeX - Entry

  author =	{Sundarmurthy, Bruhathi and Koutris, Paraschos and Naughton, Jeffrey},
  title =	{{Locality-Aware Distribution Schemes}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{22:1--22:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-137302},
  doi =		{10.4230/LIPIcs.ICDT.2021.22},
  annote =	{Keywords: partitioning, parallel correctness, join queries}

Keywords: partitioning, parallel correctness, join queries
Collection: 24th International Conference on Database Theory (ICDT 2021)
Issue Date: 2021
Date of publication: 11.03.2021

DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI