License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2023.20
URN: urn:nbn:de:0030-drops-177636

Grohe, Martin ; Lindner, Peter ; Standke, Christoph

Probabilistic Query Evaluation with Bag Semantics


We initiate the study of probabilistic query evaluation under bag semantics where tuples are allowed to be present with duplicates. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012).
Due to potentially unbounded multiplicities, the bag probabilistic databases we discuss are no longer finite objects, which requires a treatment of representation mechanisms. Moreover, the answer to a Boolean query is a probability distribution over non-negative integers, rather than a probability distribution over {true, false}. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries.
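
The two flavors of evaluation can be illustrated on a toy instance. The following is a minimal sketch, not the construction from the paper. It assumes a hypothetical self-join free Boolean query Q = R(x), S(x, y), whose bag-semantics answer is the sum over all (x, y) of the product of the multiplicities of R(x) and S(x, y). It also assumes that every fact has a finite-support multiplicity distribution, whereas the paper allows unbounded multiplicities. The relation contents and the names expected_count, answer_distribution and at_most are made up for illustration. The expectation follows from linearity of expectation and independence of distinct facts; the answer distribution is computed by brute-force enumeration of all worlds, which is exponential in the number of facts and serves only as a cross-check.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy instance: fact -> {multiplicity: probability}.
# Multiplicities of different facts are assumed to be independent.
R = {
    ("a",): {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)},
    ("b",): {0: Fraction(1, 3), 3: Fraction(2, 3)},
}
S = {
    ("a", "c"): {0: Fraction(1, 2), 2: Fraction(1, 2)},
    ("b", "c"): {1: Fraction(1)},
}


def mean(dist):
    """Expected multiplicity of a single fact."""
    return sum(m * p for m, p in dist.items())


def expected_count(R, S):
    """E[Q] for Q = R(x), S(x, y): sum of products of expected multiplicities.

    Valid here because each product involves two distinct, independent facts.
    """
    return sum(mean(R[(x,)]) * mean(d) for (x, y), d in S.items() if (x,) in R)


def answer_distribution(R, S):
    """Distribution of the answer multiplicity, by enumerating every world."""
    facts = [("R", t, d) for t, d in R.items()] + [("S", t, d) for t, d in S.items()]
    result = {}
    for choice in product(*(d.items() for _, _, d in facts)):
        prob = Fraction(1)
        world = {}
        for (rel, t, _), (m, p) in zip(facts, choice):
            prob *= p
            world[(rel, t)] = m
        count = sum(
            world[("R", (x,))] * world[("S", (x, y))]
            for (x, y) in S
            if (x,) in R
        )
        result[count] = result.get(count, 0) + prob
    return result


def at_most(dist, k):
    """Probability that the answer multiplicity is at most k."""
    return sum(p for c, p in dist.items() if c <= k)


dist = answer_distribution(R, S)
print(expected_count(R, S))                 # expectation via linearity
print(sum(c * p for c, p in dist.items()))  # same value via enumeration
print(at_most(dist, 0))                     # probability of an empty answer
```

On this instance both ways of computing the expectation agree, while the probability of "at most k" is only obtained here by enumeration. This mirrors, in a very small way, the contrast the abstract draws between the two problems.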

BibTeX - Entry

  author =	{Grohe, Martin and Lindner, Peter and Standke, Christoph},
  title =	{{Probabilistic Query Evaluation with Bag Semantics}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-177636},
  doi =		{10.4230/LIPIcs.ICDT.2023.20},
  annote =	{Keywords: Probabilistic Query Evaluation, Probabilistic Databases, Bag Semantics}

Keywords: Probabilistic Query Evaluation, Probabilistic Databases, Bag Semantics
Collection: 26th International Conference on Database Theory (ICDT 2023)
Issue Date: 2023
Date of publication: 17.03.2023
