License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2018.13
URN: urn:nbn:de:0030-drops-86094
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/8609/
Go to the corresponding LIPIcs Volume Portal


Cormode, Graham ; Dark, Jacques

Fast Sketch-based Recovery of Correlation Outliers

pdf-format:
LIPIcs-ICDT-2018-13.pdf (0.6 MB)


Abstract

Many data sources can be interpreted as time-series, and a key problem is to identify which pairs out of a large collection of signals are highly correlated. We expect that there will be few, large, interesting correlations, while most signal pairs do not have any strong correlation. We abstract this as the problem of identifying the highly correlated pairs in a collection of n mostly pairwise uncorrelated random variables, where observations of the variables arrives as a stream. Dimensionality reduction can remove dependence on the number of observations, but further techniques are required to tame the quadratic (in n) cost of a search through all possible pairs.

We develop a new algorithm for rapidly finding large correlations based on sketch techniques with an added twist: we quickly generate sketches of random combinations of signals, and use these in concert with ideas from coding theory to decode the identity of correlated pairs. We prove correctness and compare performance and effectiveness with the best LSH (locality sensitive hashing) based approach.

BibTeX - Entry

@InProceedings{cormode_et_al:LIPIcs:2018:8609,
  author =	{Graham Cormode and Jacques Dark},
  title =	{{Fast Sketch-based Recovery of Correlation Outliers}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Benny Kimelfeld and Yael Amsterdamer},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2018/8609},
  URN =		{urn:nbn:de:0030-drops-86094},
  doi =		{10.4230/LIPIcs.ICDT.2018.13},
  annote =	{Keywords: correlation, sketching, streaming, dimensionality reduction}
}

Keywords: correlation, sketching, streaming, dimensionality reduction
Collection: 21st International Conference on Database Theory (ICDT 2018)
Issue Date: 2018
Date of publication: 05.03.2018


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI