License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagSemProc.07181.10
URN: urn:nbn:de:0030-drops-12574
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2007/1257/


Assent, Ira ; Krieger, Ralph ; Müller, Emmanuel ; Seidl, Thomas

Subspace outlier mining in large multimedia databases

pdf-format:
07181.AssentIra.ExtAbstract.1257.pdf (0.2 MB)


Abstract

Increasingly large multimedia databases in life sciences, e-commerce, or monitoring applications cannot be browsed manually; they require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. Clustering, one of the major tasks in KDD, groups similar objects into clusters and separates dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped clusters even in noisy databases.
In high-dimensional databases, meaningful clusters can no longer be detected due to the "curse of dimensionality". Consequently, subspace clustering searches for clusters hidden in any subset of the dimensions. Because the number of subspaces is exponential in the number of dimensions, traditional approaches prune the search with fixed thresholds. This results in dimensionality bias, i.e., as dimensionality grows, more clusters are missed.
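The two effects named above can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, and the density estimate assumes uniformly distributed data in the unit hypercube, a hypercube-shaped neighborhood of half-width eps, and no boundary effects.

```python
from itertools import combinations


def count_subspaces(num_dims):
    """Number of non-empty axis-parallel subspaces of num_dims dimensions."""
    return 2 ** num_dims - 1


def enumerate_subspaces(dims):
    """Yield every non-empty subset of the given dimensions as a tuple."""
    dims = list(dims)
    for size in range(1, len(dims) + 1):
        for subspace in combinations(dims, size):
            yield subspace


def expected_neighbors(num_objects, eps, subspace_dims):
    """Expected number of objects in a hypercube neighborhood of half-width eps.

    Assumption (for illustration only): data is uniform in [0, 1]^k, so the
    neighborhood covers a volume fraction of (2 * eps) ** k.
    """
    return num_objects * (2 * eps) ** subspace_dims


if __name__ == "__main__":
    for d in (5, 10, 20):
        print(d, "dimensions ->", count_subspaces(d), "subspaces")
    # With a fixed density threshold, the expected neighbor count shrinks
    # geometrically with the subspace dimensionality k.
    for k in (1, 2, 4, 8):
        print("k =", k, "expected neighbors:", expected_neighbors(1000, 0.1, k))
```

Under these assumptions, a density threshold that suits one-dimensional subspaces is rarely met in higher-dimensional ones, which is one way to read the dimensionality bias described above.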
Clustering information is very useful for applications such as fraud detection, which search for outliers, i.e., objects that differ from all clusters. In subspace clustering, an object may be an outlier with respect to some groups but not with respect to others, which can lead to conflicting information.
We propose a density-based, unbiased subspace clustering model for outlier detection. We define outliers with respect to all maximal and non-redundant subspace clusters, taking into account their distance (deviation in attribute values), relevance (number of attributes covered), and support (number of objects covered).
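The abstract does not give the scoring formula, so the following is only a hypothetical sketch of how distance, relevance, and support could be combined into one outlier score. The class `SubspaceCluster`, the centroid-based deviation, and the weighting scheme (relevance fraction times support fraction) are all assumptions made for illustration, not the authors' definition.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SubspaceCluster:
    """Hypothetical summary of one subspace cluster."""
    dims: tuple      # indices of the attributes the cluster covers
    centroid: tuple  # mean attribute value per covered dimension
    support: int     # number of objects in the cluster


def deviation(obj, cluster):
    """Distance: root mean squared deviation from the centroid in the subspace."""
    squared = sum((obj[d] - c) ** 2 for d, c in zip(cluster.dims, cluster.centroid))
    return sqrt(squared / len(cluster.dims))


def outlier_score(obj, clusters, num_objects, num_dims):
    """Weighted mean deviation over all clusters (assumed combination).

    Each cluster is weighted by relevance (fraction of attributes covered)
    times support (fraction of objects covered), so large clusters in
    many dimensions influence the score most.
    """
    weights = [
        (len(c.dims) / num_dims) * (c.support / num_objects) for c in clusters
    ]
    total = sum(weights)
    if total == 0:
        return 0.0
    weighted = sum(w * deviation(obj, c) for w, c in zip(weights, clusters))
    return weighted / total


if __name__ == "__main__":
    clusters = [
        SubspaceCluster(dims=(0, 1), centroid=(0.0, 0.0), support=50),
        SubspaceCluster(dims=(2,), centroid=(1.0,), support=10),
    ]
    print(outlier_score((0.0, 0.0, 1.0), clusters, num_objects=100, num_dims=3))
    print(outlier_score((3.0, 4.0, 1.0), clusters, num_objects=100, num_dims=3))
```

In this sketch, an object that deviates from a small, low-dimensional cluster but fits a large, high-dimensional one receives a low score, which is one possible way to resolve the conflicting information mentioned above.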
We demonstrate the quality of our subspace clustering results in experiments on real-world and synthetic databases and discuss our outlier model.

BibTeX - Entry

@InProceedings{assent_et_al:DagSemProc.07181.10,
  author =	{Assent, Ira and Krieger, Ralph and M\"{u}ller, Emmanuel and Seidl, Thomas},
  title =	{{Subspace outlier mining in large multimedia databases}},
  booktitle =	{Parallel Universes and Local Patterns},
  pages =	{1--8},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{7181},
  editor =	{Michael R. Berthold and Katharina Morik and Arno Siebes},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2007/1257},
  URN =		{urn:nbn:de:0030-drops-12574},
  doi =		{10.4230/DagSemProc.07181.10},
  annote =	{Keywords: Data mining, outlier detection, subspace clustering, density-based clustering}
}

Keywords: Data mining, outlier detection, subspace clustering, density-based clustering
Collection: 07181 - Parallel Universes and Local Patterns
Issue Date: 2007
Date of publication: 11.12.2007

