License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following:
DOI: 10.4230/DagSemProc.06051.3
URN: urn:nbn:de:0030-drops-6296
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2006/629/


Cilibrasi, Rudi; Vitanyi, Paul M.B.

Automatic Meaning Discovery Using Google

pdf-format:
06051.VitanyiPaulB.Paper.629.pdf (0.2 MB)


Abstract

We survey a new area of parameter-free similarity distance measures useful
in data mining, pattern recognition, learning, and automatic semantics
extraction. Given a family of distances on a set of objects, a distance is
universal up to a certain precision for that family if it minorizes every
distance in the family between every two objects in the set, up to the
stated precision (we do not require the universal distance to be an element
of the family).
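
Written out, and reading "up to a certain precision" as an additive slack ε (an assumption made here; the symbols F, Ω, D and ε are introduced only for illustration and do not come from the abstract), the universality condition above says:

```latex
% \mathcal{F}: a family of distances on the object set \Omega
% D: the candidate universal distance (not required to lie in \mathcal{F})
\forall f \in \mathcal{F},\ \forall x, y \in \Omega:\quad
  D(x, y) \le f(x, y) + \varepsilon
```

Equivalently, D(x, y) is at most ε above the infimum of f(x, y) over all f in F, for every pair of objects, so D is never more than ε above the smallest distance any member of the family assigns to that pair.
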
We consider similarity distances for two types of objects: literal objects
that as such contain all of their meaning, like genomes or books, and names
for objects. The latter may have literal embodiments like the first type,
but may also be abstract, like "red" or "Christianity." For the first type
we consider a family of computable distance measures, each corresponding to
a parameter that expresses similarity between pairs of literal objects
according to a particular feature. For the second type we consider
similarity distances generated by web users, corresponding to particular
semantic relations between the (names for the) designated objects. For both
families we give universal similarity distance measures that incorporate
all particular distance measures in the family. In the first case the
universal distance is based on compression, and in the second case it is
based on Google page counts related to search terms. In both cases,
experiments on a massive scale give evidence of the viability of the
approaches.
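
The two universal distances mentioned above are known as the normalized compression distance (NCD) and the normalized Google distance (NGD). The sketch below states both formulas as they are usually given in the broader NCD/NGD literature rather than as quoted from this page. zlib is an arbitrary stand-in for "a real-world compressor", the helper names `compressed_size`, `ncd` and `ngd` are hypothetical, and the page counts in the usage lines are made-up numbers, not Google data.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length of zlib's output at maximum level, used as a stand-in for C(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """NGD from page counts f(x), f(y), f(x, y) and the index size N:

    NGD = (max(log f(x), log f(y)) - log f(x, y))
          / (log N - min(log f(x), log f(y)))

    Any logarithm base gives the same ratio; the natural log is used here.
    """
    if fxy == 0:
        return math.inf  # the two search terms never occur on the same page
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    c = bytes(range(256)) * 4
    # Near-duplicate texts should come out closer than unrelated byte data.
    print(f"NCD(a, b) = {ncd(a, b):.3f}, NCD(a, c) = {ncd(a, c):.3f}")
    # Hypothetical page counts, for illustration only (not real search data).
    print(f"NGD = {ngd(fx=8e6, fy=6e6, fxy=2e6, n=8e9):.3f}")
```

A real compressor only approximates the idealized (Kolmogorov complexity) quantities the theory is built on, so NCD values computed this way can fall slightly outside [0, 1]. With real search data, N would be the total number of pages indexed by the search engine.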

BibTeX - Entry

@InProceedings{cilibrasi_et_al:DagSemProc.06051.3,
  author =	{Cilibrasi, Rudi and Vitanyi, Paul M.B.},
  title =	{{Automatic Meaning Discovery Using Google}},
  booktitle =	{Kolmogorov Complexity and Applications},
  pages =	{1--23},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6051},
  editor =	{Marcus Hutter and Wolfgang Merkle and Paul M.B. Vitanyi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2006/629},
  URN =		{urn:nbn:de:0030-drops-6296},
  doi =		{10.4230/DagSemProc.06051.3},
  annote =	{Keywords: Normalized Compression Distance, Clustering, Classification, Relative Semantics of Terms, Google, World-Wide-Web, Kolmogorov complexity}
}

Keywords: Normalized Compression Distance, Clustering, Classification, Relative Semantics of Terms, Google, World-Wide-Web, Kolmogorov complexity
Collection: 06051 - Kolmogorov Complexity and Applications
Issue Date: 2006
Date of publication: 31.07.2006


Published by LZI