License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagSemProc.08131.17
URN: urn:nbn:de:0030-drops-15126
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2008/1512/
Krauthammer, Michael ;
Luong, Thaibinh
Term Mapping Using Matrix Operations
Abstract
We believe that gene name identification is a modular process involving term recognition, classification and mapping. This work's focus is on gene name mapping, and we assume that names are already recognized and classified. We use a combination of two methods to map recognized entities to their appropriate gene identifiers (Entrez GeneIDs): the Trigram Method, and the Network Method. Both methods require preprocessing, using resources from Entrez Gene, to construct a set of method-specific matrices. We first address lexical variation by transforming gene names into their unique "trigrams" (groups of three alphanumeric characters), and perform trigram matching against the preprocessed gene dictionary. For ambiguous gene names, we additionally perform a contextual analysis of the abstract that contains the recognized entity. We have formalized our method as a sequence of matrix manipulations, allowing for a fast and coherent implementation of the algorithm. In this talk, we also show how gene name identification, and text mining in general, can play a critical role in translational medicine. We demonstrate how term identification is useful for establishing a biobibliometric distance between genes and psychiatric disorders.
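The abstract does not spell out how the trigram matrices are built, so the following is only a minimal sketch of one way trigram matching could be phrased as a matrix operation. Everything in it is an assumption made for illustration: the binary dictionary matrix (one row per dictionary name, one column per trigram), the Dice-style score, the helper names `trigrams`, `build_matrix` and `map_name`, and the placeholder identifiers `ID_1` to `ID_3`, which are not real Entrez GeneIDs.

```python
import re


def trigrams(name):
    """Return the unique 3-character windows over the alphanumeric characters of a name."""
    s = re.sub(r"[^a-z0-9]", "", name.lower())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_matrix(entries):
    """Preprocessing step: build a binary (names x trigrams) matrix from the dictionary."""
    vocab = sorted(set().union(*(trigrams(name) for _, name in entries)))
    col = {t: j for j, t in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in entries]
    for i, (_, name) in enumerate(entries):
        for t in trigrams(name):
            matrix[i][col[t]] = 1
    return matrix, col


def map_name(query, entries, matrix, col):
    """Score every dictionary row against the query vector and return the best match."""
    q_tris = trigrams(query)
    q = [0] * len(col)
    for t in q_tris:
        if t in col:
            q[col[t]] = 1
    best_id, best_score = None, 0.0
    for (gene_id, _), row in zip(entries, matrix):
        shared = sum(a * b for a, b in zip(row, q))  # row of the product matrix . q
        denom = sum(row) + len(q_tris)
        score = 2 * shared / denom if denom else 0.0  # Dice coefficient (assumed choice)
        if score > best_score:
            best_id, best_score = gene_id, score
    return best_id, best_score


# Hypothetical toy dictionary; the identifiers are placeholders, not Entrez GeneIDs.
ENTRIES = [
    ("ID_1", "interleukin 2"),
    ("ID_2", "interleukin 6"),
    ("ID_3", "tumor necrosis factor"),
]
MATRIX, COL = build_matrix(ENTRIES)

print(map_name("Interleukin-2", ENTRIES, MATRIX, COL))
```

Because punctuation, spacing and case are stripped before the trigrams are formed, lexical variants such as "Interleukin-2" and "interleukin 2" map to the same row. Names that remain ambiguous after this step would still need the contextual analysis the abstract describes, which this sketch does not attempt.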
BibTeX - Entry
@InProceedings{krauthammer_et_al:DagSemProc.08131.17,
author = {Krauthammer, Michael and Luong, Thaibinh},
title = {{Term Mapping Using Matrix Operations}},
booktitle = {Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
pages = {1--1},
series = {Dagstuhl Seminar Proceedings (DagSemProc)},
ISSN = {1862-4405},
year = {2008},
volume = {8131},
editor = {Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2008/1512},
URN = {urn:nbn:de:0030-drops-15126},
doi = {10.4230/DagSemProc.08131.17},
annote = {Keywords: Term Identification}
}
Keywords: Term Identification
Collection: 08131 - Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives
Issue Date: 2008
Date of publication: 03.06.2008