License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagSemProc.08131.19
URN: urn:nbn:de:0030-drops-15169
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2008/1516/


Mueller, Hans-Michael ; Rangarajan, Arun ; Teal, Tracy K. ; van Auken, Kimberly ; Chan, Juancarlos ; Sternberg, Paul W.

Textpresso - an Information Retrieval and Extraction System for Biological Literature

pdf-format:
08131.MuellerHansMichael.ExtAbstract.1516.pdf (0.2 MB)


Abstract

We developed an information retrieval and extraction system that processes the full
text of biological papers. The system, called Textpresso, separates text into
sentences, labels words and phrases according to an ontology (an organized lexicon),
and allows queries to be performed on a database of labeled sentences. The current
ontology comprises approximately one hundred categories of terms, such as "gene",
"regulation", "human disease", "brain area" etc., and also contains main Gene
Ontology (GO) categories. Extraction of particular biological facts, such as gene-gene
interactions, or the curation of GO cellular components, can be accelerated
significantly by ontologies, with Textpresso automatically identifying sentences nearly
as well as expert curators do. We have established search engines for four literatures
(C. elegans, Drosophila, Arabidopsis, and Neuroscience), and other groups around the
world have developed thirteen systems for other literatures. Currently, our four
systems contain 112,000 papers with 40 million sentences; all systems worldwide
contain 190,000 papers with approximately 65 million sentences.
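
The three steps the abstract describes (sentence splitting, labeling terms by ontology category, and querying the labeled sentences) can be illustrated with a minimal Python sketch. This is not Textpresso's implementation: the lexicon, the category name "association", the naive sentence splitter, and all function names are assumptions made up for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (category -> terms); NOT Textpresso's actual ontology.
LEXICON = {
    "gene": ["lin-12", "glp-1", "daf-16"],
    "regulation": ["represses", "activates", "regulates"],
    "association": ["interacts with", "binds"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Abbreviations such as 'C. elegans' would be split incorrectly; a real
    system needs a more careful tokenizer.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentence(sentence):
    """Return {category: [distinct lexicon terms found]} for one sentence."""
    labels = defaultdict(list)
    lowered = sentence.lower()
    for category, terms in LEXICON.items():
        for term in terms:
            # Require non-word, non-hyphen boundaries so that a term is not
            # matched inside a longer word or a longer hyphenated name.
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            if re.search(pattern, lowered):
                labels[category].append(term)
    return dict(labels)


def build_index(text):
    """Build the 'database': a list of (sentence, labels) pairs."""
    return [(s, label_sentence(s)) for s in split_sentences(text)]


def query(index, required):
    """Return sentences with at least `n` distinct terms of each category.

    `required` maps a category name to the minimum number of distinct terms.
    """
    return [
        sentence
        for sentence, labels in index
        if all(len(labels.get(cat, [])) >= n for cat, n in required.items())
    ]
```

Under these assumptions, a candidate gene-gene interaction sentence could be requested with `query(index, {"gene": 2, "association": 1})`, that is, any sentence labeled with two distinct gene terms and one association term.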

BibTeX - Entry

@InProceedings{mueller_et_al:DagSemProc.08131.19,
  author =	{Mueller, Hans-Michael and Rangarajan, Arun and Teal, Tracy K. and van Auken, Kimberly and Chan, Juancarlos and Sternberg, Paul W.},
  title =	{{Textpresso - an Information Retrieval and Extraction System for Biological Literature}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2008/1516},
  URN =		{urn:nbn:de:0030-drops-15169},
  doi =		{10.4230/DagSemProc.08131.19},
  annote =	{Keywords: Information retrieval, literature search engine, information extraction, automated literature curation, semantic search, ontology}
}

Keywords: Information retrieval, literature search engine, information extraction, automated literature curation, semantic search, ontology
Collection: 08131 - Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives
Issue Date: 2008
Date of publication: 03.06.2008

