License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagSemProc.08131.4
URN: urn:nbn:de:0030-drops-15220

Su, Jian ; Yang, Xiaofeng ; Hong, Huaqing ; Tateisi, Yuka ; Tsujii, Jun'ichi

Coreference Resolution in Biomedical Texts: a Machine Learning Approach



Motivation: Coreference resolution, the process of identifying different
mentions of the same entity, is an important component of a
text-mining system. Compared with work on news articles,
existing studies of coreference resolution in biomedical texts are quite
preliminary: they focus only on specific types of anaphors, such as pronouns
or definite noun phrases, rely on heuristic methods, and are evaluated
on small data sets. There is therefore a need for an in-depth
exploration of this task in the biomedical domain.
Results: In this article, we present a learning-based approach
to coreference resolution in the biomedical domain. We make three
contributions. First, we annotated a large-scale coreference
corpus, MedCo, which consists of 1,999 MEDLINE abstracts
from the GENIA data set. Second, we propose a detailed framework
for the coreference resolution task, in which we augment the traditional
learning model by incorporating non-anaphors into training.
Third, we explore various sources of knowledge for coreference
resolution, particularly those that can deal with the complexity of
biomedical texts. The evaluation on the MedCo corpus showed promising
results. Our coreference resolution system achieved a high
precision of 85.2% with a reasonable recall of 65.3%, obtaining an
F-measure of 73.9%. The results also suggested that our augmented
learning model significantly boosted precision (up to 24.0%) without
much loss in recall (less than 5%), and brought a gain of over 8% in
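
The reported scores can be cross-checked with a short calculation. The sketch below assumes that the F-measure in the abstract is the balanced F1 score, i.e. the harmonic mean of precision and recall; the abstract does not state the weighting, but the reported figures are consistent with that assumption. The function name f_measure and its beta parameter are illustrative and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported in the abstract, written as fractions.
precision, recall = 0.852, 0.653

# Assumption: balanced F1 (beta = 1), which the abstract does not spell out.
f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.1%}")  # prints "F-measure: 73.9%"
```

With a precision of 85.2% and a recall of 65.3%, the balanced F1 comes to about 73.9%, which matches the figure given above.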

BibTeX - Entry

  author =	{Su, Jian and Yang, Xiaofeng and Hong, Huaqing and Tateisi, Yuka and Tsujii, Jun'ichi},
  title =	{{Coreference Resolution in Biomedical Texts: a Machine Learning Approach}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-15220},
  doi =		{10.4230/DagSemProc.08131.4},
  annote =	{Keywords: Coreference resolution, biomedical text}

Keywords: Coreference resolution, biomedical text
Collection: 08131 - Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives
Issue Date: 2008
Date of publication: 03.06.2008
