License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.LDK.2019.12
URN: urn:nbn:de:0030-drops-103762
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10376/


Inel, Oana ; Aroyo, Lora

Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study

pdf-format:
OASIcs-LDK-2019-12.pdf (0.6 MB)


Abstract

Event detection is still a difficult task due to the complexity and ambiguity of such entities. On the one hand, we observe low inter-annotator agreement among experts when annotating events, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems have a lower measured performance in terms of F1-score compared to other types of entities, such as people or locations. In this paper, we study the consistency and completeness of expert-annotated datasets for events and time expressions. We propose a data-agnostic methodology for validating such datasets in terms of consistency and completeness. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated datasets of events. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system. Our results show that the crowd-annotated events increase the performance of the system by at least 5.3%.

BibTeX - Entry

@InProceedings{inel_et_al:OASIcs:2019:10376,
  author =	{Oana Inel and Lora Aroyo},
  title =	{{Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{12:1--12:15},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Maria Eskevich and Gerard de Melo and Christian F{\"a}th and John P. McCrae and Paul Buitelaar and Christian Chiarcos and Bettina Klimek and Milan Dojchinovski},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10376},
  URN =		{urn:nbn:de:0030-drops-103762},
  doi =		{10.4230/OASIcs.LDK.2019.12},
  annote =	{Keywords: Crowdsourcing, Human-in-the-Loop, Event Extraction, Time Extraction}
}

Keywords: Crowdsourcing, Human-in-the-Loop, Event Extraction, Time Extraction
Collection: 2nd Conference on Language, Data and Knowledge (LDK 2019)
Issue Date: 2019
Date of publication: 16.05.2019

