License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.LDK.2019.23
URN: urn:nbn:de:0030-drops-103873
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10387/


Abromeit, Frank ; Chiarcos, Christian

Automatic Detection of Language and Annotation Model Information in CoNLL Corpora



Abstract

We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) they apply, to provide a web interface for their curation and evaluation by domain experts, and to publish them as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de facto standard for annotated corpora popularized by the CoNLL Shared Tasks. By extension, other formats for which a converter to CoNLL and/or TSV exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from the Universal Dependencies, v.2.3.
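
To make the tabular setting concrete, the following is a minimal sketch of how the distinct values in each column of a CoNLL-style TSV file could be collected. It is an illustration only and not the AnnoHub implementation; the function name `column_tag_inventory` and the three-token sample are hypothetical, and the skipping of `#` comment lines and blank sentence separators follows CoNLL-U conventions.

```python
from collections import defaultdict


def column_tag_inventory(lines):
    """Collect the set of distinct values seen in each tab-separated column.

    Comment lines (starting with '#') and blank sentence separators
    are skipped.
    """
    inventory = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        for index, value in enumerate(line.split("\t")):
            inventory[index].add(value)
    return dict(inventory)


# Hypothetical sample with three columns: ID, FORM, UPOS.
sample = [
    "# text = Dogs bark loudly",
    "1\tDogs\tNOUN",
    "2\tbark\tVERB",
    "3\tloudly\tADV",
    "",
]

tags = column_tag_inventory(sample)
print(sorted(tags[2]))  # ['ADV', 'NOUN', 'VERB']
```

Under this sketch, a column whose values form a small closed set (here, column 2) would be a plausible candidate for an annotation column that one might then compare against known tagsets; how AnnoHub actually performs this step is described in the paper itself.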

BibTeX - Entry

@InProceedings{abromeit_et_al:OASIcs:2019:10387,
  author =	{Frank Abromeit and Christian Chiarcos},
  title =	{{Automatic Detection of Language and Annotation Model Information in CoNLL Corpora}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{23:1--23:9},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Maria Eskevich and Gerard de Melo and Christian F{\"a}th and John P. McCrae and Paul Buitelaar and Christian Chiarcos and Bettina Klimek and Milan Dojchinovski},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10387},
  URN =		{urn:nbn:de:0030-drops-103873},
  doi =		{10.4230/OASIcs.LDK.2019.23},
  annote =	{Keywords: LLOD, CoNLL, OLiA}
}

Keywords: LLOD, CoNLL, OLiA
Collection: 2nd Conference on Language, Data and Knowledge (LDK 2019)
Issue Date: 2019
Date of publication: 16.05.2019
Supplementary Material: https://annohub.linguistik.de

