DROPS - Document

License:

Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.LDK.2019.9
URN: urn:nbn:de:0030-drops-103731
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10373/

Chiarcos, Christian ; Fäth, Christian

Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar

Abstract

This paper describes the application of annotation engineering techniques for the construction of a corpus for Role and Reference Grammar (RRG).
RRG is a semantics-oriented formalism for natural language syntax popular in comparative linguistics and linguistic typology, and predominantly applied for the description of non-European languages which are less-resourced in terms of natural language processing. Because of its cross-linguistic applicability and its conjoint treatment of syntax and semantics, RRG also represents a promising framework for research challenges within natural language processing. At the moment, however, these have not been explored as no RRG corpus data is publicly available. While RRG annotations cannot be easily derived from any single treebank in existence, we suggest that they can be reliably inferred from the intersection of syntactic and semantic annotations as represented by, for example, the Universal Dependencies (UD) and PropBank (PB), and we demonstrate this for the English Web Treebank, a 250,000 token corpus of various genres of English internet text. The resulting corpus is a gold corpus for future experiments in natural language processing in the sense that it is built on existing annotations which have been created manually.
A technical challenge in this context is to align UD and PB annotations, to integrate them coherently, and to distribute and combine their information over the RRG constituent and operator projections. For this purpose, we describe a framework for flexible and scalable annotation engineering based on unconstrained graph transformations of sentence graphs by means of SPARQL Update.
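
To give a rough idea of what a rule-based transformation over a sentence graph can look like, the following Python sketch holds UD-style and PropBank-style annotations for one sentence as a set of triples and applies a single rewrite rule that groups each predicate with its arguments under a new CORE node. This is only an illustration under stated assumptions: the property names (ud:head, pb:roleset, rrg:nuc, rrg:arg), the example sentence, and the rule itself are hypothetical and are not taken from the paper or its repository, and plain Python set operations stand in for SPARQL Update.

```python
# Toy illustration only: a sentence graph as a set of (subject, predicate, object)
# triples. All property names (ud:head, pb:roleset, rrg:nuc, ...) are hypothetical
# and are not the vocabulary used in the paper or its repository.


def build_sentence_graph():
    """Return UD-style and PropBank-style annotations for 'She reads books'."""
    return {
        # word forms
        ("t1", "form", "She"),
        ("t2", "form", "reads"),
        ("t3", "form", "books"),
        # dependency syntax (UD-style): head and edge label per token
        ("t1", "ud:head", "t2"), ("t1", "ud:edge", "nsubj"),
        ("t3", "ud:head", "t2"), ("t3", "ud:edge", "obj"),
        ("t2", "ud:edge", "root"),
        # predicate-argument structure (PropBank-style)
        ("t2", "pb:roleset", "read.01"),
        ("t2", "pb:ARG0", "t1"),
        ("t2", "pb:ARG1", "t3"),
    }


def add_core_nodes(graph):
    """Add one CORE node per predicate, linking its nucleus and arguments.

    Loosely analogous to a SPARQL Update rule of the shape
        INSERT { ?core a rrg:CORE ; rrg:nuc ?pred ; rrg:arg ?arg }
        WHERE  { ?pred pb:roleset ?r . ?pred ?role ?arg . ... }
    but written as plain set operations. The input graph is not modified.
    """
    new_triples = set()
    for subj, pred, _ in graph:
        if pred != "pb:roleset":
            continue
        core = f"core_{subj}"  # hypothetical naming scheme for new nodes
        new_triples.add((core, "rdf:type", "rrg:CORE"))
        new_triples.add((core, "rrg:nuc", subj))
        # every pb:ARG* edge of the predicate becomes an argument of the CORE
        for s2, p2, o2 in graph:
            if s2 == subj and p2.startswith("pb:ARG"):
                new_triples.add((core, "rrg:arg", o2))
    return graph | new_triples


if __name__ == "__main__":
    before = build_sentence_graph()
    after = add_core_nodes(before)
    for triple in sorted(after - before):
        print(triple)
```

Because the rule only adds triples and leaves the original annotations in place, several such rules could be chained, each one matching on the output of the previous one.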

BibTeX - Entry

@InProceedings{chiarcos_et_al:OASIcs:2019:10373,
  author =	{Christian Chiarcos and Christian F{\"a}th},
  title =	{{Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{9:1--9:11},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Maria Eskevich and Gerard de Melo and Christian F{\"a}th and John P. McCrae and Paul Buitelaar and Christian Chiarcos and Bettina Klimek and Milan Dojchinovski},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10373},
  URN =		{urn:nbn:de:0030-drops-103731},
  doi =		{10.4230/OASIcs.LDK.2019.9},
  annote =	{Keywords: Role and Reference Grammar, NLP, Corpus, Semantic Web, LLOD, Syntax, Semantics}
}

Keywords: Role and Reference Grammar, NLP, Corpus, Semantic Web, LLOD, Syntax, Semantics

Collection: 2nd Conference on Language, Data and Knowledge (LDK 2019)

Issue Date: 2019

Date of publication: 16.05.2019

Supplementary Material: The software described in this paper is available under the Apache 2.0 license from https://github.com/acoli-repo/RRG. This includes build scripts for the data. We aim to provide the data under the same license as the annotations it is derived from (CC-BY-SA), but we are still in the process of copyright clearance for the original text.
