License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2017.20
URN: urn:nbn:de:0030-drops-79530
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2017/7953/


Pereira, José Casimiro ; Teixeira, António J. S. ; Rodrigues, Mário ; Miguel, Pedro ; Pinto, Joaquim Sousa

Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text

pdf-format:
OASIcs-SLATE-2017-20.pdf (0.5 MB)


Abstract

Information Extraction from natural texts has great potential in areas such as Tourism and can be of great assistance in transforming customers' comments into valuable information for Tourism operators, governments and customers. After extraction, information needs to be efficiently transmitted to end-users in a natural way. Systems should not, in general, send extracted information directly to end-users, such as hotel managers, as it can be difficult to read.

Naturally, humans transmit and encode information using natural languages, such as Portuguese. The problem arising from the need for efficient and natural transmission of information to end-users is how to encode it. Natural language generation (NLG) is a possible solution, as it produces sentences and, with them, texts.

In this paper we address this problem with a data-to-text system, a derivation of formal NLG systems that takes data as input. The proposed system uses an aligned corpus, which was defined, collected and processed in approximately 3 weeks of work. Three different in-domain and out-of-domain corpora were used to build the language model. The effects of this approach were evaluated, and results are presented.

The automatic metrics BLEU and Meteor were used to evaluate the different systems, and their values were compared with those of similar systems. Results show that expanding the corpus has a major positive effect on BLEU and Meteor scores, and that the use of additional corpora (in-domain and out-of-domain) in training the language model does not result in significantly different performance.

The scores obtained, combined with their comparison with other systems' performance and an informal human evaluation of the sentences produced, give additional support for the capabilities of the translation-based approach for fast development of data-to-text systems for new domains.

BibTeX - Entry

@InProceedings{pereira_et_al:OASIcs:2017:7953,
  author =	{Jos{\'e} Casimiro Pereira and Ant{\'o}nio J. S. Teixeira and M{\'a}rio Rodrigues and Pedro Miguel and Joaquim Sousa Pinto},
  title =	{{Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{20:1--20:14},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Ricardo Queir{\'o}s and M{\'a}rio Pinto and Alberto Sim{\~o}es and Jos{\'e} Paulo Leal and Maria Jo{\~a}o Varanda},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/7953},
  URN =		{urn:nbn:de:0030-drops-79530},
  doi =		{10.4230/OASIcs.SLATE.2017.20},
  annote =	{Keywords: Data-to-Text, Natural Language Generation, Automatic Translation, opinions, Tourism, Portuguese}
}

Keywords: Data-to-Text, Natural Language Generation, Automatic Translation, opinions, Tourism, Portuguese
Collection: 6th Symposium on Languages, Applications and Technologies (SLATE 2017)
Issue Date: 2017
Date of publication: 04.10.2017

