License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2019.14
URN: urn:nbn:de:0030-drops-108817
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10881/


Fernandes, Mariana Gaspar; Dias, Cátia; Coheur, Luísa

Distinguishing Different Classes of Utterances - the UC-PT Corpus


Abstract

Conversational bots are used in many scenarios; we can find them acting as museum guides or providing customer support, for instance. These bots base their answers on specific information related to their domain of expertise, but there is also general information, present in each user request, that, when properly identified, could help the agent decide what to answer. For example, the bot's search for a response will probably differ depending on whether the user is asking a question or uttering a statement. In this paper we present three corpora for the Portuguese language - the UC-PT corpus - that can be used to help conversational bots distinguish: a) questions from non-questions; b) yes-no questions from other types of questions; and c) personal from non-personal questions. With this information, the agent can decide, for instance, not to answer, to redirect the question to a persona chatbot, or to answer with a simple "yes", "no" or "maybe". In addition, we benchmark the classification process on these corpora. These corpora will be made publicly available.
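
The decision logic described in the abstract can be illustrated with a minimal Python sketch. It assumes the three distinctions (question vs. non-question, yes-no vs. other question, personal vs. non-personal) have already been predicted by classifiers trained on the corpora. The names `UtteranceLabels` and `route`, the action strings, and the order in which the labels are checked are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtteranceLabels:
    """Hypothetical container for the three binary predictions."""
    is_question: bool  # a) question vs. non-question
    is_yes_no: bool    # b) yes-no question vs. other type of question
    is_personal: bool  # c) personal vs. non-personal question


def route(labels: UtteranceLabels) -> str:
    """Pick an action from the predicted labels.

    The priority order below is an assumption made for illustration only.
    """
    if not labels.is_question:
        return "no_answer"        # the agent may decide not to answer
    if labels.is_personal:
        return "persona_chatbot"  # redirect to a persona chatbot
    if labels.is_yes_no:
        return "yes_no_maybe"     # answer with "yes", "no" or "maybe"
    return "domain_search"        # fall back to the domain-specific search


if __name__ == "__main__":
    labels = UtteranceLabels(is_question=True, is_yes_no=True, is_personal=False)
    print(route(labels))
```

A real agent would likely combine these labels with its domain knowledge rather than use a fixed priority; the sketch only shows how the three distinctions could feed a single decision.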

BibTeX - Entry

@InProceedings{fernandes_et_al:OASIcs:2019:10881,
  author =	{Mariana Gaspar Fernandes and C{\'a}tia Dias and Lu{\'\i}sa Coheur},
  title =	{{Distinguishing Different Classes of Utterances - the UC-PT Corpus}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{14:1--14:8},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Ricardo Rodrigues and Jan Janousek and Lu{\'\i}s Ferreira and Lu{\'\i}sa Coheur and Fernando Batista and Hugo Gon{\c{c}}alo Oliveira},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10881},
  URN =		{urn:nbn:de:0030-drops-108817},
  doi =		{10.4230/OASIcs.SLATE.2019.14},
  annote =	{Keywords: Corpora, Questions, Conversational Agents, Portuguese Language}
}

Keywords: Corpora, Questions, Conversational Agents, Portuguese Language
Collection: 8th Symposium on Languages, Applications and Technologies (SLATE 2019)
Issue Date: 2019
Date of publication: 24.07.2019

