Fernandes, Mariana Gaspar ; Dias, Cátia ; Coheur, Luísa

Distinguishing Different Classes of Utterances - the UC-PT Corpus

Conversational bots are used in many scenarios, for instance as museum guides or in customer support. These bots base their answers on specific information related to their domain of expertise, but each user request also carries general information that, when properly identified, could help the agent decide what to answer. For example, the bot's search for a response will probably differ depending on whether the user is asking a question or uttering a statement. In this paper we present three corpora for the Portuguese language - the UC-PT corpus - that can be used to help conversational bots distinguish: a) questions from non-questions; b) yes-no questions from other types of questions; and c) personal from non-personal questions. With this information, the agent can decide, for instance, not to answer, to redirect the question to a persona chatbot, or to answer with a simple "yes", "no" or "maybe". In addition, we benchmark the classification process on these corpora. These corpora will be made publicly available.

Keywords: Corpora, Questions, Conversational Agents, Portuguese Language
Collection: 8th Symposium on Languages, Applications and Technologies (SLATE 2019)
Issue Date: 2019
Date of publication: 24.07.2019

