License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2016.3
URN: urn:nbn:de:0030-drops-60086
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/6008/


Pinto, Alexandre; Gonçalo Oliveira, Hugo; Oliveira Alves, Ana

Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text



Abstract

Nowadays, many toolkits are available for performing common natural language processing tasks, which enables the development of more powerful applications without having to start from scratch. In fact, for English, there is no need to develop tools such as tokenizers, part-of-speech (POS) taggers, chunkers or named entity recognizers (NER). The current challenge is to select which one to use out of the range of available tools. This choice may depend on several aspects, including the kind and source of text, where the register, formal or informal, may influence the performance of such tools. In this paper, we assess a range of natural language processing toolkits in their default configuration on a set of standard tasks (e.g. tokenization, POS tagging, chunking and NER), using popular datasets that cover newspaper and social network text. We analyze the results obtained and, while we could not settle on a single toolkit, the exercise was very helpful in narrowing our choice.

BibTeX - Entry

@InProceedings{pinto_et_al:OASIcs:2016:6008,
  author =	{Alexandre Pinto and Hugo Gon{\c{c}}alo Oliveira and Ana Oliveira Alves},
  title =	{{Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text}},
  booktitle =	{5th Symposium on Languages, Applications and Technologies (SLATE'16)},
  pages =	{3:1--3:16},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-006-4},
  ISSN =	{2190-6807},
  year =	{2016},
  volume =	{51},
  editor =	{Marjan Mernik and Jos{\'e} Paulo Leal and Hugo Gon{\c{c}}alo Oliveira},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2016/6008},
  URN =		{urn:nbn:de:0030-drops-60086},
  doi =		{10.4230/OASIcs.SLATE.2016.3},
  annote =	{Keywords: Natural language processing, toolkits, formal text, social media, benchmark}
}

Keywords: Natural language processing, toolkits, formal text, social media, benchmark
Collection: 5th Symposium on Languages, Applications and Technologies (SLATE'16)
Issue Date: 2016
Date of publication: 21.06.2016

