License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2019.18
URN: urn:nbn:de:0030-drops-108852
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10885/


Ferreira, João; Gonçalo Oliveira, Hugo; Rodrigues, Ricardo

Improving NLTK for Processing Portuguese


Abstract

Python has a growing community of users, especially in the AI and ML fields. Yet the computational processing of Portuguese in this programming language is limited, in both available tools and results. This paper describes NLPyPort, an NLP pipeline in Python, primarily based on NLTK and focused on Portuguese. It is mostly assembled from pre-existing resources or their adaptations, but it improves on the performance of existing Python alternatives, namely in the tasks of tokenization, PoS tagging, lemmatization and NER.

BibTeX - Entry

@InProceedings{ferreira_et_al:OASIcs:2019:10885,
  author =	{Jo{\~a}o Ferreira and Hugo Gon{\c{c}}alo Oliveira and Ricardo Rodrigues},
  title =	{{Improving NLTK for Processing Portuguese}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{18:1--18:9},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Ricardo Rodrigues and Jan Janousek and Lu{\'\i}s Ferreira and Lu{\'\i}sa Coheur and Fernando Batista and Hugo Gon{\c{c}}alo Oliveira},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10885},
  URN =		{urn:nbn:de:0030-drops-108852},
  doi =		{10.4230/OASIcs.SLATE.2019.18},
  annote =	{Keywords: NLP, Tokenization, PoS tagging, Lemmatization, Named Entity Recognition}
}

Keywords: NLP, Tokenization, PoS tagging, Lemmatization, Named Entity Recognition
Collection: 8th Symposium on Languages, Applications and Technologies (SLATE 2019)
Issue Date: 2019
Date of publication: 24.07.2019

