License:

Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2019.18
URN: urn:nbn:de:0030-drops-108852
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10885/

Ferreira, João ; Gonçalo Oliveira, Hugo ; Rodrigues, Ricardo

Improving NLTK for Processing Portuguese

Abstract

Python has a growing community of users, especially in the AI and ML fields. Yet computational processing of Portuguese in this programming language is limited, in both available tools and results. This paper describes NLPyPort, an NLP pipeline in Python, primarily based on NLTK and focused on Portuguese. It is mostly assembled from pre-existing resources or their adaptations, but improves over the performance of existing alternatives in Python, namely in the tasks of tokenization, PoS tagging, lemmatization and named entity recognition (NER).
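
To make the idea of a staged pipeline concrete, the following is a minimal sketch of how tokenization, PoS tagging and lemmatization can be chained in Python. It is not NLPyPort's API and does not use NLTK: every name in it (Token, TOY_TAGS, TOY_LEMMAS, run_pipeline) and the tiny lexicon are hypothetical, invented for illustration only, and a real system would rely on trained models and large lexical resources instead of lookup tables.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Toy resources invented for this sketch (assumption, not from the paper).
TOY_TAGS: Dict[str, str] = {"os": "art", "gatos": "n", "dormem": "v", ".": "punc"}
TOY_LEMMAS: Dict[str, str] = {"os": "o", "gatos": "gato", "dormem": "dormir"}


@dataclass
class Token:
    """One token, annotated in place as it moves through the stages."""
    text: str
    tag: str = ""
    lemma: str = ""


def tokenize(text: str) -> List[Token]:
    # Words are runs of word characters; each punctuation mark is its own token.
    return [Token(piece) for piece in re.findall(r"\w+|[^\w\s]", text)]


def tag(tokens: List[Token]) -> List[Token]:
    # Dictionary lookup stands in for a trained PoS tagger.
    for tok in tokens:
        tok.tag = TOY_TAGS.get(tok.text.lower(), "unk")
    return tokens


def lemmatize(tokens: List[Token]) -> List[Token]:
    # Fall back to the lowercased surface form when the word is unknown.
    for tok in tokens:
        tok.lemma = TOY_LEMMAS.get(tok.text.lower(), tok.text.lower())
    return tokens


def run_pipeline(text: str) -> List[Token]:
    # Each stage consumes the output of the previous one.
    return lemmatize(tag(tokenize(text)))


if __name__ == "__main__":
    for tok in run_pipeline("Os gatos dormem."):
        print(tok.text, tok.tag, tok.lemma)
```

Keeping each stage as a separate function over the same token list means a later stage, such as NER, could be appended without changing the earlier ones.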

BibTeX - Entry

@InProceedings{ferreira_et_al:OASIcs:2019:10885,
  author =	{Jo{\~a}o Ferreira and Hugo Gon{\c{c}}alo Oliveira and Ricardo Rodrigues},
  title =	{{Improving NLTK for Processing Portuguese}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{18:1--18:9},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Ricardo Rodrigues and Jan Janousek and Lu{\'\i}s Ferreira and Lu{\'\i}sa Coheur and Fernando Batista and Hugo Gon{\c{c}}alo Oliveira},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10885},
  URN =		{urn:nbn:de:0030-drops-108852},
  doi =		{10.4230/OASIcs.SLATE.2019.18},
  annote =	{Keywords: NLP, Tokenization, PoS tagging, Lemmatization, Named Entity Recognition}
}

Keywords: NLP, Tokenization, PoS tagging, Lemmatization, Named Entity Recognition

Collection: 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Issue Date: 2019

Date of publication: 24.07.2019
