License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following:
DOI: 10.4230/OASIcs.SLATE.2014.267
URN: urn:nbn:de:0030-drops-45753
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2014/4575/
Rodrigues, Ricardo; Gonçalo Oliveira, Hugo; Gomes, Paulo
LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese
Abstract
Although lemmatization is a very common subtask in many natural language processing tasks, there is a lack of available true cross-platform lemmatization tools specifically targeted for Portuguese, namely for integration in projects developed in Java. To address this issue, we have developed a lemmatizer, initially just for our own use, but which we have decided to make publicly available. The lemmatizer, presented in this document, yields an overall accuracy over 98% when compared against a manually revised corpus.
BibTeX - Entry
@InProceedings{rodrigues_et_al:OASIcs:2014:4575,
author = {Ricardo Rodrigues and Hugo Gon{\c{c}}alo Oliveira and Paulo Gomes},
title = {{LemPORT: a High-Accuracy Cross-Platform Lemmatizer for Portuguese}},
booktitle = {3rd Symposium on Languages, Applications and Technologies},
pages = {267--274},
series = {OpenAccess Series in Informatics (OASIcs)},
ISBN = {978-3-939897-68-2},
ISSN = {2190-6807},
year = {2014},
volume = {38},
editor = {Maria Jo{\~a}o Varanda Pereira and Jos{\'e} Paulo Leal and Alberto Sim{\~o}es},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2014/4575},
URN = {urn:nbn:de:0030-drops-45753},
doi = {10.4230/OASIcs.SLATE.2014.267},
annote = {Keywords: lemmatization, normalization, rules, lexicon}
}
Keywords: lemmatization, normalization, rules, lexicon
Collection: 3rd Symposium on Languages, Applications and Technologies
Issue Date: 2014
Date of publication: 18.06.2014