License: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license (CC BY-NC-ND 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2012.267
URN: urn:nbn:de:0030-drops-35285
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2012/3528/


Laki, László

Investigating the Possibilities of Using SMT for Text Annotation



Abstract

In this paper I examine the applicability of statistical machine translation (SMT) methodology to part-of-speech disambiguation and lemmatization in Hungarian. After creating a baseline system, I tried several methods to improve its efficiency. I also applied methods to reduce the size of the target dictionary and to handle out-of-vocabulary words properly. The results show that such a lightweight system achieves results comparable to those of other state-of-the-art systems.

BibTeX - Entry

@InProceedings{laki:OASIcs:2012:3528,
  author =	{L{\'a}szl{\'o} Laki},
  title =	{{Investigating the Possibilities of Using SMT for Text Annotation}},
  booktitle =	{1st Symposium on Languages, Applications and Technologies},
  pages =	{267--283},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-40-8},
  ISSN =	{2190-6807},
  year =	{2012},
  volume =	{21},
  editor =	{Alberto Sim{\~o}es and Ricardo Queir{\'o}s and Daniela da Cruz},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2012/3528},
  URN =		{urn:nbn:de:0030-drops-35285},
  doi =		{10.4230/OASIcs.SLATE.2012.267},
  annote =	{Keywords: SMT, POS-tagging, Lemmatization, Target language set, OOV}
}

Keywords: SMT, POS-tagging, Lemmatization, Target language set, OOV
Collection: 1st Symposium on Languages, Applications and Technologies
Issue Date: 2012
Date of publication: 21.06.2012


Published by LZI