License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2014.251
URN: urn:nbn:de:0030-drops-45749
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2014/4574/


Simões, Alberto ; Almeida, José João ; Byers, Simon D.

Language Identification: a Neural Network Approach



Abstract

One of the first tasks when building a natural language application is to detect the language in use so that the system can adapt to it. This task has been addressed several times. Nevertheless, most of these attempts were made a long time ago, when the amount of computer data and the computational power available were limited. In this article we analyze and explain the use of a neural network for language identification, in which features can be extracted automatically and which is therefore easy to adapt to new languages. Our experiments brought some surprises, namely with the two Chinese variants, which forced us to apply some language-dependent tweaking to the neural network. In the end, the network reached a precision of 95%, only failing for the Portuguese language.

BibTeX - Entry

@InProceedings{simes_et_al:OASIcs:2014:4574,
  author =	{Alberto Sim{\~o}es and Jos{\'e} Jo{\~a}o Almeida and Simon D. Byers},
  title =	{{Language Identification: a Neural Network Approach}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{251--265},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Maria Jo{\~a}o Varanda Pereira and Jos{\'e} Paulo Leal and Alberto Sim{\~o}es},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2014/4574},
  URN =		{urn:nbn:de:0030-drops-45749},
  doi =		{10.4230/OASIcs.SLATE.2014.251},
  annote =	{Keywords: language identification, neural networks, language models, trigrams}
}

Keywords: language identification, neural networks, language models, trigrams
Collection: 3rd Symposium on Languages, Applications and Technologies
Issue Date: 2014
Date of publication: 18.06.2014

