DROPS - Document

License:

Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.LDK.2019.4
URN: urn:nbn:de:0030-drops-103682
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10368/

Gillis-Webber, Frances ; Tittel, Sabine

The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages

pdf-format:

OASIcs-LDK-2019-4.pdf (0.8 MB)

Abstract

In recent years, modeling data from linguistic resources with the Resource Description Framework (RDF), following the Linked Data paradigm and using the OntoLex-Lemon vocabulary, has become a prevalent method for creating datasets for a multilingual web of data. An important aspect of data modeling is the use of language tags to mark the lexicons, lexemes, word senses, etc. of a linguistic dataset. However, attempts to model data from lesser-known languages reveal significant shortcomings in ISO 639, the authoritative list of language codes: for many lesser-known languages spoken by minorities, and also for historical stages of languages, language codes (the basis of language tags) are simply not available. This paper discusses these shortcomings using the examples of three such languages, namely two varieties of click languages of Southern Africa and Old French, and suggests solutions for the issues identified.
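To make the notion of a language tag concrete, here is a minimal Python sketch of a well-formedness check for a simplified subset of the BCP 47 (RFC 5646) syntax, which RDF 1.1 requires for language-tagged literals. It is an illustration only, not the authors' method: the function names `parse_language_tag` and `turtle_literal` are hypothetical, grandfathered tags are not handled, and the check is purely syntactic. It does not consult the IANA Language Subtag Registry, so a well-formed tag need not correspond to any registered code.

```python
import re
from typing import Optional

# Simplified subset of the BCP 47 (RFC 5646) "langtag" production.
# Grandfathered tags are deliberately not covered; matching is syntactic only.
_LANGTAG = re.compile(
    r"""
    (?:
        (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}   # ISO 639 code + extlangs
                    |[a-z]{4}|[a-z]{5,8})           # reserved / registered
        (?:-(?P<script>[a-z]{4}))?                  # ISO 15924 script
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))?         # ISO 3166-1 / UN M.49
        (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
        (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
        (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?
    |
        (?P<privateonly>x(?:-[a-z0-9]{1,8})+)       # whole tag is private use
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_language_tag(tag: str) -> Optional[dict]:
    """Return the main subtags of a well-formed tag, or None if malformed."""
    m = _LANGTAG.fullmatch(tag)
    if m is None:
        return None
    if m.group("privateonly"):
        return {"privateuse": m.group("privateonly")}
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        # The variants group looks like "-v1-v2"; drop the empty leading piece.
        "variants": [v for v in m.group("variants").split("-") if v],
        "privateuse": m.group("privateuse"),
    }


def turtle_literal(lexical_form: str, tag: str) -> str:
    """Serialize a language-tagged string literal in Turtle syntax."""
    if parse_language_tag(tag) is None:
        raise ValueError(f"not a well-formed language tag: {tag!r}")
    # Escape backslashes first, then quotes and newlines.
    escaped = (
        lexical_form.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{tag}'


if __name__ == "__main__":
    for t in ["de-CH-1901", "sr-Latn-RS", "x-mylang", "en-x-mydialect"]:
        print(t, parse_language_tag(t))
    print(turtle_literal("chevalier", "fro"))
```

For example, `de-CH-1901` yields language `de`, region `CH` and variant `1901`, while `en-x-mydialect` is rejected because private-use subtags are limited to eight characters. The example inputs (`fro`, `x-mylang`, etc.) are only test strings for the syntax check.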

BibTeX - Entry

@InProceedings{gilliswebber_et_al:OASIcs:2019:10368,
  author =	{Frances Gillis-Webber and Sabine Tittel},
  title =	{{The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{4:1--4:15},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Maria Eskevich and Gerard de Melo and Christian F{\"a}th and John P. McCrae and Paul Buitelaar and Christian Chiarcos and Bettina Klimek and Milan Dojchinovski},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10368},
  URN =		{urn:nbn:de:0030-drops-103682},
  doi =		{10.4230/OASIcs.LDK.2019.4},
  annote =	{Keywords: language codes, language tags, Resource Description Framework, Linked Data, Linguistic Linked Data, Khoisan languages, click languages, N|uu, ||'Au, }
}

Keywords: language codes, language tags, Resource Description Framework, Linked Data, Linguistic Linked Data, Khoisan languages, click languages, N|uu, ||'Au,

Collection: 2nd Conference on Language, Data and Knowledge (LDK 2019)

Issue Date: 2019

Date of publication: 16.05.2019

Published by LZI