License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.LDK.2019.22
URN: urn:nbn:de:0030-drops-103869
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10386/


Freire, Nuno ; Isaac, Antoine ; Goosen, Twan ; Broeder, Daan ; Manguinhas, Hugo ; Charles, Valentine

Opening Digitized Newspapers Corpora: Europeana's Full-Text Data Interoperability Case


Abstract

Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other scientific domains within the Digital Humanities. Effective retrieval of newspaper content based on metadata alone is nearly impossible, which makes retrieval based on (digitized) full text particularly relevant. Europeana, Europe's Digital Library, is in a position to provide access to large newspaper collections with full-text resources. Full-text corpora are also relevant to Europeana's objective of promoting the use of cultural heritage resources within research infrastructures. We have derived requirements for aggregating and publishing Europeana's newspaper full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a "full-text profile" for the Europeana Data Model, which is being applied to Europeana's newspaper corpus.

BibTeX - Entry

@InProceedings{freire_et_al:OASIcs:2019:10386,
  author =	{Nuno Freire and Antoine Isaac and Twan Goosen and Daan Broeder and Hugo Manguinhas and Valentine Charles},
  title =	{{Opening Digitized Newspapers Corpora: Europeana's Full-Text Data Interoperability Case}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{22:1--22:14},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Maria Eskevich and Gerard de Melo and Christian F{\"a}th and John P. McCrae and Paul Buitelaar and Christian Chiarcos and Bettina Klimek and Milan Dojchinovski},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10386},
  URN =		{urn:nbn:de:0030-drops-103869},
  doi =		{10.4230/OASIcs.LDK.2019.22},
  annote =	{Keywords: Metadata, Full-text, Interoperability, Data aggregation, Cultural Heritage, Research Infrastructures}
}

Keywords: Metadata, Full-text, Interoperability, Data aggregation, Cultural Heritage, Research Infrastructures
Collection: 2nd Conference on Language, Data and Knowledge (LDK 2019)
Issue Date: 2019
Date of publication: 16.05.2019

