License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following:
DOI: 10.4230/OASIcs.SLATE.2021.8
URN: urn:nbn:de:0030-drops-144257

Costa Cunha, Luís Filipe ; Ramalho, José Carlos

NER in Archival Finding Aids



Currently, the vast majority of Portuguese archives with an online presence use a software solution, such as Digitarq or Archeevo, to manage their finding aids.
Most of these finding aids are written in natural language without any annotation that would enable a machine to identify named entities, geographical locations or even some dates. Such annotation would allow the machine to build smart browsing tools on top of the record contents, such as entity linking and record linking.
In this work, we created a set of datasets to train Machine Learning algorithms to find those named entities and geographical locations. After training several algorithms, we tested them on several datasets and recorded their precision and accuracy.
These results allowed us to draw some conclusions about the precision achievable with this approach in this context and about what to do with the results: are the precision and accuracy high enough to create toponymic and anthroponymic indexes for archival finding aids? Is this approach suitable in this context? These are some of the questions we intend to answer in this paper.
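The abstract asks whether NER output is precise enough to build toponymic and anthroponymic indexes. As a rough illustration only, the following Python sketch shows how recognised entities could be grouped into such indexes. The (record_id, entity_text, label) input format, the PER/LOC label names and the sample record identifiers are all hypothetical; they are not the authors' data model or tag set.

```python
from collections import defaultdict

# Hypothetical NER output: (record_id, entity_text, label) triples.
# The "PER" and "LOC" label names are assumptions, not the paper's tag set.


def build_indexes(annotations):
    """Group record ids by person name (anthroponymic index) and by
    place name (toponymic index); other labels are ignored."""
    anthroponymic = defaultdict(set)
    toponymic = defaultdict(set)
    for record_id, text, label in annotations:
        if label == "PER":
            anthroponymic[text].add(record_id)
        elif label == "LOC":
            toponymic[text].add(record_id)
    # Sort index entries alphabetically and the record ids within each entry.
    return (
        {name: sorted(ids) for name, ids in sorted(anthroponymic.items())},
        {place: sorted(ids) for place, ids in sorted(toponymic.items())},
    )


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        ("PT-001", "João Silva", "PER"),
        ("PT-001", "Braga", "LOC"),
        ("PT-002", "João Silva", "PER"),
        ("PT-003", "Guimarães", "LOC"),
    ]
    people, places = build_indexes(sample)
    print(people)  # {'João Silva': ['PT-001', 'PT-002']}
    print(places)  # {'Braga': ['PT-001'], 'Guimarães': ['PT-003']}
```

Any recognition error would propagate directly into the index entries, which is why the precision of the trained models matters for this use.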

BibTeX - Entry

  author =	{Costa Cunha, Lu{\'\i}s Filipe and Ramalho, Jos\'{e} Carlos},
  title =	{{NER in Archival Finding Aids}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{8:1--8:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-144257},
  doi =		{10.4230/OASIcs.SLATE.2021.8},
  annote =	{Keywords: Named Entity Recognition, Archival Descriptions, Machine Learning, Deep Learning}

Keywords: Named Entity Recognition, Archival Descriptions, Machine Learning, Deep Learning
Collection: 10th Symposium on Languages, Applications and Technologies (SLATE 2021)
Issue Date: 2021
Date of publication: 10.08.2021
