License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2021.11
URN: urn:nbn:de:0030-drops-144286
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/14428/
Matos, Emanuel; Rodrigues, Mário; Miguel, Pedro; Teixeira, António
Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers
Abstract
Named Entity Recognition (NER) is an essential step for many natural language processing tasks, including Information Extraction. Despite recent advances, particularly using deep learning techniques, the creation of accurate named entity recognizers remains a complex task, highly dependent on the availability of annotated data. To foster the development of NER systems for new domains, it is crucial to obtain the required large volumes of annotated data with little or no manual labor. This paper proposes a system that creates the annotated data automatically by resorting to a set of existing NERs and information sources (DBpedia). The approach was tested with documents from the Tourism domain. Distinct methods were applied for deciding the final named entities and their respective tags. The results show that this approach can increase the confidence in annotations and/or increase the number of categories that can be annotated. This paper also presents examples of new NERs that can be rapidly created with the obtained annotated data. The annotated data, combined with the possibility of applying both the ensemble of NER systems and the new Gazetteer-based NERs to large corpora, creates the necessary conditions to explore recent state-of-the-art deep learning approaches to NER (e.g., BERT) in domains with scarce or nonexistent training data.
BibTeX - Entry
@InProceedings{matos_et_al:OASIcs.SLATE.2021.11,
author = {Matos, Emanuel and Rodrigues, M\'{a}rio and Miguel, Pedro and Teixeira, Ant\'{o}nio},
title = {{Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers}},
booktitle = {10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
pages = {11:1--11:14},
series = {Open Access Series in Informatics (OASIcs)},
ISBN = {978-3-95977-202-0},
ISSN = {2190-6807},
year = {2021},
volume = {94},
editor = {Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/14428},
URN = {urn:nbn:de:0030-drops-144286},
doi = {10.4230/OASIcs.SLATE.2021.11},
annote = {Keywords: Named Entity Recognition (NER), Automatic Annotation, Gazetteers, Tourism, Portuguese}
}
Keywords: Named Entity Recognition (NER), Automatic Annotation, Gazetteers, Tourism, Portuguese
Collection: 10th Symposium on Languages, Applications and Technologies (SLATE 2021)
Issue Date: 2021
Date of publication: 10.08.2021