License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2023.3
URN: urn:nbn:de:0030-drops-185177
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18517/


Novák, Attila ; Novák, Borbála

A Pseudonymization Prototype for Hungarian



Abstract

In this paper, we present a pseudonymization prototype for Hungarian, an agglutinating language with complex morphology, implemented as a web service. The service provides the following functions: entity identification and extraction; automatic generation and selection of replacement candidates; and automatic, consistent replacement and reinflection of entities in the final pseudonymized document. The named entity recognition model applied handles names of persons well and performs decently on other entity types. However, ID-like entities need to be handled separately to achieve proper performance; the current prototype version does not handle them. For automatic replacement candidate generation, a simple entity embedding model is used. We discuss the performance and limitations of the prototype in detail.
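
The consistent-replacement step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pseudonymize`, its parameters, and the example names are hypothetical, the entity list is assumed to come from some upstream recognition step, and the candidate lists are assumed to be supplied by the caller rather than by an embedding model. Morphological reinflection is deliberately omitted, so only exact, uninflected surface forms are replaced.

```python
import re


def pseudonymize(text, entities, candidates):
    """Replace each listed entity with one consistently chosen pseudonym.

    text:       the document as a string
    entities:   surface forms assumed to come from an upstream NER step
    candidates: dict mapping each entity to a non-empty list of replacements
    Returns the rewritten text and the entity-to-pseudonym mapping used.
    """
    # Choose one replacement per entity up front, so every mention of the
    # same entity receives the same pseudonym throughout the document.
    mapping = {entity: candidates[entity][0] for entity in entities}
    if not mapping:
        return text, mapping

    # Try longer entities first so a full name wins over a bare surname.
    ordered = sorted(mapping, key=len, reverse=True)
    # Word boundaries keep inflected forms (stem plus suffix) from being
    # partially rewritten; handling them would require reinflection.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b"
    )
    return pattern.sub(lambda m: mapping[m.group(0)], text), mapping


if __name__ == "__main__":
    sample = "Kovács Anna met Nagy Péter. Kovács Anna left."
    names = ["Kovács Anna", "Nagy Péter"]
    options = {"Kovács Anna": ["Szabó Éva"], "Nagy Péter": ["Tóth Gábor"]}
    rewritten, used = pseudonymize(sample, names, options)
    print(rewritten)
```

Because the pattern requires a word boundary after the name, a suffixed form such as "Kovács Annának" is left untouched by this sketch. In an agglutinating language that gap is significant, which is why the prototype described here pairs replacement with reinflection.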

BibTeX - Entry

@InProceedings{novak_et_al:OASIcs.SLATE.2023.3,
  author =	{Nov\'{a}k, Attila and Nov\'{a}k, Borb\'{a}la},
  title =	{{A Pseudonymization Prototype for Hungarian}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18517},
  URN =		{urn:nbn:de:0030-drops-185177},
  doi =		{10.4230/OASIcs.SLATE.2023.3},
  annote =	{Keywords: named entity recognition, morphological reinflection, pseudonymization, entity embedding model}
}

Keywords: named entity recognition, morphological reinflection, pseudonymization, entity embedding model
Collection: 12th Symposium on Languages, Applications and Technologies (SLATE 2023)
Issue Date: 2023
Date of publication: 15.08.2023

