License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.SLATE.2023.8
URN: urn:nbn:de:0030-drops-185220
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18522/


Rodrigues dos Santos, Sofia G.; Dias de Almeida, J. João

OCRticle - a Structure-Aware OCR Application



Abstract

While there are currently many applications and websites capable of performing Optical Character Recognition (OCR), none of the widely available options offers structured OCR, i.e., OCR that preserves the original structure of the text. For example, if a document has a title, the title should keep a distinct formatting after OCR so that it can still be distinguished from the rest of the text.
This paper covers the topic of structure-aware OCR, first by describing the current state of OCR tools and then by presenting a prototype tool that retains the structure of articles scanned into images.
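To make the idea of structure-aware OCR more concrete, the Python sketch below shows one simple heuristic: a recognised line whose bounding-box height is well above the median line height is treated as a title and emitted as a Markdown heading. This is not presented as the method used by OCRticle; the `OcrLine` type, the `to_markdown` function and the 1.5 height ratio are hypothetical, and the input is assumed to be per-line OCR output that already carries a height for each line.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical threshold: a line at least this many times taller than the
# median line height is treated as a title (an assumption, not OCRticle's rule).
HEADING_RATIO = 1.5


@dataclass
class OcrLine:
    """One recognised line of text plus the pixel height of its bounding box.

    Assumed shape of the OCR engine's output, for illustration only.
    """
    text: str
    height: float


def to_markdown(lines: list[OcrLine]) -> str:
    """Render OCR lines as Markdown, marking unusually tall lines as headings."""
    if not lines:
        return ""
    # The median height approximates the body-text size, since most lines are body text.
    body_height = median(line.height for line in lines)
    out = []
    for line in lines:
        if line.height >= HEADING_RATIO * body_height:
            out.append(f"# {line.text}")  # tall line: keep it distinct as a heading
        else:
            out.append(line.text)  # ordinary line: emit as plain text
    return "\n".join(out)
```

For example, `to_markdown([OcrLine("Title", 40), OcrLine("body one", 14), OcrLine("body two", 15)])` yields the title as `# Title` followed by the two body lines, so the title keeps a formatting distinct from the rest of the text. A real tool would likely combine several cues (font size, weight, position, spacing) rather than height alone.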

BibTeX - Entry

@InProceedings{rodriguesdossantos_et_al:OASIcs.SLATE.2023.8,
  author =	{Rodrigues dos Santos, Sofia G. and Dias de Almeida, J. Jo\~{a}o},
  title =	{{OCRticle - a Structure-Aware OCR Application}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18522},
  URN =		{urn:nbn:de:0030-drops-185220},
  doi =		{10.4230/OASIcs.SLATE.2023.8},
  annote =	{Keywords: OCR, Optical Character Recognition, Data Structure, Data Parsing, Document Structure}
}

Keywords: OCR, Optical Character Recognition, Data Structure, Data Parsing, Document Structure
Collection: 12th Symposium on Languages, Applications and Technologies (SLATE 2023)
Issue Date: 2023
Date of publication: 15.08.2023
Supplementary Material: Software (Source Code): https://github.com/RisingFisan/OCRticle archived at: https://archive.softwareheritage.org/swh:1:dir:651451c61ae5fca1265a703ed38eab264bb82551

