License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagSemProc.06491.16
URN: urn:nbn:de:0030-drops-10478
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2007/1047/
Erjavec, Tomaž
TEI and Microsoft: a marriage made in...
Abstract
In several ongoing projects we faced the dilemma of how to reconcile our goal of delivering standardly encoded historical documents with having the actual editing and annotation performed by researchers and students who had no knowledge of XML and TEI and, for the most part, no interest in learning them. The solution we developed consists of allowing the annotators to use familiar and flexible editors, such as Microsoft Word (for structural annotation of documents) and Excel (for word-level linguistic annotation), and automatically converting their output into TEI. Given the unconstrained nature of such editors, this sounds like a recipe for disaster. But the solution crucially depends on a dedicated Web service, to which the annotators can upload their files; these are then immediately converted to XML/TEI and from it back to a visual format, either HTML or Excel XML, and presented to the annotators. The annotators thus get immediate feedback about the quality of their encoding in the source and can correct errors before they accumulate; the responsibility for correct encoding rests with the annotators rather than with the developers of the conversion procedure. The paper describes the Web service and details its use in three projects. The main conclusions are that the proposed solution is appropriate for shallow encodings, but that it nevertheless requires producing detailed annotation guidelines.
BibTeX - Entry
@InProceedings{erjavec:DagSemProc.06491.16,
author = {Erjavec, Toma\v{z}},
title = {{TEI and Microsoft: a marriage made in...}},
booktitle = {Digital Historical Corpora- Architecture, Annotation, and Retrieval},
pages = {1--19},
series = {Dagstuhl Seminar Proceedings (DagSemProc)},
ISSN = {1862-4405},
year = {2007},
volume = {6491},
editor = {Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L\"{u}deling},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2007/1047},
URN = {urn:nbn:de:0030-drops-10478},
doi = {10.4230/DagSemProc.06491.16},
annote = {Keywords: Text encoding, manual annotation, open standards, XML, Microsoft}
}
Keywords: Text encoding, manual annotation, open standards, XML, Microsoft
Collection: 06491 - Digital Historical Corpora- Architecture, Annotation, and Retrieval
Issue Date: 2007
Date of publication: 13.06.2007