Analyzing Handwritten and Transcribed Symbols in Disparate Corpora
Cuneiform tablets are among the oldest textual artifacts. The script was in
use for more than three millennia, and the tablets are comparable in number
and relevance to texts written in Latin or ancient Greek.
These tablets are typically found in the Middle East and were
written by imprinting wedge-shaped impressions into wet clay.
Motivated by the increased demand for computerized analysis of documents within
the Digital Humanities, we develop the foundation for quantitative processing
of cuneiform script.
A 3D scan of a cuneiform tablet and a manually created line tracing are two
completely different representations of the same type of text source.
Each representation is typically processed with its own tool-set, and
the textual analysis is therefore limited to a certain type of digital
representation. To homogenize these data sources, a unifying minimal wedge
feature description is introduced. It is extracted by
pattern matching followed by conflict resolution,
because cuneiform is written densely with highly overlapping wedges.
Two similarity metrics for cuneiform signs, based on distinct
assumptions, are presented. (i) An implicit model represents cuneiform signs
using undirected mathematical graphs and measures the similarity of
signs with graph kernels.
(ii) An explicit model approaches the problem of recognition by an optimal
assignment between the wedge configurations of two signs.
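The idea behind the explicit model (ii) can be illustrated with a minimal sketch. It is not the method evaluated in this work: the wedge descriptor (x, y, angle), the cost function, and the unmatched-wedge penalty are assumptions chosen for illustration, and the assignment is found by brute force, which is only feasible for signs with a handful of wedges.

```python
import math
from itertools import permutations

# Hypothetical wedge descriptor: (x, y, angle_in_radians).
def wedge_cost(w1, w2, angle_weight=1.0):
    """Assumed pairwise cost: positional distance plus weighted angle difference."""
    dx, dy = w1[0] - w2[0], w1[1] - w2[1]
    # Smallest absolute difference between the two orientations.
    diff = w1[2] - w2[2]
    dtheta = abs(math.atan2(math.sin(diff), math.cos(diff)))
    return math.hypot(dx, dy) + angle_weight * dtheta

def sign_distance(sign_a, sign_b, unmatched_penalty=1.0):
    """Cost of the cheapest one-to-one assignment between two wedge sets.

    Every wedge of the smaller sign is matched to a distinct wedge of the
    larger sign; each leftover wedge adds a fixed (assumed) penalty.
    """
    small, large = sorted((sign_a, sign_b), key=len)
    best = math.inf
    # Brute force: try every injective mapping from small into large.
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(wedge_cost(small[i], large[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best + unmatched_penalty * (len(large) - len(small))

# Usage: the same two wedges listed in a different order, and a one-wedge sign.
sign_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
sign_b = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
sign_c = [(0.0, 0.0, 0.0)]
print(sign_distance(sign_a, sign_b))
print(sign_distance(sign_a, sign_c))
```

For signs with many wedges, a polynomial-time solver for the linear assignment problem (for example the Hungarian algorithm) would replace the brute-force loop.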
Further, methods for spotting cuneiform script are developed, combining
the feature descriptors for cuneiform wedges with prior work on
segmentation-free word spotting using part-structured models.
The ink-ball model is adapted by treating wedge feature descriptors as
individual parts.
The similarity metrics and the adapted spotting model are both evaluated
on a real-world dataset, where they outperform the state of the art in
cuneiform sign similarity and spotting.
To demonstrate the applicability of these methods for computational cuneiform
analysis, a novel approach is presented for mining frequent
constellations of wedges resulting in spatial n-grams. Furthermore,
a method for automated transliteration of tablets is evaluated by
employing structured and sequential learning on a dataset of
parallel sentences. Finally, the conclusion
outlines how the presented methods enable the development of new tools
and computational analyses, which are objective and reproducible,
for quantitative processing of cuneiform script.