13 research outputs found

    Some Roads to Script Classification: Via Taxonomy and Other Ways

    In codicology, the features of a script play an important role in dating and localising a manuscript. Examining these features can also address other questions, e.g. questions of intellectual history, the influence of literary genres, or the influence of organisational aspects of scriptoria on the shape of a script. In manuscript cataloguing especially, the classification of script is of the highest importance when other evidence, such as a colophon or references like the naming of celebrations for local saints, cannot be found. In order to contextualise the features of a script, palaeography has always striven to infer a taxonomy from visual properties. As in other disciplines, the community did not succeed in establishing one common naming scheme but produced competing taxonomies. The question thus arises of what to do with these taxonomies now that huge amounts of manuscript-related data need to be searched in portals. New approaches to standardisation on the one hand, and semantic technologies and image-processing methods on the other, offer new possibilities for accessing the manuscripts.

    GMM-based Handwriting Style Identification System for Historical Documents

    In this paper, we describe a novel method for handwriting style identification. A handwriting style can be common to one or several writers. It can also represent a handwriting style used in a particular historical period or for specific documents. Our method is based on Gaussian Mixture Models (GMMs) using different kinds of features computed with a combined fixed-length horizontal and vertical sliding window moving over a document page. For each writing style, a GMM is built and trained using page images. In the recognition phase, the system returns log-likelihood scores, and the GMM with the highest score is selected. Experiments using page images from a historical German document collection demonstrate good performance. The identification rate of the GMM-based system developed with six historical handwriting styles is 100%.
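
    The recognition phase described in this abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance components, two-dimensional feature vectors, and hand-picked toy parameters; none of these details come from the paper, and the function and variable names (`gmm_log_likelihood`, `identify_style`, `MODELS`, `PAGE_FRAMES`) are hypothetical. Training the models (typically done with expectation-maximisation) is omitted.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods under one GMM.

    `gmm` is a list of (weight, mean, variance) components (assumed layout).
    """
    total = 0.0
    for x in frames:
        # log sum_k w_k * N(x | mean_k, var_k), computed with log-sum-exp
        terms = [
            math.log(w) + log_gaussian_diag(x, mean, var)
            for w, mean, var in gmm
        ]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify_style(frames, models):
    """Score the frames under every style model and pick the best one."""
    scores = {name: gmm_log_likelihood(frames, gmm) for name, gmm in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy models: one GMM per handwriting style, two components each.
MODELS = {
    "style_a": [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (2.0, 2.0), (1.0, 1.0))],
    "style_b": [(0.7, (5.0, 5.0), (1.0, 1.0)), (0.3, (8.0, 4.0), (2.0, 2.0))],
}

# Hypothetical feature vectors, one per sliding-window position on a page.
PAGE_FRAMES = [(4.8, 5.1), (5.3, 4.7), (7.9, 4.2)]

best_style, style_scores = identify_style(PAGE_FRAMES, MODELS)
print(best_style)
```

    Summing per-window log-likelihoods treats the windows as independent, which is the usual simplification for this kind of page-level scoring; whether the paper normalises the scores by the number of windows is not stated in the abstract.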

    Codices Electronici Ecclesiae Coloniensis

    Schaßan, Torsten. Codices Electronici Ecclesiae Coloniensis. In: Gazette du livre médiéval, no. 40, Spring 2002, pp. 45-50.

    Hugo von Montfort: Das poetische Werk

    "Hugo von Montfort: The poetic work" calls itself a hybrid edition. In fact, one would hesitate to call it a digital edition, since from the beginning the main purpose of the website was to accompany the printed edition. Nevertheless, more materials have been assembled over time and offer multiple ways of access: for textual scholarly work (Lesefassung), for linguistic and palaeographic interests (Augenfassung), and for listening. Unfortunately, the texts are distributed over several websites and cannot be linked as one would wish. Additionally, the Augenfassung is available only for the main manuscript. Restrictive licensing does not permit any use of the material other than reading it online. This is regrettable, as the original edition was award-winning.

    Vom Zeichen zur Schrift: Mit Mustererkennung zur automatisierten Schreiberhanderkennung in mittelalterlichen und frühneuzeitlichen Handschriften

    For Digital Humanities in medieval and early modern studies, the digitization of manuscripts is a central field. Since each manuscript displays its own unique characteristics, automatically generating machine-readable text by applying Optical Character Recognition (OCR) to digital images leads, in most cases, to error-prone results. However, characteristics of handwriting such as the size of letters, spacing, and slope can be used to identify the scribe or scribes. This paper demonstrates how the analysis of manuscript images can be used for that purpose. An algorithm will support additional palaeographic and codicological findings and provide evidence for the verification or falsification of uncertain attributions.

    The development of a medieval scribe

    Every individual has a set of traits unique to that person. These include biometric identifiers such as DNA, but the same principle applies to the notion of a scribal fingerprint or human stylome. In contrast to the innate nature of a real fingerprint, such features are acquired over time and are therefore, by definition, subject to change. Knowledge of the consistency (or lack of it) of such linguistic or palaeographic identifiers over time is essential in constructing unique personal identifiers for scribes. The present article examines the case of one scribe, who worked as a secretary for the Teutonic Order in Utrecht and as a notary public. His corpus of texts, which includes an important author's copy of the late-fifteenth-century Jüngere Hochmeisterchronik, covers a period of thirty years. By quantifying spelling preferences, character sizes, letter-forms and the use of abbreviations, it is possible to monitor the development of his writing through time. It turns out that spelling preferences and the use of abbreviations show remarkably little consistency over a longer period. Only changing patterns in the use of certain letter-forms can be used to create a more stable timeline in Hendrik van Vianen's writings. Furthermore, abrupt changes in the patterns have been used to indicate a phased genesis of the manuscript of the Jüngere Hochmeisterchronik.

    Schriften des Instituts für Dokumentologie und Editorik, Band 2: Aspects of Application of Neural Recognition to Digital Editions

    Bibliographic information of the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available online at http://dnb.d-nb.de/. Slightly modified version for digital publication (see preface). Daniele Fusi. Abstract: Artificial neural networks (ANN) are widely used in software systems that must solve problems lacking a traditional algorithmic approach, such as character recognition. ANN learn by example, so they require a consistent and well-chosen set of samples in order to be trained to recognize their patterns. The network is taught to react with high activity in some of its output neurons whenever an input sample belonging to a specified class (e.g. a letter shape) is presented, and it can assess the similarity of samples it has never encountered before to these models. Typical OCR applications thus require a significant amount of preprocessing for such samples, such as resizing images and removing all the "noise" data, so that the letter contours emerge clearly from the background. Furthermore, a huge number of samples is usually required to train a network effectively to recognize one character against all the others. This may be an issue for palaeographical applications because of the relatively low quantity and high complexity of the digital samples available, and it poses even more problems when the aim is to detect subtle differences (e.g. the special shape of a specific letter from a well-defined period and scriptorium). It would probably be wiser for scholars to define guidelines for extracting from samples the features most relevant to their purposes, and to let the network deal with just a subset of the overwhelming amount of detailed nuances available. ANN are not magic, and it always falls to the careful judgement of scholars to provide a theoretical foundation for any computer-based tool they might use to help solve their problems; this point can easily be illustrated with examples drawn from any other application of IT to the humanities. Just as we can expect no magic in detecting alliterations in a text if we simply feed a system a collection of letters, we cannot claim that a neural recognition system will perform well on a relatively small sample where each shape is fed in as it is, without instructing the system about the features scholars define as relevant. Even before any ANN implementation, it is exactly this theoretical background that must be put to the test when planning such systems.

    Schriften des Instituts für Dokumentologie und Editorik, Band 2: Computer-Aided Palaeography, Present and Future

    Abstract: The field of digital palaeography has received increasing attention in recent years, partly because palaeographers often seem subjective in their views and do not or cannot articulate their reasoning, thereby creating a field of authorities whose opinions are closed to debate. One response is to make palaeographical arguments more quantitative, although this approach is by no means accepted by the wider humanities community, with some arguing that handwriting is inherently unquantifiable. This paper therefore asks how palaeographical method might be made more objective, and therefore more widely accepted by non-palaeographers, while still answering critics within the field. Suggestions for objective methods that predate computing are considered first, and some of their shortcomings are discussed. Similar discussion in forensic document analysis is then introduced and is found relevant to palaeography, though with some reservations. New techniques of "digital" palaeography are then introduced; these have proven successful in forensic analysis and are becoming increasingly accepted there, but they have not yet found acceptance in the humanities communities. The reasons for this are discussed, and some suggestions are made for how the software might be designed differently to achieve greater acceptance. Finally, a prototype framework is introduced which is designed to provide a common basis for experiments in "digital" palaeography, ideally enabling scholars to exchange quantitative data about scribal hands, exchange processes for generating this data, articulate both the results themselves and the processes used to produce them, and therefore ground their arguments more firmly and perhaps find greater acceptance.

    Schriften des Instituts für Dokumentologie und Editorik, Band 2: Innovations in Analyzing Manuscript Images and Using them in Digital Scholarly Publications

    Bernard J. Muir. Abstract: Evellum began developing software for the digital analysis and presentation of medieval manuscripts nearly fifteen years ago, when very few design and delivery options were available to programmers. In the early years, it was not apparent how best to deliver such products, nor exactly how they would function and be used, and the question of longevity plagued us. Today there is the TEI to help standardize the mark-up of text and to offer a greater guarantee of longevity than was previously possible, and internet browsers are capable of facilitating the delivery of programmes that integrate text, image and video. Two products designed by Evellum are described here, with comments on the pedagogical issues that have helped determine their shape.

    Schriften des Instituts für Dokumentologie und Editorik, Band 2: Representation and Encoding of Heterogeneous Data in a Web Based Research Environment for Manuscript and Textual Studies

    The project is currently in its three-year initial phase, which is being co-funded by the German Research Foundation (DFG) through the "Thematic Information Networks" scheme within the "Scientific Library Services and Information Systems" programme. We introduce the main object types to be handled by our system and describe the overall functionality of the online platform. The paper focuses on the representations of two main object types, manuscripts as textual witnesses and watermarks, with an emphasis on the former. Since the adequate encoding of the different layers of structure of a transmitted text is particularly relevant to optimising how users navigate both the digital images of the containing manuscripts and the transcriptions of the text they contain, this topic is discussed in some detail. We introduce the formal data model and the corresponding encoding for the object types discussed. The project encodes textual data in XML, aiming for TEI conformance where possible. Since no accepted XML model exists for encoding metadata within a watermark collection, we briefly explain how we chose to model the objects to accommodate the collections the project is making accessible.