The CMV+P Document Model, Linear Version
Digital documents are peculiar in that they are different things at the same time. For example, an HTML document is a series of Unicode codepoints, but also a tree-like structure, a rendered image in a browser window, and a series of bits stored on a physical medium. These multiple identities not only make it difficult to discuss the evolution of documents (especially born-digital documents) in rigorous scholarly terms, but also create practical problems for computer-based comparison tools and algorithms.
The CMV+P model addresses this problem by providing a sound formalization of what a document is and how its many identities can coexist at the same time. In its linear version, described in this paper, the CMV+P model sees each document as a stack of abstraction levels, each composed of a) an addressable Content, b) a Model according to which the content has been recorded, and c) a set of Variants used for equivalence matching. The bottom of this stack is the Physical level, symbolizing the concrete medium that embodies the digital document. Content is moved across levels using transformation functions, i.e. encoding functions used to serialize (save) the document and decoding functions used to deserialize (read) it.
A practical application of the CMV+P model is its use in comparison tools, algorithms, and methods. With a clear understanding of the internal stratification of formats and models found in digital documents, comparison tools can focus on the most meaningful abstraction levels and let the user understand which comparisons are possible between two arbitrary documents.
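To make the stack concrete, here is a minimal Python sketch of the linear CMV+P stack as described above; the class and field names (Level, Transformation, etc.) are illustrative assumptions, not the formal notation of the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Level:
    """One abstraction level: Content, the Model it is recorded in, and its Variants."""
    content: Any                                  # addressable Content
    model: str                                    # Model according to which it is recorded
    variants: set = field(default_factory=set)    # Variants used for equivalence matching

@dataclass
class Transformation:
    """Moves content between adjacent levels of the stack."""
    encode: Callable[[Any], Any]   # serialize (save) towards the level below
    decode: Callable[[Any], Any]   # deserialize (read) towards the level above

# A tiny HTML document seen at three levels, the lowest being the Physical one.
tree_level = Level(content={"p": ["hi"]}, model="tree of elements")
text_level = Level(content="<p>hi</p>", model="Unicode codepoints")
physical_level = Level(content="<p>hi</p>".encode("utf-8"), model="Physical (stored bits)")

text_to_physical = Transformation(encode=lambda s: s.encode("utf-8"),
                                  decode=lambda b: b.decode("utf-8"))

# Round-tripping through the Physical level preserves the textual content.
assert text_to_physical.decode(text_to_physical.encode(text_level.content)) == text_level.content
```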
Versioning Cultural Objects: Digital Approaches
This volume approaches an understanding of the term versioning in the broadest sense, discussing ideas about how versions differ across forms of media, including text, image, and sound. Versions of cultural objects are identified, defined, articulated, and analysed through diverse mechanisms in different fields of research. The study of versions allows for the investigation of the creative processes behind the conception of works, a closer inspection of their socio-political contexts, and promotes investigation of their provenance and circulation. Chapters in this volume include discussion of what a “version” means in different fields, case studies implementing digital versioning techniques, conceptual models for representing versions digitally, and computational and management issues for digital projects.
CATview
This paper reviews the CATview tool (Pöckelmann 2015), an interactive and configurable visualization widget for synoptic text views.
ER4: Gioele Barabucci -- Tool integration in the digital edition.
Hi there, I am Gioele Barabucci, Marie Curie Experienced Researcher 4 in Cologne (or Köln, as they spell it here). I have landed at the Cologne Center for eHumanities after many years at the University of Bologna. My main research interests are comparison algorithms (also known as diff algorithms) and the representation of differences between documents. These two topics are the backbone of the ER4 DiXiT fellowship: supporting the creation of digital scholarly editions in collaborative environments.
Un modello universale di delta (A Universal Model of Deltas)
This thesis presents a universal model of documents and deltas. The model formalizes what it means to find differences between documents and provides a single, shared formalization that any algorithm can use to describe the differences found between any kind of comparable documents.
The main scientific contribution of this thesis is a universal delta model that can be used to represent the changes found by an algorithm. The main parts of this model are the formal definitions of changes (the pieces of information that record that something has changed), operations (the definitions of the kind of change that happened) and deltas (coherent summaries of what has changed between two documents). The fundamental mechanism that makes the universal delta model a very expressive tool is the use of encapsulation relations between changes. In the universal delta model, changes are not always simple records of what has changed; they can also be combined into more complex changes that reflect the detection of more meaningful modifications.
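As a rough illustration of these three entities and of the encapsulation mechanism, here is a minimal Python sketch; the class names and operation labels are hypothetical placeholders, not the formal definitions given in the thesis.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Operation:
    """The kind of change that happened (e.g. insert-text, delete-text)."""
    name: str

@dataclass
class Change:
    """A record that something has changed; may encapsulate finer-grained changes."""
    operation: Operation
    target: str                                   # address of the affected content
    encapsulates: List["Change"] = field(default_factory=list)

@dataclass
class Delta:
    """A coherent summary of what has changed between two documents."""
    changes: List[Change]

# Two low-level edits detected by an algorithm ...
low_level = [
    Change(Operation("delete-text"), target="/p[1]/text()"),
    Change(Operation("insert-text"), target="/p[1]/text()"),
]
# ... encapsulated into a single, more meaningful change.
delta = Delta(changes=[Change(Operation("replace-word"), target="/p[1]",
                              encapsulates=low_level)])
```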
In addition to the main entities (i.e., changes, operations and deltas), the model also describes and defines documents and the concept of equivalence between documents. As a corollary to the model, there is also an extensible catalog of the operations that algorithms can detect, used to create a common library of operations, and a UML serialization of the model, useful as a reference when implementing APIs that deal with deltas.
The universal delta model presented in this thesis acts as the formal groundwork upon which algorithms can be based and libraries can be implemented. It removes the need to create a new delta model and terminology whenever a new algorithm is devised. It also alleviates the problems that toolmakers face when adapting their software to new diff algorithms.
XDTD as a Simple Validation Language for XML-based Legal Documents
Validation of XML documents is required in order to maintain consistency in large XML document bases, including document bases of legal texts such as acts, judgments and hansards. Current W3C standards for XML validation either do not provide enough precision (DTD) or are too complex to be immediately authored and read by humans (XML Schema). DTD++ has been proposed as an alternative, and relevant legal standards such as Norme In Rete (Italy), Akoma Ntoso (UN for Africa) and CEN Metalex (European CEN standard) are first written in DTD++ and then converted into XML Schema and/or DTD for standardization purposes. XDTD is a follow-up to DTD++: a shorter and simplified syntax for XML Schema that combines the power of the XML Schema model with the readability of DTD. The whole set of features of the XML Schema language, including the new ones in the forthcoming 1.1 version, is available in XDTD, while maintaining the readability and compactness of the original DTD language. In this paper we show how XDTD simplifies the compilation of vocabularies, with attention to legal standards such as Akoma Ntoso, Norme In Rete and CEN Metalex.
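For readers unfamiliar with the baseline the paper starts from, the following Python sketch shows plain DTD validation with lxml; the element names are hypothetical and the snippet illustrates ordinary DTD validation only, not the XDTD syntax itself.

```python
from io import StringIO
from lxml import etree

# A hypothetical, deliberately tiny grammar for a legal act.
DTD_SOURCE = StringIO("""\
<!ELEMENT act (title, section+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT section (#PCDATA)>
""")

dtd = etree.DTD(DTD_SOURCE)
doc = etree.XML("<act><title>Example Act</title><section>Some text.</section></act>")

print(dtd.validate(doc))    # True: the document satisfies the grammar
print(list(dtd.error_log))  # empty when validation succeeds
```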
Signs of the times: medieval punctuation, diplomatic encoding and rendition
Digitally managing punctuation in the editions of medieval manuscripts is one of those issues that initially look like minor details but later reveal themselves as a tangled web of problems, spanning from computer science (how should punctuation signs be represented?) to philology (what types of signs exist?) through epistemology (is the processing of punctuation a mere technical transformation or a valuable part of the scholarship?). The aim of this paper is to address the theoretical aspects of these questions and their practical implications, providing a couple of solutions that fit the paradigms and technologies of the TEI.
This paper describes how we dealt with the encoding and transformation of punctuation in the Early New High German edition of Marco Polo’s travel account. Technically, we implemented a set of general rules (as XSLT templates) and various exceptions (as descriptive instructions in XML attributes). In addition, we discuss the philological foundation of this method and, contextually, address the topic of transforming a single original source into different transcriptions: from a “hyperdiplomatic” edition to an interpretative one, going through a spectrum of intermediate levels of normalisation. We also reflect on the separation between transcription and analysis, as well as on the role of the editor when the edition is the output of a semi-automated process.
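A minimal sketch of the general-rule-plus-exception pattern described above, using lxml to apply an XSLT stylesheet from Python; the TEI <pc> element and @rend attribute are real, but the attribute value and the specific normalisation rule are illustrative assumptions, not the templates used in the Marco Polo edition.

```python
from lxml import etree

XSLT_RULES = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <!-- General rule: normalise every punctuation sign to a modern full stop. -->
  <xsl:template match="tei:pc">
    <xsl:choose>
      <!-- Exception, expressed as a descriptive attribute: keep the original sign. -->
      <xsl:when test="@rend = 'keep-original'">
        <xsl:value-of select="."/>
      </xsl:when>
      <xsl:otherwise>.</xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <!-- Copy everything else verbatim. -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
"""

source = etree.XML(
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    'Item<pc>\u00b7</pc> deinde<pc rend="keep-original">\u00b7</pc></p>'
)
transform = etree.XSLT(etree.XML(XSLT_RULES))
print(str(transform(source)))   # the first sign is normalised, the second is kept
```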
Supporting Complexity and Conjectures in Cultural Heritage Descriptions
Data and metadata once hidden in the dusty card cabinets of thousands of galleries, libraries, archives and museums worldwide are now available online in digital formats. An incredible explosion of metadata has been expanding in the quantity of digitized data, the richness and sophistication of data models, the number of institutions and private citizens that contribute, and their interconnection. A fundamental issue, however, limits this undeniable success: current data models force the expression of a single point of view. For example, the field “author” is either set to one value or to another. Any disagreement about the content of a field is resolved before the publication of the data and forever lost. Yet, we argue, the expression of different and contrasting points of view is a keystone of scholarship, as well as one of the most engaging aspects for everyone. Bowdlerized, sterile, conflict-free data records fail to capture the core of important scholarly debates and thus fail to attract the interest of the general public. The root cause of this issue is technical rather than cultural: current standards for data models (e.g. RDF, OWL) simply do not support the expression of contrasting statements. In this paper we propose both a methodological approach to address this problem and a proof of concept of how it could be fully and cleanly overcome with modest extensions to the existing standards. We name this approach “contexts and conjectures”.
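One way to get a feel for the limitation, and for the direction of the proposed fix, is plain RDF named graphs: the Python sketch below keeps two contrasting attributions side by side in separate contexts. It uses only standard rdflib facilities and hypothetical URIs; it is not the “contexts and conjectures” extension proposed in the paper.

```python
from rdflib import Dataset, Literal, Namespace
from rdflib.namespace import DC

EX = Namespace("http://example.org/")
ds = Dataset()

# Scholar A's point of view, recorded in its own named graph (context).
view_a = ds.graph(EX["context/scholarA"])
view_a.add((EX["work/codex-42"], DC.creator, Literal("Workshop of Master A")))

# Scholar B disagrees; the contrasting statement lives in a separate context.
view_b = ds.graph(EX["context/scholarB"])
view_b.add((EX["work/codex-42"], DC.creator, Literal("Anonymous copyist, c. 1450")))

# Both points of view remain available and queryable; neither overwrites the other.
for ctx in ds.contexts():
    for _, _, value in ctx.triples((EX["work/codex-42"], DC.creator, None)):
        print(ctx.identifier, value)
```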