Pattern-based segmentation of digital documents: model and implementation

di Iorio, Angelo <1977>

thesis

Author: Angelo di Iorio
Publication date: 16 April 2007
Publisher: Alma Mater Studiorum - Università di Bologna

Abstract

This thesis proposes a new document model, according to which any document can be segmented into independent components and transformed into a pattern-based projection that uses only a very small set of objects and composition rules. The point is that such a normalized document expresses the same fundamental information as the original one, in a simple, clear and unambiguous way. The central part of my work consists of discussing that model, investigating how a digital document can be segmented, and showing how a segmented version can be used to implement advanced conversion tools. I present seven patterns that are versatile enough to capture the most relevant document structures, and whose minimality and rigour make that implementation possible. The abstract model is then instantiated into an actual markup language, called IML. IML is a general and extensible language, which essentially adopts an XHTML syntax and is able to capture, a posteriori, only the content of a digital document. It is compared with other languages and proposals in order to clarify its role and objectives. Finally, I present some systems built upon these ideas. These applications are evaluated in terms of advantages for users, workflow improvements and impact on the overall quality of the output. In particular, they cover heterogeneous content management processes: from web editing to collaboration (IsaWiki and WikiFactory), and from e-learning (IsaLearning) to professional printing (IsaPress).

Available Versions

AMS Tesi di Dottorato

oai:amsdottorato.cib.unibo.it:...

Last time updated on 05/07/2013