4 research outputs found
Managing the consistency of distributed documents
Many businesses produce documents as part of their daily activities: software engineers
produce requirements specifications, design models, source code, build scripts and more;
business analysts produce glossaries, use cases, organisation charts, and domain ontology
models; service providers and retailers produce catalogues, customer data, purchase orders,
invoices and web pages.
What these examples have in common is that the content of documents is often semantically
related: source code should be consistent with the design model, a domain ontology
may refer to employees in an organisation chart, and invoices to customers should be consistent
with stored customer data and purchase orders. As businesses grow and documents
are added, it becomes difficult to manually track and check the increasingly complex relationships
between documents. The problem is compounded by current trends towards
distributed working, either over the Internet or over a global corporate network in large
organisations. This adds complexity as related information is not only scattered over
a number of documents, but the documents themselves are distributed across multiple
physical locations.
This thesis addresses the problem of managing the consistency of distributed and possibly
heterogeneous documents. āDocumentsā is used here as an abstract term, and does not
necessarily refer to a human readable textual representation. We use the word to stand
for a file or data source holding structured information, like a database table, or some
source of semi-structured information, like a file of comma-separated values or a document
represented in a hypertext markup language like XML [Bray et al., 2000]. Document
heterogeneity comes into play when data with similar semantics is represented in different
ways: for example, a design model may store a class as a rectangle in a diagram whereas
a source code file will embed it as a textual string; and an invoice may contain an invoice
identifier that is composed of a customer name and date, both of which may be recorded
and managed separately.
Consistency management in this setting encompasses a number of steps. Firstly, checks
must be executed in order to determine the consistency status of documents. Documents
are inconsistent if their internal elements hold values that do not meet the properties
expected in the application domain or if there are conflicts between the values of elements
in multiple documents. The results of a consistency check have to be accumulated and
reported back to the user. And finally, the user may choose to change the documents to
bring them into a consistent state.
The current generation of tools and techniques is not always sufficiently equipped to deal
with this problem. Consistency checking is mostly tightly integrated or hardcoded into tools, leading to problems with extensibility with respect to new types of documents.
Many tools do not support checks of distributed data, insisting instead on accumulating
everything in a centralized repository. This may not always be possible, due to organisational
or time constraints, and can represent excessive overhead if the only purpose of
integration is to improve data consistency rather than deriving any additional benefit.
This thesis investigates the theoretical background and practical support necessary to
support consistency management of distributed documents. It makes a number of contributions
to the state of the art, and the overall approach is validated in significant case
studies that provide evidence of its practicality and usefulness
Consistency Checking of Financial Derivatives Transactions
Financial institutions are increasingly using XML as a de-facto standard to represent and exchange information about their products and services. Their aim is to process transactions quickly, cost-effectively, and with minimal human intervention