6,186 research outputs found
Managing the consistency of distributed documents
Many businesses produce documents as part of their daily activities: software engineers
produce requirements specifications, design models, source code, build scripts and more;
business analysts produce glossaries, use cases, organisation charts, and domain ontology
models; service providers and retailers produce catalogues, customer data, purchase orders,
invoices and web pages.
What these examples have in common is that the content of documents is often semantically
related: source code should be consistent with the design model, a domain ontology
may refer to employees in an organisation chart, and invoices to customers should be consistent
with stored customer data and purchase orders. As businesses grow and documents
are added, it becomes difficult to manually track and check the increasingly complex relationships
between documents. The problem is compounded by current trends towards
distributed working, either over the Internet or over a global corporate network in large
organisations. This adds complexity as related information is not only scattered over
a number of documents, but the documents themselves are distributed across multiple
physical locations.
This thesis addresses the problem of managing the consistency of distributed and possibly
heterogeneous documents. "Documents" is used here as an abstract term, and does not
necessarily refer to a human readable textual representation. We use the word to stand
for a file or data source holding structured information, like a database table, or some
source of semi-structured information, like a file of comma-separated values or a document
represented in a hypertext markup language like XML [Bray et al., 2000]. Document
heterogeneity comes into play when data with similar semantics is represented in different
ways: for example, a design model may store a class as a rectangle in a diagram whereas
a source code file will embed it as a textual string; and an invoice may contain an invoice
identifier that is composed of a customer name and date, both of which may be recorded
and managed separately.
Consistency management in this setting encompasses a number of steps. Firstly, checks
must be executed in order to determine the consistency status of documents. Documents
are inconsistent if their internal elements hold values that do not meet the properties
expected in the application domain or if there are conflicts between the values of elements
in multiple documents. The results of a consistency check have to be accumulated and
reported back to the user. And finally, the user may choose to change the documents to
bring them into a consistent state.
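The check, accumulate and report steps above can be sketched as follows. This is a minimal illustration, not the thesis's own technique: the record layout, the rule names and the `check_invoices` function are all assumptions made for the example, and the "documents" are plain in-memory lists rather than distributed sources.

```python
from dataclasses import dataclass


@dataclass
class Inconsistency:
    rule: str    # hypothetical rule name, for illustration only
    detail: str  # human-readable description reported to the user


def check_invoices(invoices, customers):
    """Check invoices against customer data and accumulate the violations."""
    known_ids = {c["id"] for c in customers}
    report = []
    for inv in invoices:
        # Internal check: a value inside one document violates a domain property.
        if inv["amount"] <= 0:
            report.append(Inconsistency(
                "positive-amount",
                f"invoice {inv['id']} has amount {inv['amount']}"))
        # Cross-document check: the invoice conflicts with the customer data.
        if inv["customer_id"] not in known_ids:
            report.append(Inconsistency(
                "customer-exists",
                f"invoice {inv['id']} refers to unknown customer {inv['customer_id']}"))
    return report


# Two "documents" with assumed contents.
customers = [{"id": "C1", "name": "Acme"}]
invoices = [
    {"id": "I1", "customer_id": "C1", "amount": 120.0},
    {"id": "I2", "customer_id": "C9", "amount": 80.0},
    {"id": "I3", "customer_id": "C1", "amount": 0.0},
]

# Report the accumulated results; repairing the documents is left to the user.
report = check_invoices(invoices, customers)
for item in report:
    print(f"[{item.rule}] {item.detail}")
```

The sketch hardcodes its rules, which is exactly the extensibility limitation the abstract goes on to criticise; it is meant only to make the three steps concrete.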
The current generation of tools and techniques is not always sufficiently equipped to deal
with this problem. Consistency checking is mostly tightly integrated or hardcoded into tools, leading to problems with extensibility with respect to new types of documents.
Many tools do not support checks of distributed data, insisting instead on accumulating
everything in a centralized repository. This may not always be possible, due to organisational
or time constraints, and can represent excessive overhead if the only purpose of
integration is to improve data consistency rather than deriving any additional benefit.
This thesis investigates the theoretical background and practical support necessary for
consistency management of distributed documents. It makes a number of contributions
to the state of the art, and the overall approach is validated in significant case
studies that provide evidence of its practicality and usefulness.
Assessing and refining mappings to RDF to improve dataset quality
RDF dataset quality assessment is currently performed primarily after data is published. However, there is no systematic way either to incorporate its results into the dataset or to incorporate the assessment into the publishing workflow. Adjustments are applied manually, but rarely. Nevertheless, the root of the violations, which often derive from the mappings that specify how the RDF dataset will be generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from (semi-)structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We incorporate (i) a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and (ii) semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as DBpedia, or newly generated ones, such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases.
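The idea of testing the mappings rather than the generated dataset, and then refining them from the test results, might look like the sketch below. It is an illustration under stated assumptions, not the workflow's actual implementation: the `ex:` vocabulary, the dictionary-based mapping format, and the `assess` and `refine` functions are all hypothetical, and no real mapping language or RDF library is used.

```python
# Hypothetical schema expectations: predicate -> (domain class, range datatype).
SCHEMA = {
    "ex:name": ("ex:Person", "xsd:string"),
    "ex:age": ("ex:Person", "xsd:integer"),
}

# Hypothetical mapping rules describing how source fields become triples.
MAPPINGS = [
    {"subject_class": "ex:Person", "predicate": "ex:name",
     "source": "name", "datatype": "xsd:string"},
    {"subject_class": "ex:Person", "predicate": "ex:age",
     "source": "age", "datatype": "xsd:string"},      # wrong datatype
    {"subject_class": "ex:Document", "predicate": "ex:name",
     "source": "title", "datatype": "xsd:string"},    # wrong subject class
]


def assess(mappings, schema):
    """Test each mapping rule against the schema before any RDF is generated."""
    violations = []
    for index, rule in enumerate(mappings):
        expected = schema.get(rule["predicate"])
        if expected is None:
            continue  # no expectation recorded for this predicate
        domain, datatype = expected
        if rule["subject_class"] != domain:
            violations.append((index, "domain", domain))
        if rule["datatype"] != datatype:
            violations.append((index, "range", datatype))
    return violations


def refine(mappings, violations):
    """Apply datatype fixes automatically; leave domain violations for review."""
    refined = [dict(rule) for rule in mappings]
    for index, kind, expected in violations:
        if kind == "range":
            refined[index]["datatype"] = expected
    return refined


violations = assess(MAPPINGS, SCHEMA)
refined = refine(MAPPINGS, violations)
remaining = assess(refined, SCHEMA)
print(violations)
print(remaining)
```

Because each violation points at a mapping rule rather than at individual triples, one fix to the rule corrects every triple that rule would have produced. Leaving domain violations to a human reviewer is one way to read "semi-automatic"; the split chosen here is an assumption.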
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 25 pages, 8 figures, 3 tables
XRound: A reversible template language and its application in model-based security analysis
Successful analysis of the models used in Model-Driven Development requires the ability to synthesise the results of analysis and automatically integrate these results with the models themselves. This paper presents a reversible template language called XRound which supports round-trip transformations between models and the logic used to encode system properties. A template processor that supports the language is described, and the use of the template language is illustrated by its application in an analysis workbench, designed to support analysis of security properties of UML and MOF-based models. As a result of using reversible templates, it is possible to seamlessly and automatically integrate the results of a security analysis with a model. (C) 2008 Elsevier B.V. All rights reserved
A research roadmap towards achieving scalability in model driven engineering
As Model-Driven Engineering (MDE) is increasingly applied to larger and more complex systems, the current generation of modelling and model management technologies is being pushed to its limits in terms of capacity and efficiency. Additional research and development is imperative in order to enable MDE to remain relevant to industrial practice and to continue delivering its widely recognised productivity, quality, and maintainability benefits. Achieving scalability in modelling and MDE involves being able to construct large models and domain-specific languages in a systematic manner, enabling teams of modellers to construct and refine large models in a collaborative manner, advancing the state of the art in model querying and transformation tools so that they can cope with large models (of the scale of millions of model elements), and providing an infrastructure for efficient storage, indexing and retrieval of large models. This paper attempts to provide a research roadmap for these aspects of scalability in MDE and outline directions for work in this emerging research area.
Improving interoperability of AEC collaborative software through the creation of data exchange standards
Today collaborative systems are increasingly being used to manage project information
on large and medium sized construction projects. The speed of expansion in use of these
systems combined with the lack of consolidation has led to a highly fragmented
marketplace for collaborative products. Organisations participating in the construction
lifecycle are currently free to select a collaborative system from any of the available
providers, but once a system is selected they are unable to effectively change service provider until the
conclusion of the project. This perceived lock-in along with concerns over the stability
of some technology providers has created unease amongst the user community and is
hindering the adoption of collaborative tools.
Since 2003 the bulk of major UK construction project collaborative software providers
have been working together to develop standards that will allow for project data to be
transferred between vendor applications. Under the umbrella of the Network of
Construction Collaboration Technology Providers (NCCTP), a number of solutions
have been designed allowing for project data to be transferred between heterogeneous
collaborative systems.
Through extensive industry participation, this thesis shows how the theoretical work
done in creating representations of collaborative systems can be applied to real-world
systems to allow data to be transferred in bulk, incrementally or in real time. The
findings of this work are presented in four peer-reviewed papers, three technical reports and a
number of supporting documents which comprise the developed data exchange standards. Work in this field is continuing to evolve with the suppliers of collaborative
systems seeking to implement additional integration.