84 research outputs found
An Algorithm for Detecting and Correcting XSLT Rules Affected by Schema Updates
Thesis (Master of Science in Informatics)--University of Tsukuba, no. 37776, 2017.3.2
From Relations to XML: Cleaning, Integrating and Securing Data
While relational databases are still the preferred approach for storing data, XML is emerging
as the primary standard for representing and exchanging data. Consequently, it has
become increasingly important to provide a uniform XML interface to various data sources
(integration), and critical to protect sensitive and confidential information in XML data
(access control). Moreover, it is preferable to first detect and repair the inconsistencies in
the data to avoid the propagation of errors to other data processing steps. In response to
these challenges, this thesis presents an integrated framework for cleaning, integrating and
securing data.
The framework contains three parts. First, the data cleaning sub-framework makes
use of a new class of constraints specially designed for improving data quality, referred
to as conditional functional dependencies (CFDs), to detect and remove inconsistencies in
relational data. Both batch and incremental techniques are developed for detecting CFD
violations by SQL efficiently and repairing them based on a cost model. The cleaned relational
data, together with other non-XML data, is then converted to XML format by using
widely deployed XML publishing facilities. Second, the data integration sub-framework
uses a novel formalism, XML integration grammars (XIGs), to integrate multi-source XML
data which is either native or published from traditional databases. XIGs automatically
support conformance to a target DTD, and allow one to build a large, complex integration
via composition of component XIGs. To efficiently materialize the integrated data, algorithms
are developed for merging XML queries in XIGs and for scheduling them. Third, to
protect sensitive information in the integrated XML data, the data security sub-framework
allows users to access the data only through authorized views. User queries posed on these
views need to be rewritten into equivalent queries on the underlying document to avoid the
prohibitive cost of materializing and maintaining a large number of views. Two algorithms
are proposed to support virtual XML views: a rewriting algorithm that characterizes the
rewritten queries as a new form of automata and an evaluation algorithm to execute the
automata-represented queries. They allow the security sub-framework to answer queries
on views in linear time.
Using both relational and XML technologies, this framework provides a uniform approach
to clean, integrate and secure data. The algorithms and techniques in the framework
have been implemented, and the experimental study verifies their effectiveness and efficiency.
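As an illustration of the kind of SQL-based CFD violation detection this abstract describes, here is a minimal sketch in Python using the standard sqlite3 module. The cust table, its columns, the sample rows and the CFD itself (for customers with country = 'UK', zip determines street) are assumptions made for this example and are not taken from the thesis; its batch and incremental techniques and its cost-based repair are not reproduced here.

```python
import sqlite3


def find_cfd_violations(rows):
    """Return the zip codes that violate the hypothetical CFD
    (country = 'UK', zip) -> street, i.e. UK zip codes that
    are associated with more than one distinct street."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cust (name TEXT, country TEXT, zip TEXT, street TEXT)"
    )
    conn.executemany("INSERT INTO cust VALUES (?, ?, ?, ?)", rows)
    # The WHERE clause encodes the condition of the CFD; the GROUP BY /
    # HAVING pair checks the embedded functional dependency zip -> street.
    query = """
        SELECT zip
        FROM cust
        WHERE country = 'UK'
        GROUP BY zip
        HAVING COUNT(DISTINCT street) > 1
        ORDER BY zip
    """
    violations = [row[0] for row in conn.execute(query)]
    conn.close()
    return violations


# Hypothetical sample data.
ROWS = [
    ("Ann", "UK", "EH4 1DT", "Mayfield Rd"),
    ("Bob", "UK", "EH4 1DT", "Crichton St"),  # conflicts with Ann
    ("Cat", "UK", "W1B 1JH", "Regent St"),
    ("Dan", "US", "07974", "Mountain Ave"),
    ("Eve", "US", "07974", "Main St"),  # not constrained: country is not 'UK'
]

print(find_cfd_violations(ROWS))
```

The two US rows share a zip code with different streets but are not reported, because the condition country = 'UK' restricts where the dependency applies. That restriction is what distinguishes a conditional functional dependency from a plain one.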
SIQXC: Schema Independent Queryable XML Compression for Smartphones
The explosive growth of XML use over the last decade has led to a lot of research on how best to store and access it. This growth has resulted in XML being described as a de facto standard for storage and exchange of data over the web. However, XML's self-describing nature makes it highly redundant and verbose, which poses a storage problem. This has led to much research devoted to XML compression, which has attracted more interest as the use of resource-constrained devices is also on the rise. These devices are limited in storage space and processing power and have finite energy. Therefore, they cannot cope with storing and processing large XML documents. Queryable XML compression methods could be a solution, but none of them has a query processor that runs on such devices. Currently, wireless connections are used to alleviate the problem, but they have adverse effects on battery life and are therefore not a sustainable solution.
This thesis describes an attempt to address this problem by proposing a queryable compressor (SIQXC) with a query processor that runs in a resource-constrained environment, thereby lowering dependency on wireless connections while alleviating the storage problem. It applies a novel, simple 2-tuple integer encoding system, clustering and gzip. SIQXC achieves an average compression ratio of 70%, which is higher than most queryable XML compressors, and also supports a wide range of XPath operators, making it a competitive approach. It was tested through a practical implementation evaluated against real data that is commonly used for XML benchmarking. The evaluation covered the compression ratio, compression time, query evaluation accuracy and response time. SIQXC allows users, to some extent, to locally store and manipulate the otherwise verbose XML on their smartphones.
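The abstract names three ingredients (a 2-tuple integer encoding, clustering and gzip) without giving details, so the following Python sketch is only a guess at how such a pipeline might fit together and is not the SIQXC format. The assumptions are that each element becomes a (tag_id, parent_position) pair, that text values are clustered by tag, and that the result is serialized as JSON before gzip; the function names encode, texts_for_tag and compression_ratio are likewise hypothetical.

```python
import gzip
import json
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Encode XML as integer pairs plus per-tag text clusters, then gzip."""
    root = ET.fromstring(xml_text)
    tag_ids = {}    # tag name -> small integer
    structure = []  # one (tag_id, parent_position) pair per element
    clusters = {}   # tag_id -> text values of that tag, in document order

    def visit(elem, parent_pos):
        tag_id = tag_ids.setdefault(elem.tag, len(tag_ids))
        pos = len(structure)
        structure.append((tag_id, parent_pos))
        text = (elem.text or "").strip()
        if text:
            clusters.setdefault(tag_id, []).append(text)
        for child in elem:
            visit(child, pos)

    visit(root, -1)  # -1 marks the root, which has no parent
    payload = json.dumps(
        {"tags": tag_ids, "structure": structure, "clusters": clusters}
    )
    return gzip.compress(payload.encode("utf-8"))


def texts_for_tag(blob, tag):
    """Answer a simple query from the clusters without rebuilding the tree."""
    data = json.loads(gzip.decompress(blob).decode("utf-8"))
    tag_id = data["tags"].get(tag)
    if tag_id is None:
        return []
    # JSON object keys are strings, so the integer id is converted back.
    return data["clusters"].get(str(tag_id), [])


def compression_ratio(xml_text, blob):
    """Fraction of space saved relative to the UTF-8 encoded original."""
    return 1 - len(blob) / len(xml_text.encode("utf-8"))


# Hypothetical sample document with repetitive structure.
DOC = "<lib>" + "".join(
    f"<book><title>T{i}</title><year>{2000 + i}</year></book>"
    for i in range(50)
) + "</lib>"

blob = encode(DOC)
print(texts_for_tag(blob, "title")[:3])
print(round(compression_ratio(DOC, blob), 2))
```

Grouping values of the same tag puts similar strings next to each other, which tends to help a general-purpose compressor such as gzip. The ratio printed for this toy document says nothing about the 70% figure reported in the abstract.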
Reasoning & Querying - State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet, where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
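To make the idea of keyword querying over XML concrete, here is a naive Python sketch that returns the deepest elements whose subtree contains every keyword. This is one common way to interpret a keyword query on tree-shaped data and is an assumption of this example, not a description of any particular language surveyed in the article; the function name keyword_search and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def keyword_search(xml_text, keywords):
    """Return the deepest elements whose subtree text contains all keywords.

    Matching is case-insensitive and uses whitespace tokenization only.
    """
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return []
    root = ET.fromstring(xml_text)
    hits = []

    def covered(elem):
        """Return the subset of wanted keywords found in elem's subtree."""
        found = wanted & set((elem.text or "").lower().split())
        child_complete = False
        for child in elem:
            child_found = covered(child)
            if child_found == wanted:
                child_complete = True
            found |= child_found
            # Text following a child still belongs to elem.
            found |= wanted & set((child.tail or "").lower().split())
        # Report elem only if no child already covers every keyword.
        if found == wanted and not child_complete:
            hits.append(elem)
        return found

    covered(root)
    return hits


# Hypothetical sample document.
DOC = """<bib>
  <book><title>XML querying</title><author>Smith</author></book>
  <book><title>RDF stores</title><author>Jones</author></book>
</bib>"""

for elem in keyword_search(DOC, ["xml", "smith"]):
    print(elem.tag, elem.find("title").text)
```

The user needs to know neither the element names nor a query syntax: the query "xml smith" selects the first book, while "xml jones" matches only at the enclosing bib element because no single book contains both words.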
Strategies for the intelligent selection of components
It is becoming common to build applications as component-intensive systems - a mixture of fresh code and existing components. For application developers the selection of components to incorporate is key to overall system quality - so they want the 'best'. For each selection task, the application developer will define requirements for the ideal component and use them to select the most suitable one. While many software selection processes exist, there is a lack of repeatable, usable, flexible, automated processes with tool support. This investigation has focussed on finding and implementing strategies to enhance the selection of software components. The study was built around four research elements, targeting characterisation, process, strategies and evaluation. A Post-positivist methodology was used with the Spiral Development Model (SDM) structuring the investigation. Data for the study is generated using a range of qualitative and quantitative methods including a survey approach, a range of case studies and quasi-experiments to focus on the specific tuning of tools and techniques. Evaluation and review are integral to the SDM: a Goal-Question-Metric (GQM)-based approach was applied to every Spiral
Proceedings of Monterey Workshop 2001 Engineering Automation for Software Intensive System Integration
The 2001 Monterey Workshop on Engineering Automation for Software Intensive System Integration was sponsored by the Office of Naval Research, Air Force Office of Scientific Research, Army Research Office and the Defense Advanced Research Projects Agency. It is our pleasure to thank the workshop advisory and sponsors for their vision of a principled engineering solution for software and for their many-year tireless effort in supporting a series of workshops to bring everyone together. This workshop is the 8th in a series of international workshops. The workshop was held at the Monterey Beach Hotel, Monterey, California, during June 18-22, 2001. The general theme of the workshop has been to present and discuss research that aims at increasing the practical impact of formal methods for software and systems engineering. The particular focus of this workshop was "Engineering Automation for Software Intensive System Integration". Previous workshops have focused on issues including "Real-time & Concurrent Systems", "Software Merging and Slicing", "Software Evolution", "Software Architecture", "Requirements Targeting Software" and "Modeling Software System Structures in a fastly moving scenario". Approved for public release, distribution unlimited.
Ontology Evaluation
Ontology evaluation is the task of measuring the quality of an ontology. It enables us to answer the following main question: How to assess the quality of an ontology for the Web? In this thesis a theoretical framework and several methods breathing life into the framework are presented. The application to the above scenarios is explored, and the theoretical foundations are thoroughly grounded in the practical usage of the emerging Semantic Web