
    Ensuring Query Compatibility with Evolving XML Schemas

    During the life cycle of an XML application, both schemas and queries may change from one version to another. Schema evolutions may affect query results and potentially the validity of produced data. A current challenge is to assess and accommodate the impact of these changes in rapidly evolving XML applications. This article proposes a logical framework and tool for verifying forward/backward compatibility issues involving schemas and queries. First, it allows analyzing relations between schemas. Second, it allows XML designers to identify queries that must be reformulated in order to produce the expected results across successive schema versions. Third, it allows examining more precisely the impact of schema changes on queries, therefore facilitating their reformulation.
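
    A minimal sketch of the kind of incompatibility this framework is meant to catch, not the authors' logical framework itself: a query written against one schema version silently returns nothing on documents valid under the next version. The schemas, the document, and the query are invented for illustration (Python with lxml).

        from lxml import etree

        # Version 1 of the schema: <book> has a direct <author> child.
        XSD_V1 = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="book">
            <xs:complexType><xs:sequence>
              <xs:element name="author" type="xs:string"/>
            </xs:sequence></xs:complexType>
          </xs:element>
        </xs:schema>"""

        # Version 2: authors are now wrapped in an <authors> container.
        XSD_V2 = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="book">
            <xs:complexType><xs:sequence>
              <xs:element name="authors">
                <xs:complexType><xs:sequence>
                  <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
                </xs:sequence></xs:complexType>
              </xs:element>
            </xs:sequence></xs:complexType>
          </xs:element>
        </xs:schema>"""

        QUERY = "/book/author/text()"   # written against version 1

        old = etree.fromstring("<book><author>Knuth</author></book>")
        new = etree.fromstring("<book><authors><author>Knuth</author></authors></book>")
        assert etree.XMLSchema(etree.fromstring(XSD_V1)).validate(old)
        assert etree.XMLSchema(etree.fromstring(XSD_V2)).validate(new)

        print(old.xpath(QUERY))   # ['Knuth']  -- the query works under version 1
        print(new.xpath(QUERY))   # []         -- valid document, but the query must be reformulated

    The second document validates cleanly, so nothing fails loudly; only a compatibility analysis of the kind the article describes would flag the query for reformulation.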

    Type-Based Detection of XML Query-Update Independence

    This paper presents a novel static analysis technique to detect XML query-update independence in the presence of a schema. Rather than types, our system infers chains of types. Each chain represents a path that can be traversed on a valid document during query/update evaluation. The resulting independence analysis is precise, although it raises a challenging issue: recursive schemas may lead to the inference of infinitely many chains. A sound and complete approximation technique ensuring a finite analysis in every case is presented, together with an efficient implementation performing the chain-based analysis in polynomial space and time.
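
    A toy illustration of the chain idea, not the paper's type system or its approximation: root-to-element chains are enumerated from a (possibly recursive) schema graph with a crude cycle cut-off so the set stays finite, and a query and an update are declared independent when no chain of one is a prefix of a chain of the other. The schema and the paths are invented.

        def chains(schema, node, prefix=(), seen=frozenset()):
            """Yield every chain of element names reachable from `node`,
            expanding each recursive type at most once per chain."""
            chain = prefix + (node,)
            yield chain
            if node in seen:                  # cut recursion to keep the set finite
                return
            for child in schema.get(node, ()):
                yield from chains(schema, child, chain, seen | {node})

        def touched(schema, steps):
            """All chains reachable through a simple child-axis path."""
            return {c for c in chains(schema, steps[0]) if c[:len(steps)] == tuple(steps)}

        def independent(schema, query_path, update_path):
            q, u = touched(schema, query_path), touched(schema, update_path)
            return not any(a[:len(b)] == b or b[:len(a)] == a for a in q for b in u)

        # Recursive schema: a section may contain further sections.
        schema = {"doc": ["title", "section"], "section": ["title", "para", "section"]}
        print(independent(schema, ("doc", "title"), ("doc", "section", "para")))    # True
        print(independent(schema, ("doc", "section"), ("doc", "section", "para")))  # False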

    Libraries and Information Systems Need XML/RDF... but Do They Know It?

    This article presents an approach to the uses of XML (eXtensible Markup Language) and Semantic Web technologies in the field of information services, focusing mainly on the creation and management of digital libraries as compared with traditional libraries, while paying special attention to the concept and application of metadata and to RDF-based integration.
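
    For concreteness, a small sketch of the kind of RDF-based metadata the article discusses: a bibliographic record described with Dublin Core terms using rdflib. The record identifier and field values are invented.

        from rdflib import Graph, Literal, URIRef
        from rdflib.namespace import DCTERMS

        g = Graph()
        record = URIRef("http://example.org/catalogue/record/42")   # hypothetical identifier
        g.add((record, DCTERMS.title, Literal("XML and RDF in digital libraries")))
        g.add((record, DCTERMS.creator, Literal("Example Author")))
        g.add((record, DCTERMS.identifier, Literal("rec-42")))

        print(g.serialize(format="turtle"))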

    Legal issues


    XML Matchers: approaches and challenges

    Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was investigated largely for classical database models (e.g., E/R schemas, relational databases, etc.). In recent years, however, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, which aim at finding semantic matches between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.
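
    A minimal sketch of the structural idea the survey highlights, not any of the surveyed matchers: element pairs are scored by a blend of name similarity and similarity of their root-to-element paths, so hierarchy contributes to the match. The two toy schemas and the weights are invented.

        from difflib import SequenceMatcher

        def paths(tree, prefix=""):
            """Flatten a nested-dict schema into root-to-element paths."""
            for name, children in tree.items():
                path = f"{prefix}/{name}"
                yield path
                yield from paths(children, path)

        def similarity(p1, p2, w_name=0.6, w_path=0.4):
            name = SequenceMatcher(None, p1.rsplit("/", 1)[-1], p2.rsplit("/", 1)[-1]).ratio()
            path = SequenceMatcher(None, p1, p2).ratio()
            return w_name * name + w_path * path

        s1 = {"book": {"title": {}, "author": {"name": {}}}}
        s2 = {"publication": {"heading": {}, "writer": {"fullName": {}}}}

        for a in paths(s1):
            best = max(paths(s2), key=lambda b: similarity(a, b))
            print(f"{a:25} -> {best:30} ({similarity(a, best):.2f})")

    Real XML Matchers replace both ingredients with far richer linguistic and structural measures, but the weighting of name evidence against structural evidence is the common pattern the template captures.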

    An integrated approach to preparing, publishing, presenting and preserving theses

    This paper describes progress on a project funded by the Australian government to create free software: the Integrated Content Environment for research and scholarship (ICE-RS). ICE-RS is a multi-faceted project which will add value to finished theses by making them available in both HTML and PDF, as well as providing a mechanism for packaging multimedia theses. The project will also concentrate on providing services for thesis production, with version control, automated backup and collaboration services. The paper begins with the established content management system that is the basis for the project, ICE, originally developed to create courseware packages. ICE includes distributed, version-controlled collaboration using word-processing software, and works on multiple platforms with standard document formats. We survey other approaches to content authoring and publishing for ETDs. We showcase exploratory work on integrating the thesis-writing process with Institutional Repository software, including publishing theses in both PDF and HTML with preservation and descriptive metadata. The presentation will include demonstrations of thesis production at all stages of development, from proposal to completion. In a more speculative vein, we will discuss opportunities for institutions to provide new levels of support for candidates via automated thesis “dashboard” progress reports, supervisor and examiner annotation and comment, and support for copyright considerations as early as possible in the process.

    Assessing and refining mappings to RDF to improve dataset quality

    RDF dataset quality assessment is currently performed primarily after data is published. However, there is no systematic way to incorporate its results into the dataset, nor to incorporate the assessment into the publishing workflow. Adjustments are applied manually, and only rarely. Moreover, the root cause of the violations, which often derives from the mappings that specify how the RDF dataset is generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from (semi-)structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We (i) incorporate a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and (ii) perform semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as DBpedia, or newly generated ones such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases.
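
    A rough sketch of the test-driven idea, not the authors' workflow or mapping language: quality tests run against mapping rules before any RDF is generated, so a violation is reported at its root rather than in millions of generated triples. The mapping structure and the test are invented.

        MAPPING = [
            {"source_column": "birth_date", "predicate": "http://example.org/birthDate",
             "datatype": "http://www.w3.org/2001/XMLSchema#date"},
            {"source_column": "age", "predicate": "http://example.org/age",
             "datatype": None},   # missing datatype: a violation at the mapping level
        ]

        def test_datatype_declared(rule):
            """Literal-producing rules should declare an explicit datatype."""
            return rule["datatype"] is not None

        def assess(mapping, tests):
            """Yield (predicate, violation) pairs for every failing rule."""
            for rule in mapping:
                for test in tests:
                    if not test(rule):
                        yield rule["predicate"], test.__doc__.strip()

        for predicate, violation in assess(MAPPING, [test_datatype_declared]):
            print(f"{predicate}: {violation}")

    Fixing the single offending rule then repairs every triple it would have produced, which is the leverage the mapping-level assessment provides.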

    Measuring complexity in OGC web services XML schemas: pragmatic use and solutions

    The use of standards in the geospatial domain, such as those defined by the Open Geospatial Consortium (OGC), for exchanging data has brought a great deal of interoperability upon which systems can be built in a reliable way. Unfortunately, these standards are becoming increasingly complex, making their implementation an arduous task. The use of appropriate software metrics can be very useful for quantifying different properties of the standards that may ultimately suggest different solutions to deal with problems related to their complexity. In this regard, we present in this article an attempt to measure the complexity of the schemas associated with the OGC implementation specifications. We use a comprehensive set of metrics to provide a multidimensional view of this complexity. These metrics can be used to evaluate the impact of design decisions, study the evolution of schemas, and so on. We also present and evaluate different solutions that could be applied to overcome some of the problems associated with the complexity of the schemas.
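
    A sketch of a few simple metrics of the kind such a study might compute (element count, maximum nesting depth, maximum fan-out), gathered from an XSD with lxml. The metric selection and the local file name are illustrative, not the article's metric suite.

        from lxml import etree

        XS = "{http://www.w3.org/2001/XMLSchema}"

        def metrics(root):
            """Compute simple size/structure metrics over an XSD tree."""
            names = [e.get("name") or e.get("ref") for e in root.iter(XS + "element")]
            depths = [sum(1 for _ in e.iterancestors()) for e in root.iter(XS + "element")]
            fanouts = [len(seq) for seq in root.iter(XS + "sequence")]
            return {
                "element_count": len(names),
                "max_depth": max(depths, default=0),
                "max_fanout": max(fanouts, default=0),
            }

        root = etree.parse("wfs.xsd").getroot()   # hypothetical local copy of an OGC schema
        print(metrics(root))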

    XEM: XML Evolution Management

    As information on the World Wide Web continues to proliferate at an astounding rate, the Extensible Markup Language (XML) has been emerging as a standard format for data representation on the web. In many application domains, specific document type definitions (DTDs) are designed to enforce a semantically agreed-upon structure of the XML documents. In the XML context, these structural definitions serve as schemata. However, both the data and the structure (schema) of XML documents tend to change over time for a multitude of reasons, including to correct design errors in the DTD, to allow expansion of the application scope over time, or to account for the merging of several businesses into one. Most of the current software tools that enable the use of XML do not provide explicit support for such data or schema changes. Using these tools in a changing environment entails making manual edits to DTDs and XML data and reloading them from scratch. In this vein, we put forth the first solution framework, called XML Evolution Manager (XEM), to manage the evolution of DTDs and XML documents. XEM provides a minimal yet complete taxonomy of basic change primitives. These primitives, classified as either data or schema changes, are consistency-preserving. For a data change, they ensure that the modified XML document conforms to its DTD both in structure and constraints. For a schema change, they ensure that the new DTD is well-formed and that all existing XML documents are also transformed to conform to the modified DTD. We prove both the completeness of our evolution taxonomy and its consistency-preserving nature. To verify the feasibility of our XEM approach, we have implemented a working prototype system in Java, using the XML4J parser from IBM and PSE Pro as our backend storage system. We present an experimental study run on this system in which we compare the relative efficiencies of the primitive operations in terms of their execution times. We then contrast these execution times against the time to reload the data, which would be required in a manual system. Based on the results of these experiments, we conclude that our approach improves upon the previous method of making manual changes and reloading data from scratch by providing automated evolution management facilities for DTDs and XML documents.
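
    A loose sketch of the consistency-preserving idea behind such change primitives, not the prototype's Java API: a schema change is applied together with the document transformation that keeps every existing document conforming. The schema representation and the primitive are invented simplifications.

        import xml.etree.ElementTree as ET

        class Evolver:
            def __init__(self, schema, documents):
                self.schema = schema          # element name -> list of required children
                self.documents = documents    # ET.Element roots conforming to the schema

            def add_required_child(self, parent, child, default_text=""):
                """Schema-change primitive: extend `parent` with a required `child`,
                then transform every document so it still conforms."""
                self.schema.setdefault(parent, []).append(child)
                for doc in self.documents:
                    for elem in doc.iter(parent):
                        new = ET.SubElement(elem, child)
                        new.text = default_text

        schema = {"book": ["title"]}
        doc = ET.fromstring("<library><book><title>TAOCP</title></book></library>")
        ev = Evolver(schema, [doc])
        ev.add_required_child("book", "isbn", default_text="unknown")
        print(ET.tostring(doc, encoding="unicode"))
        # <library><book><title>TAOCP</title><isbn>unknown</isbn></book></library>

    Pairing the schema edit with the document transformation in one primitive is what makes the change consistency-preserving: no intermediate state ever leaves a document invalid, in contrast to the manual edit-and-reload alternative.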
