
    Information Integration - the process of integration, evolution and versioning

    At present, many information sources are available wherever you are. Most of the time, the information needed is spread across several of those sources, and gathering it is a tedious and time-consuming job. Automating this process would assist users in this task. Integration of the information sources provides a global information source in which all of the information needed is present. These information sources also change over time, and with each change the schema of a source may change as well. The data contained in the source, however, cannot be converted on every change, because of the huge amount of data that would have to be rewritten to conform to the most recent schema. In this report we describe current methods for information integration, evolution and versioning. We distinguish between integration of schemas and integration of the actual data, and we highlight some key issues when integrating XML data sources.
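
    The lazy, on-read conversion the abstract alludes to (data stays under an old schema version because bulk migration is too costly) can be sketched in a few lines of Python. Everything below, from field names to upgrade functions and version numbers, is illustrative and not taken from the report:

        # Records keep the schema version they were written under and are
        # upgraded on read rather than migrated in bulk.

        def v1_to_v2(rec):          # illustrative change: rename name -> full_name
            rec = dict(rec)
            rec["full_name"] = rec.pop("name")
            return rec

        def v2_to_v3(rec):          # illustrative change: add field with default
            return {**rec, "email": rec.get("email", "unknown")}

        UPGRADES = {1: v1_to_v2, 2: v2_to_v3}
        CURRENT_VERSION = 3

        def read_record(stored):
            """Upgrade a stored record, step by step, to the current schema."""
            version, record = stored["version"], stored["data"]
            while version < CURRENT_VERSION:
                record = UPGRADES[version](record)
                version += 1
            return record

        # A record written under schema v1 is converted only when read.
        old = {"version": 1, "data": {"name": "Ada Lovelace"}}
        print(read_record(old))  # {'full_name': 'Ada Lovelace', 'email': 'unknown'}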

    A Comparative Study: Change Detection and Querying Dynamic XML Documents

    Efficient management of dynamic XML documents is a complex research area. An XML document may change without limit, in both content and size, throughout its lifetime. Change detection, identifying the differences between successive versions of a document, is therefore an important part of version management. Because document content evolves continuously, users want to be able to query previous versions and query changes in documents, as well as to retrieve a particular document version efficiently. In this paper we provide a comprehensive comparative analysis of various control schemes for change detection and querying of dynamic XML documents.
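
    As a rough illustration of what change detection between two document versions involves, the Python sketch below flattens each version into element paths and compares them. The schemes surveyed in the paper are far more sophisticated (tree-edit distances, node matching); this naive path-based diff also assumes sibling tags are unique:

        import xml.etree.ElementTree as ET

        def flatten(elem, path=""):
            """Map each element path to its text content (assumes unique siblings)."""
            path = f"{path}/{elem.tag}"
            nodes = {path: (elem.text or "").strip()}
            for child in elem:
                nodes.update(flatten(child, path))
            return nodes

        def diff(old_xml, new_xml):
            old = flatten(ET.fromstring(old_xml))
            new = flatten(ET.fromstring(new_xml))
            return {
                "added":   sorted(new.keys() - old.keys()),
                "removed": sorted(old.keys() - new.keys()),
                "changed": sorted(p for p in old.keys() & new.keys()
                                  if old[p] != new[p]),
            }

        v1 = "<doc><title>Draft</title></doc>"
        v2 = "<doc><title>Final</title><author>Lee</author></doc>"
        print(diff(v1, v2))
        # {'added': ['/doc/author'], 'removed': [], 'changed': ['/doc/title']}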

    Schema Vacuuming in Temporal Databases

    Temporal databases facilitate the support of historical information by providing functions for indicating the intervals during which a tuple was applicable (along one or more temporal dimensions). Because data are never deleted, only superseded, temporal databases are inherently append-only, resulting, over time, in a large historical sequence of database states. Data vacuuming in temporal databases allows this sequence to be shortened by strategically, and irrevocably, deleting obsolete data. Schema versioning allows users to maintain a history of database schemata without compromising the semantics of the data or the ability to view data through historical schemata. While the techniques required for data vacuuming in temporal databases have been relatively well covered, the associated area of vacuuming schemata has received less attention. This paper discusses this issue and proposes a mechanism that fits well with existing methods for data vacuuming and schema versioning.
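
    A minimal Python sketch of the data-vacuuming side, assuming tuples carry a valid-time interval and a cutoff-based deletion policy; both the row layout and the policy are illustrative, not the paper's mechanism:

        from datetime import date

        OPEN = date.max  # sentinel meaning "still current"

        # Versions of one tuple: never updated in place, only superseded.
        rows = [
            {"id": 1, "salary": 40_000, "start": date(2015, 1, 1), "end": date(2018, 6, 30)},
            {"id": 1, "salary": 45_000, "start": date(2018, 7, 1), "end": OPEN},
        ]

        def vacuum(table, cutoff):
            """Irrevocably delete versions whose interval ended before the cutoff."""
            return [r for r in table if r["end"] == OPEN or r["end"] >= cutoff]

        rows = vacuum(rows, cutoff=date(2020, 1, 1))
        print(rows)  # only the current (open-ended) version survives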

    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, built on this version control system, that gives users the ability to perform collaborative data analysis. We outline the challenges in providing dataset version control at scale.
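
    To make the git analogy concrete, here is a minimal Python sketch in which each dataset version snapshots a set of record identifiers, so branch, diff and merge reduce to set operations. This is only an illustration of the idea, not DataHub's actual storage or merge design:

        class VersionGraph:
            def __init__(self):
                self.commits = {}   # commit id -> frozenset of record ids
                self.parents = {}   # commit id -> tuple of parent commit ids

            def commit(self, cid, records, parents=()):
                self.commits[cid] = frozenset(records)
                self.parents[cid] = parents
                return cid

            def diff(self, a, b):
                """Records added and removed going from commit a to commit b."""
                return (self.commits[b] - self.commits[a],
                        self.commits[a] - self.commits[b])

            def merge(self, cid, a, b):
                """Naive union merge of two branches (no conflict handling)."""
                return self.commit(cid, self.commits[a] | self.commits[b],
                                   parents=(a, b))

        g = VersionGraph()
        base = g.commit("v1", {"r1", "r2"})
        left = g.commit("v2", {"r1", "r2", "r3"}, parents=(base,))
        right = g.commit("v3", {"r2", "r4"}, parents=(base,))
        g.merge("v4", left, right)
        print(g.diff("v1", "v4"))  # (frozenset({'r3', 'r4'}), frozenset())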

    A theory and model for the evolution of software services

    Software services are subject to constant change and variation. To control service development, a service developer needs to know why a change was made, what its implications are, and whether the change is complete. Typically, service clients do not perceive an upgraded service immediately. As a consequence, service-based applications may fail on the client side because of changes carried out during a provider's service upgrade. To manage changes in a meaningful and effective manner, service clients must therefore be considered when changes are introduced on the service provider's side; otherwise such changes will almost certainly result in severe application disruption. Eliminating the spurious results and inconsistencies that can occur due to uncontrolled changes is therefore a necessary condition for services to evolve gracefully, ensure service stability, and handle variability in their behavior. Towards this goal, this work presents a model and a theoretical framework for the compatible evolution of services, based on well-founded theories and techniques from a number of disparate fields.
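
    One concrete compatibility rule that such a framework might formalize can be sketched as follows: a new service version remains safe for existing clients if it introduces no new required inputs and removes no outputs. The descriptors below are hypothetical, not the paper's formal model:

        def is_backward_compatible(old, new):
            """A version is compatible if old clients' calls still succeed."""
            added_required = new["required_inputs"] - old["required_inputs"]
            removed_outputs = old["outputs"] - new["outputs"]
            return not added_required and not removed_outputs

        v1 = {"required_inputs": {"order_id"}, "outputs": {"status", "eta"}}
        v2 = {"required_inputs": {"order_id"}, "outputs": {"status", "eta", "carrier"}}
        v3 = {"required_inputs": {"order_id", "region"}, "outputs": {"status"}}

        print(is_backward_compatible(v1, v2))  # True: only additive changes
        print(is_backward_compatible(v1, v3))  # False: new required input, lost output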

    Efficiently synchronizing multidimensional schema data

    Most existing concepts in data warehousing provide a central database system that stores gathered raw data and redundantly computed materialized views. In current system architectures, client tools send queries to a central data warehouse system and are used only to present the results graphically. The steady rise in the power of personal computers and the expansion of network bandwidth, however, make it possible to store replicated parts of the data warehouse at the client, saving network bandwidth and utilizing local computing power. In such a scenario, a (potentially mobile) client does not need to be connected to a central server while performing local analyses. Although this scenario seems attractive, introducing such an architecture raises several problems: for example, schema data could be changed, or new fact data could become available. This paper focuses on the first problem and presents ideas on how changed schema data can be detected and efficiently synchronized between client and server, exploiting the special needs and requirements of data warehousing.
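
    A minimal Python sketch of one way such detection could work: each dimension's schema is fingerprinted so that only dimensions whose digests differ need to be re-shipped to the client. The names and the digest-based comparison are assumptions for illustration, not the paper's algorithm:

        import hashlib
        import json

        def fingerprint(schema):
            """Stable per-dimension digest of a multidimensional schema."""
            return {
                dim: hashlib.sha256(
                    json.dumps(levels, sort_keys=True).encode()
                ).hexdigest()
                for dim, levels in schema.items()
            }

        def stale_dimensions(server_schema, client_prints):
            """Dimensions whose schema changed since the client last synced."""
            server_prints = fingerprint(server_schema)
            return [d for d, h in server_prints.items()
                    if client_prints.get(d) != h]

        server = {"time": ["year", "quarter", "month"], "product": ["group", "item"]}
        client = fingerprint({"time": ["year", "quarter", "month"], "product": ["group"]})
        print(stale_dimensions(server, client))  # ['product'] needs synchronizing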

    Ontologies on the semantic web

    As an informational technology, the World Wide Web has enjoyed spectacular success. In just ten years it has transformed the way information is produced, stored, and shared in arenas as diverse as shopping, family photo albums, and high-level academic research. The “Semantic Web” was touted by its developers as equally revolutionary, but it has not yet achieved anything like the Web’s exponential uptake. This 17,000-word survey article explores why this might be so, from a perspective that bridges both philosophy and IT.