
    IMPrECISE: Good-is-good-enough data integration

    IMPrECISE is an XQuery module that adds probabilistic XML functionality to an existing XML DBMS, in our case MonetDB/XQuery. We demonstrate the probabilistic XML and data integration functionality of IMPrECISE. The prototype is configurable with domain knowledge such that the amount of uncertainty arising during data integration is reduced to an acceptable level, thus obtaining "good is good enough" data integration with minimal human effort.
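
    To make the notion of probabilistic XML concrete, the following minimal Python sketch shows the underlying idea of keeping conflicting values as weighted alternatives instead of forcing a resolution; the types and names (ProbElement, Alternative) are hypothetical illustrations, not the IMPrECISE API.

        # Minimal sketch of the probabilistic-XML idea (hypothetical names,
        # not the IMPrECISE API): an element may carry mutually exclusive
        # alternatives, each with a probability, and integration keeps all
        # of them instead of asking a human to pick one.
        from dataclasses import dataclass

        @dataclass
        class Alternative:
            value: str      # one possible value for the element
            prob: float     # probability assigned to this alternative

        @dataclass
        class ProbElement:
            name: str
            alternatives: list  # list[Alternative]; probabilities sum to <= 1.0

            def most_probable(self) -> str:
                """Return the value of the highest-probability alternative."""
                return max(self.alternatives, key=lambda a: a.prob).value

        # Two sources disagree about a phone number; both readings are retained.
        phone = ProbElement("phone", [Alternative("555-1234", 0.7),
                                      Alternative("555-4321", 0.3)])
        print(phone.most_probable())  # -> 555-1234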

    Duplicate Detection in Probabilistic Data

    Collected data often contain uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML); there is no prior work on the integration of uncertain (especially probabilistic) source data. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process, presenting techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.
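
    The hedged Python sketch below illustrates the two ideas in the abstract under assumed representations: records whose attributes are probability distributions over values, a similarity given by the probability that two records agree, and a simple blocking key standing in for the paper's search space reduction methods (the actual measures and methods in the paper differ).

        # Hedged sketch of duplicate detection over probabilistic records
        # (illustrative only). Each attribute is a distribution over values;
        # record similarity is the probability both records take the same
        # value, and a blocking key shrinks the quadratic comparison space.
        from collections import defaultdict

        def agreement_prob(dist_a: dict, dist_b: dict) -> float:
            """P(both records take the same value) for one attribute."""
            return sum(p * dist_b.get(v, 0.0) for v, p in dist_a.items())

        def blocking_key(record: dict) -> str:
            """Hypothetical reduction: block on the first letter of the
            most probable name, so only same-block records are compared."""
            name_dist = record["name"]
            best = max(name_dist, key=name_dist.get)
            return best[0].lower()

        records = [
            {"id": 1, "name": {"Smith": 0.9, "Smyth": 0.1}},
            {"id": 2, "name": {"Smith": 0.8, "Schmidt": 0.2}},
            {"id": 3, "name": {"Jones": 1.0}},
        ]

        blocks = defaultdict(list)
        for r in records:
            blocks[blocking_key(r)].append(r)

        for block in blocks.values():
            for i in range(len(block)):
                for j in range(i + 1, len(block)):
                    a, b = block[i], block[j]
                    sim = agreement_prob(a["name"], b["name"])
                    if sim > 0.5:
                        print(f"possible duplicates: {a['id']} and {b['id']} (p={sim:.2f})")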

    XML in Motion from Genome to Drug

    Information technology (IT) has emerged as central to solving contemporary genomics and drug discovery problems. Researchers in genomics, proteomics, transcriptional profiling, high-throughput structure determination, and other sub-disciplines of bioinformatics have a direct impact on this IT revolution. As the full genome sequences of many species and data from structural genomics, microarrays, and proteomics become available, integrating these data on a common platform requires sophisticated bioinformatics tools. Organizing these data into knowledge databases and developing appropriate software tools for analyzing them will be major challenges. XML (eXtensible Markup Language) forms the backbone of biological data representation and exchange over the internet, enabling researchers to aggregate data from various heterogeneous data resources. This article gives a comprehensive account of integrating XML with particular types of biological databases, mainly those dealing with sequence-structure-function relationships, and of its application to drug discovery. This e-medical-science approach should be applicable to other scientific domains; the latest trends in Semantic Web applications are also highlighted.
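
    As a concrete (hypothetical) illustration of XML-based aggregation across heterogeneous resources, the sketch below parses sequence records from two invented schemas into one common structure; the schemas, tags and fields are assumptions for illustration only, not any particular biological database format.

        # Illustrative sketch (not from the article): aggregating sequence
        # records from two hypothetical XML schemas into one common
        # structure, the kind of task XML makes tractable across
        # heterogeneous biological data resources.
        import xml.etree.ElementTree as ET

        SOURCE_A = """<entry><accession>P12345</accession>
                      <sequence>MKTAYIAKQR</sequence></entry>"""
        SOURCE_B = """<protein id="Q67890"><seq>GAVLIPFYW</seq></protein>"""

        def from_source_a(xml_text: str) -> dict:
            root = ET.fromstring(xml_text)
            return {"id": root.findtext("accession"),
                    "sequence": root.findtext("sequence")}

        def from_source_b(xml_text: str) -> dict:
            root = ET.fromstring(xml_text)
            return {"id": root.get("id"), "sequence": root.findtext("seq")}

        # Both records now share one schema and can be analyzed together.
        unified = [from_source_a(SOURCE_A), from_source_b(SOURCE_B)]
        for rec in unified:
            print(rec["id"], len(rec["sequence"]), "residues")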

    Towards a novel framework for the assessment of enterprise application integration packages

    In addressing enterprise integration problems, a diversity of technologies such as CORBA and XML have been promoted, yet no single integration technology solves all integration problems. As a result, a new generation of software called Enterprise Application Integration (EAI) is emerging that addresses many integration problems by combining a diversity of integration technologies (e.g. message brokers, adapters, XML). Since EAI is a new research area, there is an absence of literature discussing issues such as its adoption, evaluation and implementation. This paper examines the application of two frameworks for the evaluation of EAI packages in the practical arena. In doing so, the authors use a case study strategy to investigate integration issues. Empirical data derived from the case study suggest additions to the two evaluation frameworks. Therefore, the authors revise and extend previous work by proposing a novel evaluation framework for the assessment of EAI packages. The proposed framework makes a novel contribution at two levels. First, at the conceptual level, it incorporates criteria identified separately in previous studies as evaluation criteria. The proposed framework can be used as a decision-making tool that supports management when taking decisions regarding the adoption of EAI. Additionally, it can be used by researchers to analyse and understand the capabilities of EAI packages.
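
    One way such an evaluation framework can serve as a decision-making tool is weighted-criteria scoring; the sketch below is a generic illustration with hypothetical criteria, weights and scores, not the framework proposed in the paper.

        # Hedged sketch of using an evaluation framework as a decision aid:
        # score candidate EAI packages against weighted criteria. All
        # criteria, weights and scores here are hypothetical.
        CRITERIA = {            # weight of each criterion (weights sum to 1)
            "adapters": 0.30,   # breadth of ready-made application adapters
            "message_broker": 0.25,
            "xml_support": 0.20,
            "cost": 0.25,
        }

        def score(package: dict) -> float:
            """Weighted sum of per-criterion scores (each on a 0-10 scale)."""
            return sum(package[c] * w for c, w in CRITERIA.items())

        candidates = {
            "PackageA": {"adapters": 8, "message_broker": 7, "xml_support": 9, "cost": 4},
            "PackageB": {"adapters": 6, "message_broker": 9, "xml_support": 7, "cost": 7},
        }

        # Rank packages by weighted score, best first.
        for name, pkg in sorted(candidates.items(), key=lambda kv: -score(kv[1])):
            print(f"{name}: {score(pkg):.2f}")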

    XML for Domain Viewpoints

    Within research institutions like CERN (the European Organization for Nuclear Research) there are often disparate databases (different in format, type and structure) that users need to access in a domain-specific manner. Users may want to access a simple unit of information without having to understand the details of the underlying schema, or they may want to access the same information from several different sources. It is neither desirable nor feasible to require users to have knowledge of these schemas. Instead, it would be advantageous if a user could query these sources using his or her own domain models and abstractions of the data. This paper describes the basis of an XML (eXtensible Markup Language) framework that provides this functionality and is currently being developed at CERN. The goal of the first prototype was to explore the possibilities of XML for data integration and model management. It shows how XML can be used to integrate data sources. The framework is not only applicable to CERN data sources but to other environments too.
    Comment: 9 pages, 6 figures, conference report from SCI'2001 Multiconference on Systemics & Informatics, Florida
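
    A hypothetical sketch of the domain-viewpoint idea: an XML mapping document translates a user's domain-level concept into the schema-specific paths of each source, so users query in their own terms. The mapping format, database names and paths below are invented for illustration and are not the CERN framework's actual design.

        # Illustrative sketch: an XML mapping resolves a domain concept to
        # the schema-specific locations of each underlying source, hiding
        # the source schemas from the user.
        import xml.etree.ElementTree as ET

        MAPPING = """
        <viewpoint domain="detector">
          <concept name="temperature">
            <source db="slow_controls" path="/dcs/sensor/temp"/>
            <source db="legacy_oracle" path="/RUN/ENV/T_READING"/>
          </concept>
        </viewpoint>
        """

        def resolve(concept: str) -> list:
            """Return (database, path) pairs implementing a domain concept."""
            root = ET.fromstring(MAPPING)
            hits = root.findall(f".//concept[@name='{concept}']/source")
            return [(s.get("db"), s.get("path")) for s in hits]

        # A user asks for 'temperature' in domain terms; the framework fans
        # the request out to every source that can answer it.
        for db, path in resolve("temperature"):
            print(f"query {db} at {path}")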

    Conceptual Workflow for Complex Data Integration using AXML

    Relevant data for decision support systems are available everywhere and in various formats, and such data must be integrated into a unified format. Traditional data integration approaches are not adapted to handling complex data, so we exploit the Active XML language for integrating complex data. Its XML part allows complex data to be unified, modeled and stored, while its services part addresses the distributed nature of the data sources. Accordingly, different integration tasks are exposed as services, which are managed via a set of active rules built upon the metadata and events of the integration system. In this paper, we design an architecture for integrating complex data autonomously, and we also design the workflow for the data integration tasks.
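
    A minimal Python sketch of the Active XML idea, under assumed names: the document embeds a call to a web service that is materialized on demand, and an active rule fires the integration task when an event occurs. The service, namespace and rule below are hypothetical, not the paper's implementation.

        # Minimal sketch of Active XML (hypothetical service and rule names):
        # an XML document embeds calls to web services, and active rules
        # fire integration tasks in response to events.
        import xml.etree.ElementTree as ET

        AXML_DOC = """
        <warehouse>
          <sales>
            <axml:call xmlns:axml="http://example.org/axml"
                       service="fetch_sales" arg="2024-Q1"/>
          </sales>
        </warehouse>
        """

        SERVICES = {  # integration tasks exposed as services
            "fetch_sales": lambda arg: f"<row period='{arg}' total='42'/>",
        }

        def materialize(doc: str) -> str:
            """Replace each embedded service call with the data it returns."""
            root = ET.fromstring(doc)
            call_tag = "{http://example.org/axml}call"
            for parent in list(root.iter()):
                for call in list(parent):
                    if call.tag == call_tag:
                        result = SERVICES[call.get("service")](call.get("arg"))
                        parent.remove(call)
                        parent.append(ET.fromstring(result))
            return ET.tostring(root, encoding="unicode")

        # An active rule: on the 'source_updated' event, re-materialize the
        # document so the embedded call is replaced by fresh data.
        RULES = {"source_updated": lambda: print(materialize(AXML_DOC))}
        RULES["source_updated"]()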