298 research outputs found

    K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources

    The integration of heterogeneous data sources and software systems is a major issue in the biomedical community, and several approaches have been explored: linking databases, on-the-fly integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS, which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear winner. Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.

    A abordagem POESIA para a integração de dados e serviços na Web semântica (The POESIA approach to integrating data and services on the Semantic Web)

    Advisor: Claudia Bauzer Medeiros. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Abstract: POESIA (Processes for Open-Ended Systems for Information Analysis), the approach proposed in this work, supports the construction of complex processes that involve the integration and analysis of data from several sources, particularly in scientific applications. The approach is centered on two types of Semantic Web mechanisms: scientific workflows, to specify and compose Web services; and domain ontologies, to enable semantic interoperability and management of data and processes. The main contributions of this thesis are: (i) a theoretical framework to describe, discover and compose data and services on the Web, including rules to check the semantic consistency of resource compositions; (ii) ontology-based methods to help data integration and estimate data provenance in cooperative processes on the Web; (iii) partial implementation and validation of the proposal in a real application in the domain of agricultural planning, analyzing the benefits and the efficiency and scalability limitations of current Semantic Web technology when faced with large volumes of data. Degree: Doctorate in Computer Science.

    Effectively Maintaining Single View Consistency in Web Warehouses

    A web warehouse provides high availability and efficiency by using materialized webviews, which must be refreshed in time to stay fresh. During refreshing, the consistency between a webview and its base data, formally named single view consistency (SVC), must be guaranteed. Because base data changes in a web warehousing environment do not propagate from the data sources to the information consumers, unlike in traditional data warehouses, new maintenance methods are needed. In this paper we first define SVC, and then present an algorithm, RCA, to maintain SVC, together with an effective method, SAA, for detecting base data changes. We show that RCA and SAA can guarantee SVC and that they are effective in the web environment. © 2005 IEEE.

    Enterprise Information Integration Using a Peer to Peer Approach

    The integration of enterprise information systems has unique requirements and frequently poses problems to business partners. We discuss specific integration issues for micro-sized enterprises in the special case of independent sales agencies and their suppliers. We argue that the enterprise information systems of those independent enterprises are technically best represented by equal peers. Therefore, we have designed the Peer-To-Peer (P2P) integration architecture VIANA for the integration of enterprise information systems. Its architecture provides materializing P2P integration using optimistic replication. It is applicable to inter- and intraorganizational integration scenarios. It is accomplished by the propagation of write operations between peers. We argue that this type of integration can be realized with no alteration of the participating information systems.

    Estocada: Stockage Hybride et Ré-écriture sous Contraintes d'Intégrité (Estocada: Hybrid Storage and Rewriting under Integrity Constraints)

    The growing production of digital data has led to the emergence of a wide variety of data management systems (DMSs). In this context, data-intensive applications need (i) to access large, heterogeneous data ("Big Data") with a potentially complex structure, and (ii) to manipulate data efficiently in order to guarantee good application performance. Because these systems are specialized for certain operations but perform less well on others, it can be essential for an application to use several DMSs at the same time. In this context we present Estocada, an application that makes it possible to take advantage of several DMSs simultaneously and that enables efficient, automatic manipulation of large, heterogeneous data, thus offering better support for data-intensive applications. In Estocada, data is split into several fragments that are stored in different DMSs. To answer a query from these fragments, Estocada relies on query rewriting under constraints; the constraints are used to represent the different data models and the distribution of fragments across the DMSs.

    Peer-to-peer systems for simple and flexible information sharing

    Includes abstract. Includes bibliographical references (leaves 76-80). Peer-to-peer (P2P) computing is an architecture that enables applications to access shared resources, with peers having similar capabilities and responsibilities. The ubiquity of P2P computing and its increasing adoption as a decentralized data sharing mechanism have fueled my research interests. P2P networks are useful for sharing content files containing audio, video, and data. This research aims to address the problem of simple and flexible access to data from a variety of data sources across peers with different operating systems, databases and hardware. The proposed architecture makes use of SQL queries, web services, heterogeneous database servers and XML data transformation for the peer-to-peer data sharing prototype. SQL queries and web services provide a data sharing mechanism that allows both simple and flexible data access.

    Incremental maintenance of materialized xquery views

    Keeping views fresh by maintaining the consistency between materialized views and their base data in the presence of base updates is a critical problem for many applications, including data warehousing and data integration. While heavily studied for traditional databases, the maintenance of XML views remains largely unexplored. Maintaining XML views is complex due to the richness of the XML data model and the powerful capabilities of XML query languages, such as XQuery. This dissertation proposes a comprehensive solution for the general problem of maintaining materialized XQuery views. Our solution is the first to enable the maintenance of a large class of XQuery views including XPath expressions, FLWOR expressions, and Element Constructors. These views may contain arbitrary result construction and arbitrary grouping and join operations. Our solution also supports the unique order requirements of XQuery including source document order and query order. Th
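
    To make the idea of incremental view maintenance concrete, here is a minimal sketch in Python. It is not the dissertation's XQuery algorithm: it assumes a much simpler, relational-style view (a count of base items per group key), and the names `recompute_view` and `apply_delta` are hypothetical. It only illustrates the general principle that a materialized view can be brought up to date from the base updates alone, without recomputing it from the full base data.

```python
from collections import Counter


def recompute_view(base):
    """Full recomputation: count base items per group key."""
    return Counter(key for key, _ in base)


def apply_delta(view, inserted=(), deleted=()):
    """Incrementally maintain the grouped-count view from base updates.

    Assumes every deleted item was previously present in the base data.
    """
    for key, _ in inserted:
        view[key] += 1
    for key, _ in deleted:
        view[key] -= 1
        if view[key] == 0:
            del view[key]  # drop groups that became empty
    return view


# Hypothetical base data: (group key, item id) pairs.
base = [("db", "a"), ("db", "b"), ("xml", "c")]
view = recompute_view(base)

# Base updates arrive as a delta, not as a new copy of the base data.
inserted = [("xml", "d")]
deleted = [("db", "a")]
apply_delta(view, inserted, deleted)

# The maintained view should agree with a full recomputation.
new_base = [row for row in base if row not in deleted] + inserted
```

    Comparing the maintained view against `recompute_view(new_base)` is a simple consistency check. Real XML view maintenance must additionally preserve document and query order, which this sketch ignores.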

    The multi-agent system architecture in SEWASIE

    We describe the design, implementation and deployment of the multi-level agent-based system architecture developed for the SEWASIE project. The aim of the system is to help the user in querying heterogeneous data sources which are integrated by means of ontologies. The agent architecture is based on a two-level data integration scheme supported by mediators and brokers, connected by a peer-to-peer mechanism. Implementation is done on top of the JADE system, a modular and scalable platform that satisfies FIPA standards. Facultad de Informátic

    A Service Late Binding Enabled Solution for Data Integration from Autonomous and Evolving Databases

    Integrating data from autonomous, distributed and heterogeneous data sources to provide a unified view is a common demand for many businesses. Since the data sources may evolve frequently to satisfy their own independent business needs, solutions which use hard-coded queries to integrate participating databases may incur high maintenance costs when evolution occurs. Thus a new solution which can handle database evolution with lower maintenance effort is required. This thesis presents such a solution: Service Late binding Enabled Data Integration (SLEDI), which is set into a framework modeling the essential processes of the data integration activity. It integrates schematically heterogeneous relational databases with decreased maintenance costs for handling database evolution. An algorithm named Information Provision Unit Describing (IPUD) is designed to describe each database as a set of Information Provision Units (IPUs). The IPUs are represented as Directed Acyclic Graph (DAG) structured data instead of hard-coded queries, and further realized as data services. Hence the data integration is achieved through service invocations. Furthermore, a set of processes is defined to handle database evolution by automatically identifying and modifying the IPUs which are affected by the evolution. An extensive evaluation based on a case study is presented. The result shows that the schematic heterogeneities defined in this thesis can be solved by IPUD, except the relation isomorphism discrepancy. Ten out of thirteen types of schematic database evolution can be automatically handled by the evolution handling processes as long as the evolution is represented by the designed data model. The computational costs of the automatic evolution handling show a slow linear growth with the number of participating databases. Other characteristics addressed include SLEDI's scalability and its independence of application domain and database model. The descriptive comparison with other data integration approaches shows that although the Data as a Service approach may result in lower performance under some circumstances, it offers better flexibility for integrating data from autonomous and evolving data sources.
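
    The step of identifying which units are affected by a schema change can be sketched as a traversal of a dependency DAG. The Python below is an illustration under stated assumptions, not SLEDI's actual data model or the IPUD algorithm: the dictionary layout, the function `affected_units`, and the table, column and IPU names are all hypothetical.

```python
from collections import deque


def affected_units(depends_on, changed):
    """Return every node that depends, directly or transitively, on `changed`.

    `depends_on` maps each unit to the schema elements or units it reads.
    """
    # Invert the edges: element -> units that directly depend on it.
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    # Breadth-first walk from the changed element along inverted edges.
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in affected:
                affected.add(node)
                queue.append(node)
    return affected


# Hypothetical dependency DAG: units built on columns and on other units.
depends_on = {
    "ipu_customer": {"customers.id", "customers.name"},
    "ipu_order": {"orders.id", "orders.customer_id"},
    "ipu_customer_orders": {"ipu_customer", "ipu_order"},
}

# Suppose the column customers.name is renamed or dropped.
result = affected_units(depends_on, "customers.name")
```

    In this example only `ipu_customer` and the composite `ipu_customer_orders` need attention, while `ipu_order` is untouched. Keeping dependencies as data rather than as hard-coded queries is what makes this kind of automatic lookup possible.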