109 research outputs found

    Semantic interoperability through context interchange: representing and reasoning about data conflicts in heterogeneous and autonomous systems

    Cover title. Includes bibliographical references (p. 24-25). Supported in part by ARPA, the International Financial Services Research Center (IFSRC), PROductivity From Information Technology (PROFIT), the National University of Singapore, and USAF/Rome Laboratory (contract F30602-93-C-0160). Authors: Cheng Hian Goh, Stuart E. Madnick, Michael D. Siegel.

    Myriad: Design and implementation of a federated database prototype

    This paper describes our experiences in the design and implementation of Myriad and in managing the project. Special emphasis is given to discussing design alternatives and their impact on Myriad. The paper also presents the software engineering principles and project management techniques we used in developing Myriad and the lessons we learned. We believe these lessons will be useful for practitioners who wish to develop a similar system.

    A database management capability for Ada

    The data requirements of mission-critical defense systems have been increasing dramatically. Command and control, intelligence, logistics, and even weapons systems are required to integrate, process, and share ever-increasing volumes of information. To meet this need, systems are now being specified that incorporate database management subsystems for handling the storage and retrieval of information. A large number of the next generation of mission-critical systems are expected to contain embedded database management systems. Since the use of Ada has been mandated for most of these systems, it is important to address the issue of providing database management capabilities that can be closely coupled with Ada. A comprehensive distributed database management project has investigated these issues; its key deliverables are three closely related prototype systems implemented in Ada, which are discussed here.

    Schema Management for Data Integration: A Short Survey

    Schema management is a fundamental problem in many database application domains, such as data integration systems, where users need to access and manipulate data from several databases. In this context, integrating data from distributed, heterogeneous database sources requires data integration systems to resolve several issues that arise in managing schemas. In this paper, we present a brief survey of schema matching, which is used to solve problems of schema integration processing. We also propose a technique for integrating and querying distributed heterogeneous XML schemas.
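    The survey leaves the choice of matching algorithm open. As a rough illustration of the name-based matching that schema integration work typically starts from, the sketch below compares attribute names by string similarity; the element names and the 0.6 threshold are illustrative assumptions, not taken from the paper.

    ```python
    # Minimal name-based schema matching sketch (illustrative only).
    # Element names and the 0.6 threshold are assumptions, not from the paper.
    from difflib import SequenceMatcher

    def name_similarity(a: str, b: str) -> float:
        """Similarity of two attribute names, normalized to [0, 1]."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def match_schemas(source, target, threshold=0.6):
        """Return (source_attr, target_attr, score) pairs above the threshold."""
        matches = []
        for s in source:
            best = max(target, key=lambda t: name_similarity(s, t))
            score = name_similarity(s, best)
            if score >= threshold:
                matches.append((s, best, round(score, 2)))
        return matches

    # Two hypothetical XML element sets describing customers.
    src = ["custName", "custAddr", "phoneNo"]
    tgt = ["customer_name", "customer_address", "telephone"]
    print(match_schemas(src, tgt))
    ```

    Real matchers combine such name similarity with structural and type evidence; this sketch shows only the lexical starting point.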

    From Databases to Information Systems

    Research and business are currently moving from centralized databases toward information systems that integrate distributed and autonomous data sources. At the same time, it is well acknowledged that reasoning about information quality (IQ) is an important issue for large-scale integrated information systems. We show that IQ-reasoning can be the driving force of the current shift from databases to integrated information systems. In this paper, we explore the implications and consequences of this shift. All areas of answering user queries are affected, from user input, through query planning and query optimization, to building the query result. The application of IQ-reasoning brings both challenges, such as new cost models for optimization, and opportunities, such as improved query planning. We highlight several emerging aspects and suggest solutions toward making information quality pervasive in information systems.
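    The abstract does not fix a concrete cost model. As a minimal sketch of how IQ-reasoning could enter query planning, the following ranks candidate sources by a weighted score over quality criteria; the criteria, weights, and scores are illustrative assumptions, not the authors' model.

    ```python
    # Illustrative IQ-aware source ranking (weights and scores are assumptions,
    # not the paper's model): each source carries quality criteria in [0, 1],
    # and the planner prefers sources with the best weighted score.
    WEIGHTS = {"completeness": 0.5, "timeliness": 0.3, "accuracy": 0.2}

    sources = {
        "source_a": {"completeness": 0.9, "timeliness": 0.4, "accuracy": 0.8},
        "source_b": {"completeness": 0.6, "timeliness": 0.9, "accuracy": 0.7},
    }

    def iq_score(criteria: dict) -> float:
        """Weighted sum of quality criteria; higher is better."""
        return sum(WEIGHTS[k] * v for k, v in criteria.items())

    ranked = sorted(sources, key=lambda s: iq_score(sources[s]), reverse=True)
    print(ranked)  # order in which the planner would try the sources
    ```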

    Segmenting and labeling query sequences in a multidatabase environment

    When gathering information from multiple independent data sources, users generally pose a sequence of queries to each source and combine (union) or cross-reference (join) the results to obtain the information they need. Furthermore, information gathering involves a fair bit of trial and error, with queries repeatedly refined according to the results of previous queries in the sequence. From the point of view of an outside observer, the aim of such a sequence of queries may not be immediately obvious. We investigate the problem of isolating and characterizing subsequences representing coherent information retrieval goals out of a sequence of queries sent by a user to different data sources over a period of time. The problem has two sub-problems: segmenting the sequence into subsequences, each representing a discrete goal, and labeling each query in these subsequences according to how it contributes to the goal. We propose a method in which a discriminative probabilistic model (a Conditional Random Field) is trained with pre-labeled sequences. We have tested the accuracy with which such a model can infer labels and segmentation on novel sequences. Results show that the approach is very accurate (> 95% accuracy) when there are no spurious queries in the sequence and moderately accurate even in the presence of substantial noise (∼70% accuracy when 15% of queries in the sequence are spurious). © 2011 Springer-Verlag
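    The abstract does not spell out the feature set. A minimal sketch of such a training setup with the sklearn-crfsuite library, where the features and label scheme are illustrative assumptions rather than the paper's, might look like this:

    ```python
    # Minimal CRF training sketch using sklearn-crfsuite
    # (pip install sklearn-crfsuite). The feature set and label scheme
    # are illustrative assumptions, not the paper's.
    import sklearn_crfsuite

    def query_features(seq, i):
        """Simple features for the i-th query in a sequence."""
        q = seq[i]
        return {
            "source": q["source"],  # which data source was queried
            "shares_table": q["table"] == seq[i - 1]["table"] if i > 0 else False,
        }

    # A toy query sequence, each query labeled with its role in the goal.
    sequences = [
        [{"source": "db1", "table": "flights"}, {"source": "db2", "table": "hotels"}],
    ]
    labels = [["GOAL_START", "GOAL_CONT"]]  # segmentation encoded in the labels

    X = [[query_features(seq, i) for i in range(len(seq))] for seq in sequences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))  # predicted labels for the (training) sequence
    ```

    Encoding segment boundaries in the labels themselves (GOAL_START vs. GOAL_CONT here) lets a single CRF address both sub-problems, segmentation and labeling, jointly.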

    Query Optimization for Database Federation Systems

    Database federation is one approach to data integration, in which a middleware layer, called a mediator, provides uniform access to a number of heterogeneous data sources. In this thesis, we focus on query optimization for distributed joins over a database federation. One important observation in query optimization over distributed database systems is that run-time conditions (namely, available buffer size, CPU utilization on each machine, and the network environment) can significantly affect the execution cost of a query plan. However, very few existing database federation systems have addressed run-time conditions. The problem is challenging because the mediator usually cannot know the run-time conditions of remote sites, and considering them adds extra complexity to the optimizer. This thesis proposes the Cluster-and-Conquer algorithm for query optimization over database federations that efficiently takes run-time conditions into account. The algorithm has three-fold benefits. First, the run-time conditions of machines become available to their cluster mediator. Second, each cluster mediator can process its own sub-query concurrently, which decreases the complexity of processing the query plan. Third, the algorithm outperforms related approaches in terms of the "cost of costing", because it removes unnecessary inter-cluster operations at an early stage. I have implemented a prototype data federation system with the Cluster-and-Conquer algorithm. The experimental results show the capabilities and efficiency of the algorithm and identify the target scenarios where it performs better than related approaches.
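    The abstract names the run-time factors but not how they enter the cost model. As a toy illustration, the sketch below folds buffer size, CPU load, and bandwidth into a per-site join-cost estimate; the formula and coefficients are assumptions for illustration, not the thesis's model.

    ```python
    # Toy cost estimate folding run-time conditions into join-site selection.
    # The formula and coefficients are illustrative, not the thesis's model.
    from dataclasses import dataclass

    @dataclass
    class SiteConditions:
        buffer_mb: float   # available buffer size
        cpu_load: float    # CPU utilization in [0, 1]
        net_mbps: float    # usable bandwidth from the mediator

    def join_cost(rows: int, site: SiteConditions) -> float:
        """Estimated cost of shipping `rows` tuples to `site` and joining there."""
        transfer = rows / site.net_mbps          # network transfer term
        compute = rows * (1.0 + site.cpu_load)   # slower when CPU is busy
        spill = rows / max(site.buffer_mb, 1.0)  # penalty for small buffers
        return transfer + compute + spill

    sites = {
        "cluster_a": SiteConditions(buffer_mb=512, cpu_load=0.2, net_mbps=100),
        "cluster_b": SiteConditions(buffer_mb=128, cpu_load=0.8, net_mbps=400),
    }
    best = min(sites, key=lambda s: join_cost(1_000_000, sites[s]))
    print(best)  # site the optimizer would pick under current conditions
    ```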

    EasyBDI: automatic big data integration and high-level analytical queries

    The emergence of new areas, such as the Internet of Things, which require access to the latest data for data analytics and decision-making, has created constraints on the execution of analytical queries over traditional data warehouse architectures. In addition, the growth of semi-structured and unstructured data led to the creation of new databases to deal with these types of data, namely NoSQL databases. As a result, information is stored in several different systems, each with characteristics suited to different use cases, which makes it difficult to access data that are now spread across various systems with different models and characteristics. In this work, a system capable of performing analytical queries in real time over distributed and heterogeneous data sources is proposed: EasyBDI. The system integrates data logically, without materializing it, creating an overview of the data and thus offering an abstraction over the distribution and heterogeneity of the data sources. Queries are executed interactively against the data sources, which means that the most recent data is always used. The system provides a user interface that helps configure data sources and automatically proposes a global schema giving a generic, simplified view of the data, which the user can modify. Multiple star schemas can then be created from the global schema. Finally, analytical queries are likewise issued through a user interface based on drag-and-drop elements. EasyBDI solves recent problems using recent solutions, hiding the details of the various data sources while allowing users with less database knowledge to perform real-time analytical queries over distributed and heterogeneous data sources. Mestrado em Engenharia Informática (Master's degree in Informatics Engineering)
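    The abstract does not say how EasyBDI derives the global schema it proposes. One naive way such a proposal could start is to merge same-named tables across sources and union their columns, leaving renames and splits to the user; the source and table names below are hypothetical.

    ```python
    # Sketch of a naive global-schema proposal over heterogeneous sources.
    # Source names, table names, and the merge rule are illustrative assumptions.
    from collections import defaultdict

    # Local schemas as exposed by each (hypothetical) data source.
    local_schemas = {
        "postgres_sales": {"orders": ["id", "customer_id", "total"]},
        "mongo_archive":  {"orders": ["id", "customer_id", "items"]},
        "csv_exports":    {"customers": ["customer_id", "name"]},
    }

    def propose_global_schema(schemas: dict) -> dict:
        """Merge same-named tables across sources into one global table each,
        unioning their columns; the user can then rename or split tables."""
        global_schema = defaultdict(set)
        for source, tables in schemas.items():
            for table, columns in tables.items():
                global_schema[table].update(columns)
        return {t: sorted(cols) for t, cols in global_schema.items()}

    print(propose_global_schema(local_schemas))
    # {'orders': ['customer_id', 'id', 'items', 'total'],
    #  'customers': ['customer_id', 'name']}
    ```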