51 research outputs found

    Global Semantic Integrity Constraint Checking for a System of Databases

    Get PDF
    In today’s emerging information systems, it is natural to have data distributed across multiple sites. We define a System of Databases (SyDb) as a collection of autonomous and heterogeneous databases. R-SyDb (System of Relational Databases) is a restricted form of SyDb, referring to a collection of relational databases, which are independent. Similarly, X-SyDb (System of XML Databases) refers to a collection of XML databases. Global integrity constraints ensure integrity and consistency of data spanning multiple databases. In this dissertation, we present (i) Constraint Checker, a general framework of a mobile agent based approach for checking global constraints on R-SyDb, and (ii) XConstraint Checker, a general framework for checking global XML constraints on X-SyDb. Furthermore, we formalize multiple efficient algorithms for varying semantic integrity constraints involving both arithmetic and aggregate predicates. The algorithms take as input an update statement, list of all global semantic integrity constraints with arithmetic predicates or aggregate predicates and outputs sub-constraints to be executed on remote sites. The algorithms are efficient since (i) constraint check is carried out at compile time, i.e. before executing update statement; hence we save time and resources by avoiding rollbacks, and (ii) the implementation exploits parallelism. We have also implemented a prototype of systems and algorithms for both R-SyDb and X-SyDb. We also present performance evaluations of the system

    Towards interoperability in heterogeneous database systems

    Get PDF
    Distributed heterogeneous databases consist of systems which differ physically and logically, containing different data models and data manipulation languages. Although these databases are independently created and administered they must cooperate and interoperate. Users need to access and manipulate data from several databases and applications may require data from a wide variety of independent databases. Therefore, a new system architecture is required to manipulate and manage distinct and multiple databases, in a transparent way, while preserving their autonomy. This report contains an extensive survey on heterogeneous databases, analysing and comparing the different aspects, concepts and approaches related to the topic. It introduces an architecture to support interoperability among heterogeneous database systems. The architecture avoids the use of a centralised structure to assist in the different phases of the interoperability process. It aims to support scalability, and to assure privacy and nfidentiality of the data. The proposed architecture allows the databases to decide when to participate in the system, what type of data to share and with which other databases, thereby preserving their autonomy. The report also describes an approach to information discovery in the proposed architecture, without using any centralised structure as repositories and dictionaries, and broadcasting to all databases. It attempts to reduce the number of databases searched and to preserve the privacy of the shared data. The main idea is to visit a database that either containsthe requested data or knows about another database that possible contains this data

    Solving Local Cost Estimation Problem for Global Query Optimization in Multidatabase Systems

    Full text link
    To meet users' growing needs for accessing pre-existing heterogeneous databases, a multidatabase system (MDBS) integrating multiple databases has attracted many researchers recently. A key feature of an MDBS is local autonomy. For a query retrieving data from multiple databases, global query optimization should be performed to achieve good system performance. There are a number of new challenges for global query optimization in an MDBS. Among them, a major one is that some local optimization information, such as local cost parameters, may not be available at the global level because of local autonomy. It creates difficulties for finding a good decomposition of a global query during query optimization. To tackle this challenge, a new query sampling method is proposed in this paper. The idea is to group component queries into homogeneous classes, draw a sample of queries from each class, and use observed costs of sample queries to derive a cost formula for each class by multiple regression. The derived formulas can be used to estimate the cost of a query during query optimization. The relevant issues, such as query classification rules, sampling procedures, and cost model development and validation, are explored in this paper. To verify the feasibility of the method, experiments were conducted on three commercial database management systems supported in an MDBS. Experimental results demonstrate that the proposed method is quite promising in estimating local cost parameters in an MDBS.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/44824/1/10619_2004_Article_181758.pd

    A comparative study of transaction management services in multidatabase heterogeneous systems

    Get PDF
    Multidatabases are being actively researched as a relatively new area in which many aspects are not yet fully understood. This area of transaction management in multidatabase systems still has many unresolved problems. The problem areas which this dissertation addresses are classification of multidatabase systems, global concurrency control, correctness criterion in a multidatabase environment, global deadlock detection, atomic commitment and crash recovery. A core group of research addressing these problems was identified and studied. The dissertation contributes to the multidatabase transaction management topic by introducing an alternative classification method for such multiple database systems; assessing existing research into transaction management schemes and based on this assessment, proposes a transaction processing model founded on the optimal properties of transaction management identified during the course of this research.ComputingM. Sc. (Computer Science

    Data queries over heterogeneous sources

    Get PDF
    Dissertação para obtenção do Grau de Mestre em Engenharia InformáticaEnterprises typically have their data spread over many software systems, such as custom made applications, CRM systems like SalesForce, CMS systems, or ERP systems like SAP. In these setting, it is often desired to integrate information from many data sources to accomplish some business goal in an application. Data may be stored locally or in the cloud in a wide variety of ways, demanding for explicit transformation processes to be defined, reason why it is hard for developers to integrate it. Moreover, the amount of external data can be large and the difference of efficiency between a smart and a naive way of retrieving and filtering data from different locations can be great. Hence, it is clear that developers would benefit greatly from language abstractions to help them build queries over heterogeneous data sources and from an optimization process that avoids large and unnecessary data transfers during the execution of queries. This project was developed at OutSystems and aims at extending a real product, which makes it even more challenging. We followed a generic approach that can be implemented in any framework, not only focused on the product of OutSystems

    On Resolving Semantic Heterogeneities and Deriving Constraints in Schema Integration

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH
    corecore