51 research outputs found
Global Semantic Integrity Constraint Checking for a System of Databases
In today’s emerging information systems, it is natural to have data distributed across multiple sites. We define a System of Databases (SyDb) as a collection of autonomous and heterogeneous databases. R-SyDb (System of Relational Databases) is a restricted form of SyDb, referring to a collection of independent relational databases. Similarly, X-SyDb (System of XML Databases) refers to a collection of XML databases. Global integrity constraints ensure integrity and consistency of data spanning multiple databases. In this dissertation, we present (i) Constraint Checker, a general framework of a mobile agent based approach for checking global constraints on R-SyDb, and (ii) XConstraint Checker, a general framework for checking global XML constraints on X-SyDb. Furthermore, we formalize multiple efficient algorithms for varying semantic integrity constraints involving both arithmetic and aggregate predicates. The algorithms take as input an update statement and a list of all global semantic integrity constraints with arithmetic or aggregate predicates, and output sub-constraints to be executed on remote sites. The algorithms are efficient since (i) the constraint check is carried out at compile time, i.e. before the update statement is executed, so we save time and resources by avoiding rollbacks, and (ii) the implementation exploits parallelism. We have also implemented a prototype of the systems and algorithms for both R-SyDb and X-SyDb. We also present performance evaluations of the system.
Towards interoperability in heterogeneous database systems
Distributed heterogeneous databases consist of systems which differ physically and logically, containing different data models and data manipulation languages. Although these databases are independently created and administered, they must cooperate and interoperate. Users need to access and manipulate data from several databases, and applications may require data from a wide variety of independent databases. Therefore, a new system architecture is required to manipulate and manage distinct and multiple databases, in a transparent way, while preserving their autonomy. This report contains an extensive survey on heterogeneous databases, analysing and comparing the different aspects, concepts and approaches related to the topic. It introduces an architecture to support interoperability among heterogeneous database systems. The architecture avoids the use of a centralised structure to assist in the different phases of the interoperability process. It aims to support scalability, and to assure privacy and confidentiality of the data. The proposed architecture allows the databases to decide when to participate in the system, what type of data to share and with which other databases, thereby preserving their autonomy. The report also describes an approach to information discovery in the proposed architecture, without using any centralised structure such as repositories or dictionaries, and without broadcasting to all databases. It attempts to reduce the number of databases searched and to preserve the privacy of the shared data. The main idea is to visit a database that either contains the requested data or knows about another database that possibly contains this data.
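The discovery idea in the abstract above (visit a database that either holds the requested data or knows of another database that possibly holds it) can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the report: the `Database` class, the per-item `hints` table, the `discover` function and the sample network are hypothetical, and the `network` dictionary only simulates addressing a peer by name; it does not stand for a central repository.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Database:
    """A hypothetical autonomous database taking part in the system."""
    name: str
    shared: Set[str]  # kinds of data this database chose to share
    # For a given kind of data, peers that possibly hold it (local knowledge only).
    hints: Dict[str, List[str]] = field(default_factory=dict)


def discover(start: str, item: str,
             network: Dict[str, Database]) -> Tuple[Optional[str], List[str]]:
    """Follow referrals from `start` until a database sharing `item` is found.

    Returns the holder's name (or None) and the databases visited, in order.
    Only databases named in a referral for `item` are ever contacted, so
    there is no broadcast to the whole system.
    """
    visited: List[str] = []
    seen: Set[str] = {start}
    queue = deque([start])
    while queue:
        db = network[queue.popleft()]
        visited.append(db.name)
        if item in db.shared:
            return db.name, visited
        for peer in db.hints.get(item, []):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return None, visited


# Made-up example network: D is never contacted when looking for "patients".
network = {
    "A": Database("A", {"orders"}, {"patients": ["B"]}),
    "B": Database("B", {"billing"}, {"patients": ["C"]}),
    "C": Database("C", {"patients"}),
    "D": Database("D", {"weather"}),
}
holder, visited = discover("A", "patients", network)
```

In this sketch each database decides what goes into `shared` and `hints`, which is one simple way to model the autonomy the abstract describes; the report's actual mechanism may differ.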
Solving Local Cost Estimation Problem for Global Query Optimization in Multidatabase Systems
To meet users' growing needs for accessing pre-existing heterogeneous databases, a multidatabase system (MDBS) integrating multiple databases has attracted many researchers recently. A key feature of an MDBS is local autonomy. For a query retrieving data from multiple databases, global query optimization should be performed to achieve good system performance. There are a number of new challenges for global query optimization in an MDBS. Among them, a major one is that some local optimization information, such as local cost parameters, may not be available at the global level because of local autonomy. It creates difficulties for finding a good decomposition of a global query during query optimization. To tackle this challenge, a new query sampling method is proposed in this paper. The idea is to group component queries into homogeneous classes, draw a sample of queries from each class, and use observed costs of sample queries to derive a cost formula for each class by multiple regression. The derived formulas can be used to estimate the cost of a query during query optimization. The relevant issues, such as query classification rules, sampling procedures, and cost model development and validation, are explored in this paper. To verify the feasibility of the method, experiments were conducted on three commercial database management systems supported in an MDBS. Experimental results demonstrate that the proposed method is quite promising in estimating local cost parameters in an MDBS.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/44824/1/10619_2004_Article_181758.pd
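The query sampling idea in the abstract above (group queries into classes, observe the cost of sample queries, fit a cost formula per class by multiple regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's cost model: the class name `unary_scan`, the two explanatory variables (operand and result size in thousands of tuples), the linear form of the formula and all sample values are hypothetical, and the "observed" costs are synthetic numbers made up for the example.

```python
from typing import Dict, List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: sample queries are not varied enough")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_cost_formula(features: Sequence[Sequence[float]],
                     costs: Sequence[float]) -> List[float]:
    """Least-squares fit of cost ~ b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0, *f] for f in features]  # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, costs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_cost(coeffs: Sequence[float], feature: Sequence[float]) -> float:
    """Apply a fitted formula to the explanatory variables of a new query."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], feature))


# Synthetic sample per query class: ((operand_k, result_k), observed cost).
samples: Dict[str, List[Tuple[Tuple[float, float], float]]] = {
    "unary_scan": [
        ((1.0, 0.01), 2.6), ((5.0, 0.2), 12.5), ((20.0, 0.05), 41.0),
        ((8.0, 0.8), 24.5), ((12.0, 0.3), 27.5),
    ],
}

# One cost formula per class, derived from that class's sample queries.
formulas = {
    cls: fit_cost_formula([f for f, _ in obs], [c for _, c in obs])
    for cls, obs in samples.items()
}
estimate = estimate_cost(formulas["unary_scan"], (10.0, 0.1))
```

A global optimizer could then call `estimate_cost` for each candidate component query without needing access to the local system's internal cost parameters, which is the point the abstract makes about local autonomy.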
A comparative study of transaction management services in multidatabase heterogeneous systems
Multidatabases are being actively researched as a relatively new area in which many aspects are not yet fully understood. The area of transaction management in multidatabase systems still has many unresolved problems. The problem areas which this dissertation addresses are classification of multidatabase systems, global concurrency control, correctness criteria in a multidatabase environment, global deadlock detection, atomic commitment and crash recovery. A core group of research addressing these problems was identified and studied. The dissertation contributes to the multidatabase transaction management topic by introducing an alternative classification method for such multiple database systems, assessing existing research into transaction management schemes and, based on this assessment, proposing a transaction processing model founded on the optimal properties of transaction management identified during the course of this research.
Computing
M. Sc. (Computer Science)
Data queries over heterogeneous sources
Dissertation submitted for the degree of Master in Informatics Engineering
Enterprises typically have their data spread over many software systems, such as
custom made applications, CRM systems like SalesForce, CMS systems, or ERP systems
like SAP. In these settings, it is often desired to integrate information from many data
sources to accomplish some business goal in an application. Data may be stored locally
or in the cloud in a wide variety of ways, requiring explicit transformation processes
to be defined, which is why it is hard for developers to integrate it. Moreover, the amount
of external data can be large, and the difference in efficiency between a smart and a naive
way of retrieving and filtering data from different locations can be great. Hence, it is
clear that developers would benefit greatly from language abstractions to help them build
queries over heterogeneous data sources and from an optimization process that avoids
large and unnecessary data transfers during the execution of queries.
This project was developed at OutSystems and aims at extending a real product, which
makes it even more challenging. We followed a generic approach that can be implemented
in any framework, not only in the OutSystems product.
On Resolving Semantic Heterogeneities and Deriving Constraints in Schema Integration
Ph.D. (Doctor of Philosophy)