53,673 research outputs found

    The role of expert systems in federated distributed multi-database systems/Ince Levent

    Get PDF
    A shared information system is a series of computer systems interconnected by some kind of communication network. There are data repositories residing on each computer. These data repositories must somehow be integrated. The purpose for using distributed and multi-database systems is to allow users to view collections of data repositories as if they were a single entity. Multidatabase systems, better known as heterogeneous multidatabase systems, are characterized by dissimilar data models, concurrency and optimization strategies and access methods. Unlike homogenous systems, the data models that compose the global database can be based on different types of data models. It is not necessary that all participant databases use the same data model. Federated distributed database systems are a special case of multidatabase systems. They are completely autonomous and do not rely on the global data dictionary to process distributed queries. Processing distributed query requests in federated databases is very difficult since there are multiple independent databases with their own rules for query optimization, deadlock detection, and concurrency. Expert systems can play a role in this type of environment by supplying a knowledge base that contains rules for data object conversion, rules for resolving naming conflicts, and rules for exchanging data.http://archive.org/details/theroleofexperts109459362Turkish Navy author.Approved for public release; distribution is unlimited

    Algebraic optimization of recursive queries

    Get PDF
    Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface, and allow formulation of complex recursive queries. However, they do not offer appropriate update facilities, and do not support existing applications. To overcome these problems an SQL-like interface is required besides a logic-based interface.\ud \ud In the PRISMA project we have developed a tightly-coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations.\ud \ud The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems

    A survey of parallel execution strategies for transitive closure and logic programs

    Get PDF
    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

    Statistical structures for internet-scale data management

    Get PDF
    Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability

    Extending a multi-set relational algebra to a parallel environment

    Get PDF
    Parallel database systems will very probably be the future for high-performance data-intensive applications. In the past decade, many parallel database systems have been developed, together with many languages and approaches to specify operations in these systems. A common background is still missing, however. This paper proposes an extended relational algebra for this purpose, based on the well-known standard relational algebra. The extended algebra provides both complete database manipulation language features, and data distribution and process allocation primitives to describe parallelism. It is defined in terms of multi-sets of tuples to allow handling of duplicates and to obtain a close connection to the world of high-performance data processing. Due to its algebraic nature, the language is well suited for optimization and parallelization through expression rewriting. The proposed language can be used as a database manipulation language on its own, as has been done in the PRISMA parallel database project, or as a formal basis for other languages, like SQL
    corecore