53 research outputs found

    UPI: A Primary Index for Uncertain Databases

    Uncertain data management has received growing attention from industry and academia. Many efforts have been made to optimize uncertain databases, including the development of special index data structures. However, none of these efforts have explored primary (clustered) indexes for uncertain databases, even though clustering can offer substantial speedups for non-selective analytic queries on large uncertain databases. In this paper, we propose a new index called a UPI (Uncertain Primary Index) that clusters heap files according to uncertain attributes with both discrete and continuous uncertainty distributions. Because uncertain attributes may have several possible values, a UPI on an uncertain attribute duplicates tuple data once for each possible value. To keep the UPI from growing unmanageably large, low-probability tuples are placed in a special Cutoff Index that is consulted only when queries for low-probability values are run. We also propose several other optimizations, including techniques to improve secondary index performance and techniques to reduce maintenance costs and fragmentation by buffering changes to the table and writing updates in sequential batches. Finally, we develop cost models for UPIs that estimate query performance in various settings and help automatically select the tuning parameters of a UPI. We have implemented a prototype UPI and experimented on two real datasets. Our results show that UPIs can significantly (by up to two orders of magnitude) improve the performance of uncertain queries over both clustered and unclustered attributes. We also show that our buffering techniques mitigate table fragmentation and keep the maintenance cost as low as, or even lower than, that of an unclustered heap file. National Science Foundation (U.S.) (Grant IIS-0448124); National Science Foundation (U.S.) (Grant IIS-0905553); National Science Foundation (U.S.) (Grant IIS-0916691)

    CORADD: Correlation Aware Database Designer for Materialized Views and Indexes

    We describe an automatic database design tool that exploits correlations between attributes when recommending materialized views (MVs) and indexes. Although there is a substantial body of related work exploring how to select an appropriate set of MVs and indexes for a given workload, none of this work has explored the effect of correlated attributes (e.g., attributes encoding related geographic information) on designs. Our tool identifies a set of MVs and secondary indexes such that correlations between the clustered attributes of the MVs and the secondary indexes are enhanced, which can dramatically improve query performance. It uses a form of Integer Linear Programming (ILP) called ILP Feedback to pick the best set of MVs and indexes for given database size constraints. We compare our tool with a state-of-the-art commercial database designer on two workloads, APB-1 and SSB (Star Schema Benchmark, similar to TPC-H). Our results show that a correlation-aware database designer can improve query performance by up to 6 times within the same space budget when compared to a commercial database designer. National Science Foundation (U.S.) (Grant IIS-0704424); SAP Corporation (Grant

    An Efficient Scheme for Dynamic Data Replication

    This paper presents an efficient scheme for dynamic replication of data in distributed environments. The aim of the scheme is to increase system performance through intelligent data placement that optimizes message traffic in the network. Recent research has focused comparatively little on using replication to increase performance and has instead been directed more at improving system availability through replication. However, with the advent of mobile or nomadic computing, research in replication needs to change direction: the underlying assumption of high-speed networks no longer holds true. Wireless networks not only have lower bandwidth but are also very expensive to use. In such an environment, it is imperative that data be distributed intelligently to achieve good system performance in terms of message costs and turnaround time. Besides, with mobility introduced in the system, earlier static schemes for improving performance (e.g., the File Alloc..

    Rule Languages and Internal Algebras for Rule-Based Optimizers

    Rule-based optimizers and optimizer generators use rules to specify query transformations. Rules act directly on query representations, which typically are based on query algebras. But most algebras complicate rule formulation, and rules over these algebras must often resort to calling externally defined bodies of code. Code makes rules difficult to formulate, prove correct and reason about, and therefore compromises the effectiveness of rule-based systems. In this paper we present KOLA, a combinator-based algebra designed to simplify rule formulation. KOLA is not a user language, and KOLA's variable-free queries are difficult for humans to read. But KOLA is an effective internal algebra because its combinator style makes queries manipulable and structurally revealing. As a result, rules over KOLA queries are easily expressed without the need for supplemental code. We illustrate this point, first by showing some transformations that, despite their simplicity, require head and body rou..

    Issues in the design of object-oriented database programming languages

    We see a trend toward extending object-oriented languages in the direction of databases, and, at the same time, toward extending database systems with object-oriented ideas. On the surface, these two activities seem to be moving in a consistent direction. However, at a deeper level, we see difficulties that may inhibit their ending up at the same point. We feel that many of these difficulties are a result of the underlying assumptions that are inherent in the fields of programming language and database systems research. Many of these assumptions are historical and contribute to a set of cultural biases that often prevent the two communities from interacting as effectively as possible. The purpose of this paper is to try to uncover some of the cultural presuppositions that have inhibited development of a fully integrated database programming language. We have identified database and language features that seem to be difficult to reconcile. We try to uncover the basic problems in these two areas that these features were intended to solve. In order to resolve these problems, we attempt to distinguish fundamental differences from historical artifacts. 1. Introduction The database and the programming language communities seem to be moving toward each other in terms of the problems that they are addressing. Database systems have been attempting to increase their power by associating more and mor