13 research outputs found

    Data mining techniques using decision tree model in materialised projection and selection view

    Get PDF
    With the availability of very large data storage today, redundant data structures are no longer a big issue. However, an intelligent way of managing materialised projection and selection views that can lead to fast access of data is the central issue dealt with in this paper. A set of implementation steps for the data warehouse administrators or decision makers to improve the response time of queries is also defined. The study concludes that both attributes and tuples, are important factors to be considered to improve the response time of a query. The adoption of data mining techniques in the physical design of data warehouses has been shown to be useful in practice

    Introducing Perf to a Query Optimization Algorithm

    Get PDF

    Optimizing Queries Using a Materialized View in a Data Warehoue

    Get PDF
    A data warehouse is a user-centered environment for data analysis and decision support. To support decision maker in making decisions quickly and accurately, using materialized views can provide significant improvements in query processing time. The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the database relations. The known algorithms, the bucket algorithm, the inverse-rules algorithm have been used to rewrite queries using views before executing the queries. The bucket algorithm, predominantly used to rewrite queries, generates a candidate rewriting to a query using views then checks that the rewriting is contained in the original query. However, we show same deficiencies in the bucket algorithm then describe the containment bucket algorithm and give an optimal method to solve this problem. We present an experiment comparing the performance of both algorithms.Computer Science Departmen

    GhostDB: Querying Visible and Hidden Data Without Leaks

    Get PDF
    International audienceImagine that you have been entrusted with private data, such as corporate product information, sensitive government information, or symptom and treatment information about hospital patients. You may want to issue queries whose result will combine private and public data, but private data must not be revealed. GhostDB is an architecture and system to achieve this. You carry private data in a smart USB key (a large Flash persistent store combined with a tamper and snoop-resistant CPU and small RAM). When the key is plugged in, you can issue queries that link private and public data and be sure that the only information revealed to a potential spy is which queries you pose. Queries linking public and private data entail novel distributed processing techniques on extremely unequal devices (standard computer and smart USB key). This paper presents the basic framework to make this all work intuitively and efficiently

    Letter from the Special Issue Editor

    Get PDF
    Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies

    Index-based Join Operations in Hive

    Get PDF
    ABSTRACT INDEX-BASED JOIN OPERATIONS IN HIVE MAHSA MOFIDPOOR The exponential growth of data being generated, manipulated, analyzed, and archived nowadays introduces new challenges and opportunities for dealing with the so called big data. Hive is a batch-oriented big data software, well suited for query processing and data analysis. Originally developed by Facebook in 2009 and now under the Apache Software Foundation, Hive is gaining popularity for its SQL like query language HiveQL and for supporting majority of the SQL operations in relational database management systems (RDBMS). Being the expensive operation in RDBMS, join has been the focus of many query optimization techniques to improve performance of database systems. We investigate such techniques for join operations in Hive and develop an index-based join algorithm for queries in HiveQL. When a query requires only a small subset of data selected by a predicate in the WHERE clause, the brute-force method which scans the entire tables results in poor performance for redundant disk I/Os, and irrelevant maps initiation in case the query is issued using the mapreduce. In this work, we implement the proposed index-based technique and integrate it in Hive. To add our extension, we obtain Hive architecture details by reverse engineering the code and map our design to the conceptual optimization flow.To evaluate the performance, after setting up the environment, we run relevant test queries on datasets generated using the industry standard benchmark, TPC-H. Our results indicate significant performance gain over relatively large data or highly selective queries

    Compilation and Code Optimization for Data Analytics

    Get PDF
    The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance. The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation. As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive. The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code
    corecore