
    Query Workload-Aware Index Structures for Range Searches in 1D, 2D, and High-Dimensional Spaces

    Most current database management systems are optimized for single-query execution. Yet queries often arrive as part of a query workload, so there is a need for index structures that take the existence of multiple queries into account and efficiently produce accurate results for the entire workload. These index structures should scale to large amounts of data as well as large query workloads. The main objective of this dissertation is to design scalable index structures that are optimized for range query workloads. Range queries are an important type of query with wide-ranging applications, yet no existing index structures are optimized for efficient execution of range query workloads, and unique challenges must be addressed for range queries in 1D, 2D, and high-dimensional spaces. In this work, I introduce novel cost models, index selection algorithms, and storage mechanisms that tackle these challenges and efficiently process a given range query workload in 1D, 2D, and high-dimensional spaces. In particular, I introduce the index structures HCS (for 1D spaces), cSHB (for 2D spaces), and PSLSH (for high-dimensional spaces), each designed specifically to handle range query workloads efficiently and to address the unique challenges arising from its respective space. I experimentally show the effectiveness of the proposed index structures by comparing them with state-of-the-art techniques.
    Doctoral Dissertation, Computer Science, 201
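The dissertation's actual cost models are not given in the abstract, but the general idea of scoring a candidate index layout against a range-query workload can be sketched as follows. This is a toy illustration under strong assumptions (uniform data, cost approximated by the total width of buckets a query overlaps); the bucket layouts and queries are made up:

```python
# Toy workload-aware cost model (illustrative only, not the
# dissertation's actual model): the cost of a range query is
# approximated by the total width of the buckets it overlaps,
# so candidate bucket layouts can be scored against a workload.

def buckets_from_edges(edges):
    """Turn sorted bucket edges into (lo, hi) bucket intervals."""
    return list(zip(edges, edges[1:]))

def workload_cost(buckets, workload):
    """Approximate rows scanned: total width of buckets overlapping each query."""
    cost = 0
    for qlo, qhi in workload:
        for blo, bhi in buckets:
            if blo < qhi and bhi > qlo:  # bucket overlaps the query range
                cost += bhi - blo
    return cost

workload = [(10, 20), (12, 22), (40, 45)]        # hypothetical 1D range queries
fine = buckets_from_edges(range(0, 101, 5))      # narrow buckets, width 5
coarse = buckets_from_edges(range(0, 101, 25))   # wide buckets, width 25

# For this clustered workload, the finer layout scans fewer rows.
best = min((fine, coarse), key=lambda b: workload_cost(b, workload))
```

A real model would also charge for the per-bucket lookup overhead that narrow buckets add; this sketch only captures the scan-cost side of that trade-off.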

    Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web

    If numeric data from the Web are brought together, natural scientists can compare climate measurements with estimations, financial analysts can evaluate companies based on balance sheets and daily stock market values, and citizens can explore the GDP per capita from several data sources. However, the heterogeneity and size of the data remain a problem. This work presents methods to query a uniform view - the Global Cube - of available datasets from the Web, building on Linked Data query approaches.

    IDEAS-1997-2021-Final-Programs

    This document records the final program for each of the 26 meetings of the International Database Engineering and Applications Symposium (IDEAS) from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).

    Learning Multi-dimensional Indexes

    Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or multi-dimensional indexes such as R-trees, or use complex sort orders (e.g., Z-ordering). However, these schemes are often hard to tune, and their performance is inconsistent across different datasets and queries. In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. Flood achieves up to three orders of magnitude faster performance for range scans with predicates than state-of-the-art multi-dimensional indexes or sort orders on real-world datasets and workloads. Our work serves as a building block towards an end-to-end learned database system.
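The flat-grid layout that a learned index like Flood optimizes can be illustrated with a minimal, non-learned version. In this sketch the per-dimension cell counts, which Flood would derive from the dataset and workload, are simply fixed parameters; the class name and interface are invented for illustration:

```python
# Minimal sketch of a flat-grid multi-dimensional index (not the
# paper's implementation): points are bucketed into grid cells, and a
# range query scans only the cells intersecting the query rectangle.

from collections import defaultdict

class GridIndex:
    def __init__(self, points, cols, rows, xmax, ymax):
        self.cols, self.rows = cols, rows
        self.xw, self.yw = xmax / cols, ymax / rows  # cell widths
        self.cells = defaultdict(list)
        for x, y in points:
            cx = min(int(x / self.xw), cols - 1)
            cy = min(int(y / self.yw), rows - 1)
            self.cells[(cx, cy)].append((x, y))

    def range_query(self, xlo, xhi, ylo, yhi):
        # Clamp the query rectangle to the grid, then scan only
        # the cells it overlaps, filtering points within each cell.
        cx0 = max(int(xlo / self.xw), 0)
        cx1 = min(int(xhi / self.xw), self.cols - 1)
        cy0 = max(int(ylo / self.yw), 0)
        cy1 = min(int(yhi / self.yw), self.rows - 1)
        out = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for x, y in self.cells[(cx, cy)]:
                    if xlo <= x <= xhi and ylo <= y <= yhi:
                        out.append((x, y))
        return out
```

A learned index would additionally choose `cols` and `rows` (and the dimension ordering) to minimize cells touched by the expected workload, which is the optimization the paper describes.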

    The use of alternative data models in data warehousing environments

    Data Warehouses are increasing their data volume at an accelerated rate; high disk space consumption, slow query response times, and complex database administration are common problems in these environments. The lack of a proper data model and an adequate architecture specifically targeted at these environments are the root causes of these problems. Inefficient management of stored data includes duplicate values at the column level and poor management of data sparsity, which derives from low data density and affects the final size of Data Warehouses. It has been demonstrated that the Relational Model and relational technology are not the best techniques for managing duplicates and data sparsity. The novelty of this research is to compare data models with respect to their data density and data-sparsity management in order to optimise Data Warehouse environments. The Binary-Relational, the Associative/Triple Store, and the Transrelational models have been investigated, and based on the research results a novel Alternative Data Warehouse Reference architectural configuration has been defined. For the Transrelational model, no database implementation existed, so it was necessary to develop an instantiation of its storage mechanism; as far as could be determined, this is the first public-domain instantiation of the storage mechanism for the Transrelational model.

    Modeling, Annotating, and Querying Geo-Semantic Data Warehouses


    A Spatio-Temporal Model for the Evaluation of Education Quality in Peru

    The role of information and communication technologies in the development of modern societies has continuously increased over the past several decades. In particular, recent unprecedented growth in the use of the Internet in many developing countries has been accompanied by greater information access and use. Along with this increased use, there have been significant advances in the development of technologies that can support the management and decision-making functions of decentralized government. However, the amount of data available to administrators and planners is increasing at a faster rate than their ability to use these resources effectively. A key issue in this context is the storage and retrieval of spatial and temporal data. With static data, a planner or analyst is limited to studying cross-sectional snapshots and has little capability to understand trends or assess the impacts of policies. Education, which is a vital part of the human experience and one of the most important aspects of development, is a spatio-temporal process that demands the capacity to store and analyze spatial distributions and temporal sequences simultaneously. Local planners must not only be able to identify problem areas, but also know whether a problem is recent or ongoing. They must also be able to identify the factors causing problems so that they can be remediated and, most importantly, to assess the impact of remedial interventions. Internet-based tools that allow fast and easy online exploration of spatio-temporal data will better equip planners to do all of the above. This thesis presents a spatio-temporal online data model using the space-time paradigm and demonstrates how such a model can be of use in the development of customized software that addresses the evaluation of early childhood education quality in Peru.

    An OLAP-GIS System for Numerical-Spatial Problem Solving in Community Health Assessment Analysis

    Community health assessment (CHA) professionals who use information technology need a complete system that is capable of supporting numerical-spatial problem solving. On-Line Analytical Processing (OLAP) is a multidimensional data warehouse technique that is commonly used as a decision support system in standard industry. Coupling OLAP with a Geographic Information System (GIS) offers the potential for a very powerful system. For this work, OLAP and GIS were combined to develop the Spatial OLAP Visualization and Analysis Tool (SOVAT) for numerical-spatial problem solving. In addition to the development of this system, this dissertation describes three studies in relation to this work: a usability study, a CHA survey, and a summative evaluation. The purpose of the usability study was to identify human-computer interaction issues. Fifteen participants took part in the study. Three participants per round used the system to complete typical numerical-spatial tasks. Objective and subjective results were analyzed after each round, and system modifications were implemented. The result of this study was a novel OLAP-GIS system streamlined for the purposes of numerical-spatial problem solving. The online CHA survey aimed to identify the information technology currently used for numerical-spatial problem solving. The survey was sent to CHA professionals and allowed them to record the individual technologies they used during specific steps of a numerical-spatial routine. In total, 27 participants completed the survey. Results favored SPSS for numerical steps and GIS for spatial steps. Next, a summative within-subjects crossover design compared SOVAT to the combined use of SPSS and GIS (termed SPSS-GIS) for numerical-spatial problem solving. Twelve individuals from the health sciences at the University of Pittsburgh participated. Half were randomly selected to use SOVAT first, while the other half used SPSS-GIS first. In the second session, they used the alternate application. Objective and subjective results favored SOVAT over SPSS-GIS. Inferential statistics were analyzed using linear mixed-model analysis. At the .01 level, SOVAT was significantly different from SPSS-GIS for satisfaction and time (p < .002). The results demonstrate the potential of OLAP-GIS in CHA analysis. Future work will explore the impact of an OLAP-GIS system in other areas of public health.

    Sorting improves word-aligned bitmap indexes

    Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can reduce the index size by a factor of 9 and make indexes several times faster. We investigate row-reordering heuristics. Simply permuting the columns of the table can increase the sorting efficiency by 40%. Secondary contributions include efficient algorithms to construct and aggregate bitmaps. The effect of word length is also reviewed by constructing 16-bit, 32-bit, and 64-bit indexes. Using 64-bit CPUs, we find that 64-bit indexes are slightly faster than 32-bit indexes despite being nearly twice as large.
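Why row order matters for RLE-based bitmap compression can be seen with plain run-length encoding. This is a simplification: WAH's word-aligned format is not modeled, and the column data is made up; only the run-counting effect of sorting is illustrated:

```python
# Toy demonstration of row-order sensitivity in RLE-compressed bitmap
# indexes (plain RLE, not the actual WAH format): sorting the rows
# groups equal values together, so each value's bitmap collapses into
# a small number of runs.

from itertools import groupby

def rle_runs(bits):
    """Number of runs in a bit sequence (fewer runs = better compression)."""
    return sum(1 for _bit, _group in groupby(bits))

def bitmap(column, value):
    """The bitmap for one value of a column: 1 where the row matches."""
    return [1 if v == value else 0 for v in column]

column = ["b", "a", "b", "a", "b", "a", "b", "a"]  # worst case: alternating
unsorted_runs = sum(rle_runs(bitmap(column, v)) for v in set(column))

sorted_col = sorted(column)  # lexicographic row sort
sorted_runs = sum(rle_runs(bitmap(sorted_col, v)) for v in set(column))
# Sorting shrinks the total run count across all bitmaps dramatically.
```

On this alternating column the unsorted bitmaps need 8 runs each, while after sorting each bitmap is just two runs, which is the size effect the abstract reports at scale.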