Query Workload-Aware Index Structures for Range Searches in 1D, 2D, and High-Dimensional Spaces
abstract: Most current database management systems are optimized for single-query execution.
Yet queries often arrive as part of a query workload. There is therefore a need
for index structures that take into consideration the existence of multiple queries in a
workload and efficiently produce accurate results for the entire query workload.
These index structures should scale to large amounts of data as well as
large query workloads.
The main objective of this dissertation is to design scalable index structures
that are optimized for range query workloads. Range queries are an important
type of query with wide-ranging applications, yet no existing index structures
are optimized for the efficient execution of range query workloads. There are
also unique challenges that must be addressed for range queries in 1D, 2D, and
high-dimensional spaces. In this work, I introduce novel cost models, index selection
algorithms, and storage mechanisms that tackle these challenges and efficiently
process a given range query workload in 1D, 2D, and high-dimensional spaces. In particular,
I introduce the index structures HCS (for 1D spaces), cSHB (for 2D spaces),
and PSLSH (for high-dimensional spaces), which are designed specifically to handle
range query workloads efficiently along with the unique challenges arising from their respective
spaces. I experimentally show the effectiveness of the proposed index structures
by comparing them with state-of-the-art techniques.
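As a minimal illustration of the 1D range queries such indexes target (this is a plain sorted-array baseline, not the dissertation's HCS structure), a workload of ranges can be answered with binary search:

```python
from bisect import bisect_left, bisect_right

def range_count(sorted_keys, lo, hi):
    """Count keys in the closed interval [lo, hi] via binary search."""
    return bisect_right(sorted_keys, hi) - bisect_left(sorted_keys, lo)

keys = [1, 4, 4, 7, 9, 12]           # indexed 1D data, kept sorted
workload = [(0, 5), (4, 9), (10, 20)]  # a batch of range queries
counts = [range_count(keys, lo, hi) for lo, hi in workload]
# counts == [3, 4, 1]
```

A workload-aware index would go further, e.g. by choosing partition boundaries that align with the query ranges above rather than treating each query in isolation.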
Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web
If numeric data from the Web are brought together, natural scientists can compare climate measurements with estimations, financial analysts can evaluate companies based on balance sheets and daily stock market values, and citizens can explore GDP per capita across several data sources. However, the heterogeneity and size of the data remain a problem. This work presents methods to query a uniform view, the Global Cube, of available datasets from the Web, building on Linked Data query approaches.
IDEAS-1997-2021-Final-Programs
This document records the final program for each of the 26 meetings of the International Database Engineering and Applications Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021).
Learning Multi-dimensional Indexes
Scanning and filtering over multi-dimensional tables are key operations in
modern analytical database engines. To optimize the performance of these
operations, databases often create clustered indexes over a single dimension or
multi-dimensional indexes such as R-trees, or use complex sort orders (e.g.,
Z-ordering). However, these schemes are often hard to tune and their
performance is inconsistent across different datasets and queries. In this
paper, we introduce Flood, a multi-dimensional in-memory index that
automatically adapts itself to a particular dataset and workload by jointly
optimizing the index structure and data storage. Flood achieves up to three
orders of magnitude faster performance for range scans with predicates than
state-of-the-art multi-dimensional indexes or sort orders on real-world
datasets and workloads. Our work serves as a building block towards an
end-to-end learned database system.
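The Z-ordering mentioned above maps a multi-dimensional point to a single sort key by interleaving the bits of its coordinates, so that a plain 1D sort roughly preserves spatial locality. A minimal 2D sketch (illustrative only; Flood itself learns its layout rather than using a fixed curve):

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a Z-order (Morton) key:
    bit i of x goes to position 2*i, bit i of y to position 2*i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

points = [(3, 5), (0, 0), (7, 1), (2, 2)]
points.sort(key=lambda p: z_order(*p))
# points == [(0, 0), (2, 2), (7, 1), (3, 5)]
```

Storing rows in this order lets a range scan skip whole key intervals, which is the behavior learned indexes like Flood try to tune per dataset and workload.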
The use of alternative data models in data warehousing environments
Data Warehouses are increasing their data volume at an accelerated rate; high disk
space consumption, slow query response times, and complex database administration are
common problems in these environments. The lack of a proper data model and an
adequate architecture specifically targeted at these environments is the root
cause of these problems.
Inefficient management of stored data includes duplicate values at the column level and
poor management of data sparsity, which derives from low data density and affects
the final size of Data Warehouses. It has been demonstrated that the Relational Model
and relational technology are not the best techniques for managing duplicates and data
sparsity.
The novelty of this research lies in comparing data models with respect to their data
density and data sparsity management in order to optimise Data Warehouse environments.
The Binary-Relational, Associative/Triple Store, and Transrelational models
have been investigated, and based on the research results a novel Alternative Data
Warehouse Reference architectural configuration has been defined.
For the Transrelational model, no database implementation existed. It was therefore
necessary to develop an instantiation of its storage mechanism; as far as could be
determined, this is the first public domain instantiation available of the storage
mechanism for the Transrelational model.
A Spatio-Temporal Model for the Evaluation of Education Quality in Peru
The role of information and communication technologies in the development of modern societies has continuously increased over the past several decades. In particular, recent unprecedented growth in use of the Internet in many developing countries has been accompanied by greater information access and use. Along with this increased use, there have been significant advances in the development of technologies that can support the management and decision-making functions of decentralized government. However, the amount of data available to administrators and planners is increasing at a faster rate than their ability to use these resources effectively.
A key issue in this context is the storage and retrieval of spatial and temporal data. With static data, a planner or analyst is limited to studying cross-sectional snapshots and has little capability to understand trends or assess the impacts of policies. Education, which is a vital part of the human experience and one of the most important aspects of development, is a spatio-temporal process that demands the capacity to store and analyze spatial distributions and temporal sequences simultaneously. Local planners must not only be able to identify problem areas, but also know whether a problem is recent or on-going. They must also be able to identify the factors causing problems, for remediation, and, most importantly, to assess the impact of remedial interventions. Internet-based tools that allow for fast and easy on-line exploration of spatio-temporal data will better equip planners for doing all of the above. This thesis presents a spatio-temporal on-line data model using the space-time paradigm, and demonstrates how such a model can be of use in the development of customized software that addresses the evaluation of early childhood education quality in Peru.
An OLAP-GIS System for Numerical-Spatial Problem Solving in Community Health Assessment Analysis
Community health assessment (CHA) professionals who use information technology need a complete system capable of supporting numerical-spatial problem solving. On-Line Analytical Processing (OLAP) is a multidimensional data warehouse technique commonly used for decision support in industry. Coupling OLAP with a Geographic Information System (GIS) offers the potential for a very powerful system. For this work, OLAP and GIS were combined to develop the Spatial OLAP Visualization and Analysis Tool (SOVAT) for numerical-spatial problem solving. In addition to the development of this system, this dissertation describes three studies related to this work: a usability study, a CHA survey, and a summative evaluation.
The purpose of the usability study was to identify human-computer interaction issues. Fifteen participants took part in the study. Three participants per round used the system to complete typical numerical-spatial tasks. Objective and subjective results were analyzed after each round, and system modifications were implemented. The result of this study was a novel OLAP-GIS system streamlined for numerical-spatial problem solving.
The online CHA survey aimed to identify the information technology currently used for numerical-spatial problem solving. The survey was sent to CHA professionals and allowed them to record the individual technologies they used during specific steps of a numerical-spatial routine. In total, 27 participants completed the survey. Results favored SPSS for numerical steps and GIS for spatial steps.
Next, a summative within-subjects crossover design compared SOVAT to the combined use of SPSS and GIS (termed SPSS-GIS) for numerical-spatial problem solving. Twelve individuals from the health sciences at the University of Pittsburgh participated. Half were randomly selected to use SOVAT first, while the other half used SPSS-GIS first. In the second session, they used the alternate application. Objective and subjective results favored SOVAT over SPSS-GIS. Inferential statistics were analyzed using a linear mixed model; at the .01 level, SOVAT differed significantly from SPSS-GIS on satisfaction and time (p < .002). The results demonstrate the potential of OLAP-GIS for CHA analysis. Future work will explore the impact of an OLAP-GIS system in other areas of public health.
Sorting improves word-aligned bitmap indexes
Bitmap indexes must be compressed to reduce input/output costs and minimize
CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use
techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid
(WAH) compression. These techniques are sensitive to the order of the rows: a
simple lexicographical sort can divide the index size by 9 and make indexes
several times faster. We investigate row-reordering heuristics. Simply
permuting the columns of the table can increase the sorting efficiency by 40%.
Secondary contributions include efficient algorithms to construct and aggregate
bitmaps. The effect of word length is also reviewed by constructing 16-bit,
32-bit and 64-bit indexes. Using 64-bit CPUs, we find that 64-bit indexes are
slightly faster than 32-bit indexes despite being nearly twice as large.
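The core effect described above, that row order drives RLE-based compression, can be shown with a toy run counter (this counts maximal runs of identical bits; it is an illustration of the principle, not the actual WAH word-aligned encoding):

```python
from itertools import groupby

def rle_runs(bits):
    """Count maximal runs of identical values; RLE-based schemes such as
    WAH compress better when a bitmap has fewer, longer runs."""
    return sum(1 for _ in groupby(bits))

# one bitmap column of an index, before and after sorting the table rows
unsorted_col = [1, 0, 1, 0, 1, 0, 0, 1]
sorted_col = sorted(unsorted_col)
# rle_runs(unsorted_col) == 7, rle_runs(sorted_col) == 2
```

Sorting gathers equal rows together, collapsing many short runs into a few long ones, which is why a lexicographic sort of the table can shrink the compressed index so dramatically.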