Efficient management of large multidimensional datasets has attracted much attention
in the database research community. Such datasets are common, and a wide range of
applications need efficient algorithms to analyze them.
In this thesis, we focus our study on two very common classes of analysis: similarity
and skyline summarization. We first focus on similarity when one of the dimensions in the
multidimensional dataset is temporal. We then develop algorithms for evaluating skyline
summaries effectively, both for temporal datasets and for datasets whose attributes are drawn
from low-cardinality domains, and propose methods for improving the effectiveness of the
skyline summary operation.
This thesis begins by studying similarity measures for time-series datasets and efficient
algorithms for time-series similarity evaluation. The first contribution of this thesis is
a new algorithm that can be used to evaluate similarity methods whose matching criterion
is bounded by a specified threshold value.
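
As an illustration of what a threshold-bounded matching criterion can look like, the sketch below computes a longest-common-subsequence score in which two values match when they differ by at most a threshold. The choice of this particular measure, the function name `lcss_length`, and the parameter `epsilon` are assumptions made for illustration; the sketch is a plain dynamic program and is not the algorithm contributed by the thesis.

```python
from typing import Sequence


def lcss_length(a: Sequence[float], b: Sequence[float], epsilon: float) -> int:
    """Length of the longest common subsequence of a and b, where two
    values count as a match if they differ by at most epsilon.

    Illustrative only: a textbook O(len(a) * len(b)) dynamic program.
    """
    # prev[j] holds the best score for the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if abs(x - y) <= epsilon:
                # Threshold-bounded match: extend the diagonal.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

With `epsilon = 0.5`, the series `[1.0, 2.0, 3.0]` and `[1.1, 2.9, 5.0]` share two matching values (1.0 with 1.1, and 3.0 with 2.9).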
The second contribution of this thesis is the development of a new time-interval skyline
operator, which continuously computes the current skyline over a data stream. We present
a new algorithm called LookOut for evaluating such queries efficiently, and empirically
demonstrate the scalability of this algorithm.
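
To make the time-interval skyline concrete, the following sketch recomputes the skyline from scratch over the elements that are valid at a given time. It assumes that smaller values are preferred in every dimension and that each element carries a half-open validity interval; the names `Element`, `dominates`, and `skyline_at` are hypothetical. It is a naive baseline for illustration and is not LookOut.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Element(NamedTuple):
    point: Tuple[float, ...]
    start: int  # first time at which the element is valid (assumed inclusive)
    end: int    # time at which the element expires (assumed exclusive)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_at(elements: Sequence[Element], now: int) -> List[Tuple[float, ...]]:
    """Skyline of the elements whose interval contains `now`.

    Naive recomputation: quadratic in the number of live elements.
    """
    live = [e.point for e in elements if e.start <= now < e.end]
    return [p for p in live if not any(dominates(q, p) for q in live)]
```

For example, with elements `(1, 5)` valid on `[0, 10)`, `(2, 2)` valid on `[0, 3)`, and `(3, 3)` valid on `[1, 10)`, the point `(3, 3)` is dominated at time 1 but enters the skyline at time 3, once `(2, 2)` has expired.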
Current skyline evaluation techniques follow a common paradigm that eliminates data
elements from skyline consideration by finding other elements in the dataset that dominate
them. The performance of such techniques is heavily influenced by the underlying data
distribution. The third contribution of this thesis is a novel technique called the Lattice
Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on
datasets with attributes that are drawn from low-cardinality domains.
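
The conventional dominance-based paradigm described above can be sketched as a single pass that keeps a window of candidate points and discards any point dominated by another. The sketch assumes smaller values are preferred in every dimension; the names `dominates` and `skyline` are hypothetical. It illustrates the existing paradigm only and is not LS.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Dominance-based skyline using an in-memory candidate window."""
    window: List[Point] = []
    for p in points:
        # Eliminate p if a current candidate dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise p eliminates every candidate it dominates.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window
```

The number of comparisons depends on how many candidates survive in the window, which is one way to see why the data distribution matters: when few points dominate one another, the window grows and each new point is compared against many candidates.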
The utility of the skyline as a data summarization technique is often diminished by the
volume of points in the skyline. The final contribution of this thesis is a novel scheme
that remedies the skyline volume problem by
ranking the elements of the skyline based on their importance to the skyline summary.
Collectively, the techniques described in this thesis present efficient methods for two
common and computationally intensive analysis operations on large multidimensional
datasets.

Ph.D.
Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/57643/2/mmorse_1.pd