Onion Curve: A Space Filling Curve with Near-Optimal Clustering
Space filling curves (SFCs) are widely used in the design of indexes for
spatial and temporal data. Clustering is a key metric for an SFC: it measures
how well the curve preserves locality in moving from higher dimensions to a
single dimension. We present the onion curve, an SFC whose clustering
performance is provably close to optimal for cube and near-cube shaped
query sets, irrespective of the side length of the query. We show that in
contrast, the clustering performance of the widely used Hilbert curve can be
far from optimal, even for cube-shaped queries. Since the clustering
performance of an SFC is critical to the efficiency of multi-dimensional
indexes based on the SFC, the onion curve can deliver improved performance for
data structures involving multi-dimensional data.
Comment: The short version is published in ICDE 1
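The clustering metric mentioned in this abstract can be illustrated with a small sketch. It assumes a common formalization that the abstract itself does not spell out: the clustering number of a query is the number of contiguous runs of curve positions needed to cover the query's cells (fewer runs means fewer seeks). The sketch uses the standard Hilbert mapping and a row-major ordering for comparison. It does not implement the onion curve, and the function names are illustrative only.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two. This is the widely used iterative mapping.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def row_major_index(n, x, y):
    """Position of cell (x, y) in a plain row-by-row scan."""
    return y * n + x


def clustering_number(index_fn, xs, ys):
    """Number of contiguous runs of curve positions covering the query.

    index_fn(x, y) maps a cell to its position on the curve;
    xs and ys are the coordinate ranges of a rectangular query.
    """
    ys = list(ys)
    indices = sorted(index_fn(x, y) for x in xs for y in ys)
    if not indices:
        return 0
    # A new run starts wherever two consecutive positions are not adjacent.
    return 1 + sum(1 for a, b in zip(indices, indices[1:]) if b != a + 1)


if __name__ == "__main__":
    n = 16
    query_x, query_y = range(3, 7), range(5, 9)  # a 4 x 4 cube-shaped query
    for name, fn in (("hilbert", hilbert_index), ("row-major", row_major_index)):
        runs = clustering_number(lambda x, y: fn(n, x, y), query_x, query_y)
        print(name, runs)
```

Sweeping the query position and side length with a helper like this is one way to compare curves empirically; the paper's optimality claims are analytic and are not reproduced here.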
Scalability analysis of declustering methods for multidimensional range queries
Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.
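The two declustering methods named in this abstract can be sketched as follows. The sketch assumes the usual textbook definitions, which the abstract does not restate: Disk Modulo assigns a bucket to the disk given by the sum of its coordinates modulo the number of disks, and Fieldwise Xor uses the bitwise XOR of the coordinates instead. The response time of a range query is taken to be the largest number of its buckets that land on any single disk. Function names are illustrative, and the paper's analytic formulas are not reproduced.

```python
from collections import Counter
from functools import reduce
from itertools import product
from operator import xor


def disk_modulo(bucket, num_disks):
    """Disk Modulo: sum of the bucket coordinates, modulo the disk count."""
    return sum(bucket) % num_disks


def fieldwise_xor(bucket, num_disks):
    """Fieldwise Xor: bitwise XOR of the coordinates, modulo the disk count."""
    return reduce(xor, bucket) % num_disks


def response_time(assign, ranges, num_disks):
    """Largest number of query buckets placed on a single disk.

    assign  -- one of the declustering functions above
    ranges  -- one range of bucket coordinates per attribute (the range query)
    """
    load = Counter(assign(bucket, num_disks) for bucket in product(*ranges))
    return max(load.values())


if __name__ == "__main__":
    query = (range(0, 2), range(0, 2))  # a 2 x 2 range query, 4 buckets
    for num_disks in (2, 4, 8):
        ideal = -(-4 // num_disks)  # ceil(4 / num_disks)
        print(
            num_disks,
            response_time(disk_modulo, query, num_disks),
            response_time(fieldwise_xor, query, num_disks),
            ideal,
        )
```

For this small query, both methods place two buckets on one disk however many disks are added, while the ideal drops to one bucket per disk from four disks on. That is a toy instance of the limited scalability the abstract describes.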
QUASII: QUery-Aware Spatial Incremental Index.
With large-scale simulations of increasingly detailed models and improvement of data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap where scientists need to wait before they can perform any analysis. Moreover, scientists often only use a small fraction of the data - the parts containing interesting phenomena - and indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side-effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4, while its performance converges to that of the state-of-the-art static indexes.
Signature Files: An Integrated Access Method for Formatted and Unformatted Databases
The signature file approach is one of the most powerful information storage and retrieval techniques used for finding the data objects that are relevant to user queries. The main idea of all signature-based schemes is to reflect the essence of the data items in bit patterns (descriptors or signatures) and store them in a separate file, which acts as a filter to eliminate the non-qualifying data items for an information request. It provides an integrated access method for both formatted and unformatted databases. A comparative overview and discussion of the proposed signature generation methods and the major signature file organization schemes are presented. Applications of the signature techniques to formatted and unformatted databases, single and multiterm query cases, serial and parallel architectures, and static and dynamic environments are provided, with a special emphasis on multimedia databases, where the pioneering prototype systems using signatures yield highly encouraging results.
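The filtering idea in this abstract can be sketched with superimposed coding, which is one well-known signature generation method and is chosen here as an assumption. The abstract surveys several methods and does not single this one out. Each word sets a few bits in a fixed-width pattern, a record's signature is the OR of its word patterns, and a record is a candidate only if its signature contains every bit of the query signature. Candidates are then checked against the actual text, because the filter can admit false matches. The constants and function names are illustrative.

```python
import hashlib

SIG_BITS = 64        # width of each signature (assumed value)
BITS_PER_WORD = 3    # bit positions set by each word (assumed value)


def word_signature(word):
    """Bit pattern for one word, derived from a stable hash."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        pos = int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
        sig |= 1 << pos
    return sig


def record_signature(text):
    """Superimpose (OR together) the signatures of all words in a record."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def search(records, signatures, query_words):
    """Return indices of records that contain every query word."""
    query_sig = 0
    for word in query_words:
        query_sig |= word_signature(word)
    # Filter step: scan only the signature file.
    candidates = [i for i, s in enumerate(signatures) if (s & query_sig) == query_sig]
    # Verification step: drop false matches by checking the record text.
    wanted = [w.lower() for w in query_words]
    return [
        i for i in candidates
        if all(w in records[i].lower().split() for w in wanted)
    ]


if __name__ == "__main__":
    records = [
        "signature files filter records",
        "space filling curves preserve locality",
    ]
    signatures = [record_signature(r) for r in records]
    print(search(records, signatures, ["curves"]))
```

The filter never misses a qualifying record, since a record containing all query words has all of their bits set. Widening `SIG_BITS` reduces false matches at the cost of a larger signature file.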
Metabolic and Chaperone Gene Loss Marks the Origin of Animals: Evidence for Hsp104 and Hsp78 Sharing Mitochondrial Clients
The evolution of animals involved acquisition of an emergent gene repertoire
for gastrulation. Whether loss of genes also co-evolved with this developmental
reprogramming has not yet been addressed. Here, we identify twenty-four genetic
functions that are retained in fungi and choanoflagellates but undetectable in
animals. These lost genes encode: (i) sixteen distinct biosynthetic functions;
(ii) the two ancestral eukaryotic ClpB disaggregases, Hsp78 and Hsp104, which
function in the mitochondria and cytosol, respectively; and (iii) six other
assorted functions. We present computational and experimental data that are
consistent with a joint function for the differentially localized ClpB
disaggregases, and with the possibility of a shared client/chaperone
relationship between the mitochondrial Fe/S homoaconitase encoded by the lost
LYS4 gene and the two ClpBs. Our analyses lead to the hypothesis that the
evolution of gastrulation-based multicellularity in animals led to efficient
extraction of nutrients from dietary sources, loss of natural selection for
maintenance of energetically expensive biosynthetic pathways, and subsequent
loss of their attendant ClpB chaperones.
Comment: This is a reformatted version from the recent official publication in
PLoS ONE (2015). This version differs substantially from the first three arXiv
versions. This version uses a fixed-width font for DNA sequences as was done
in the earlier arXiv versions but which is missing in the official PLoS ONE
publication. The title has also been shortened slightly from the official
publicatio