
    QUASII: QUery-Aware Spatial Incremental Index.

    With large-scale simulations of increasingly detailed models and improvement of data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap where scientists need to wait before they can perform any analysis. Moreover, scientists often use only a small fraction of the data (the parts containing interesting phenomena), and indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4x, while its performance converges to that of the state-of-the-art static indexes.

    An Algorithm for Data Reorganization in a Multi-dimensional Index

    In spatial databases, data are associated with spatial coordinates and are retrieved based on spatial proximity. A spatial database uses spatial indexes to optimize spatial queries. An essential ingredient for efficient spatial query processing is spatial clustering of data and reorganization of spatial data. Traditional clustering algorithms and reorganization utilities fall short in performance and execution. To solve this problem we have developed an algorithm that converts a two-dimensional spatial index into a one-dimensional value, after which the spatial data are reorganized. This report describes this algorithm as well as various experiments to validate its effectiveness.
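The abstract above does not say which mapping the report uses to turn two dimensions into one. As an illustration only, the sketch below swaps in a well-known technique, the Z-order (Morton) code, which interleaves the bits of the two coordinates so that points close in space tend to receive nearby keys. The function names `interleave_bits` and `reorganize`, and the 16-bit coordinate width, are assumptions made for this example and do not come from the report.

```python
def interleave_bits(x, y, bits=16):
    """Return the Z-order (Morton) key of non-negative integers x and y.

    Bit i of x goes to bit 2*i of the key, bit i of y to bit 2*i + 1.
    Assumption: both coordinates fit in `bits` bits.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def reorganize(points, bits=16):
    """Return the (x, y) points sorted by their one-dimensional Z-order key."""
    return sorted(points, key=lambda p: interleave_bits(p[0], p[1], bits))


if __name__ == "__main__":
    cells = [(1, 1), (0, 0), (1, 0), (0, 1)]
    print(reorganize(cells))
```

Sorting by the key is one simple way to "reorganize" the data: records that are written to storage in key order end up roughly clustered by spatial proximity. Other space-filling curves, such as the Hilbert curve, could fill the same role.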

    Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores

    Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: (a) they have little idle system time to devote to physical design; and (b) there is little, if any, a priori workload knowledge, while the query and data workload keeps changing dynamically. In such environments, traditional approaches to index building and maintenance cannot apply. Database cracking has been proposed as a solution that allows on-the-fly physical data reorganization, as a collateral effect of query processing. Cracking aims to continuously and automatically adapt indexes to the workload at hand, without human intervention. Indexes are built incrementally, adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing methods fail to deliver workload-robustness; they perform much better with random workloads than with others. This frailty derives from the inelasticity with which these approaches interpret each query as a hint on how data should be stored. Current cracking schemes blindly reorganize the data within each query's range, even if that results in successive expensive operations with minimal indexing benefit. In this paper, we introduce stochastic cracking, a significantly more resilient approach to adaptive indexing. Stochastic cracking also uses each query as a hint on how to reorganize data, but not blindly so; it gains resilience and avoids performance bottlenecks by deliberately applying certain arbitrary choices in its decision-making. Thereby, we bring adaptive indexing forward to a mature formulation that confers the workload-robustness previous approaches lacked. Our extensive experimental study verifies that stochastic cracking maintains the desired properties of original database cracking while at the same time it performs well with diverse realistic workloads. Comment: VLDB201
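To make the idea of reorganizing data as a side effect of query processing concrete, here is a minimal sketch of a cracker column in Python. It is a toy under stated assumptions, not the paper's implementation: the class name `CrackerColumn`, its methods, and the list-based partitioning are all hypothetical. Each range query partitions only the piece of the column that contains its bounds and records the split positions for later queries. The `random_crack` method is a loose illustration of adding an arbitrary, data-driven split in the spirit of stochastic cracking; the paper's actual variants are not reproduced here.

```python
import bisect
import random


class CrackerColumn:
    """Toy cracker index over a list of integers (all names are hypothetical)."""

    def __init__(self, values):
        self.data = list(values)  # private copy, reorganized piece by piece
        self.pivots = []          # sorted crack values seen so far
        self.positions = []       # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition the piece containing `pivot`; return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        # Only the piece between the neighbouring cracks needs to be touched.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, pos)
        return pos

    def range_query(self, low, high):
        """Return all values v with low <= v < high, cracking on both bounds."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]

    def random_crack(self, rng):
        """Crack on a randomly chosen stored value (illustrative assumption)."""
        if self.data:
            self._crack(rng.choice(self.data))


if __name__ == "__main__":
    col = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
    print(sorted(col.range_query(5, 13)))
    col.random_crack(random.Random(0))
    print(sorted(col.range_query(1, 8)))
```

In this sketch the first query pays for partitioning the whole column, while later queries touch only the pieces that their bounds fall into. A real column-store would partition in place with swaps rather than building temporary lists.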

    Dynamic Physiological Partitioning on a Shared-nothing Database Cluster

    Traditional DBMS servers are usually over-provisioned for most of their daily workloads and, because they do not show good-enough energy proportionality, waste a lot of energy while underutilized. A cluster of small (wimpy) servers, whose size can be dynamically adjusted to the current workload, offers better energy characteristics for these workloads. Yet, data migration, necessary to balance utilization among the nodes, is a non-trivial and time-consuming task that may consume the energy saved. For this reason, a sophisticated and easy-to-adjust partitioning scheme fostering dynamic reorganization is needed. In this paper, we adapt a technique originally created for SMP systems, called physiological partitioning, to distribute data among nodes in a way that allows data to be repartitioned easily without interrupting transactions. We dynamically partition DB tables based on the nodes' utilization and given energy constraints and compare our approach with physical partitioning and logical partitioning methods. To quantify possible energy saving and its conceivable drawback on query runtimes, we evaluate our implementation on an experimental cluster and compare the results w.r.t. performance and energy consumption. Depending on the workload, we can substantially save energy without sacrificing too much performance.

    Integration of Exploration and Search: A Case Study of the M3 Model

    Effective support for multimedia analytics applications requires exploration and search to be integrated seamlessly into a single interaction model. Media metadata can be seen as defining a multidimensional media space, casting multimedia analytics tasks as exploration, manipulation and augmentation of that space. We present an initial case study of integrating exploration and search within this multidimensional media space. We extend the M3 model, initially proposed as a pure exploration tool, and show that it can be elegantly extended to allow searching within an exploration context and exploring within a search context. We then evaluate the suitability of relational database management systems, as representatives of today’s data management technologies, for implementing the extended M3 model. Based on our results, we finally propose some research directions for scalability of multimedia analytics.

    Best-Choice Edge Grafting for Efficient Structure Learning of Markov Random Fields

    Incremental methods for structure learning of pairwise Markov random fields (MRFs), such as grafting, improve scalability by avoiding inference over the entire feature space in each optimization step. Instead, inference is performed over an incrementally grown active set of features. In this paper, we address key computational bottlenecks that current incremental techniques still suffer by introducing best-choice edge grafting, an incremental, structured method that activates edges as groups of features in a streaming setting. The method uses a reservoir of edges that satisfy an activation condition, approximating the search for the optimal edge to activate. It also reorganizes the search space using search-history and structure heuristics. Experiments show a significant speedup for structure learning and a controllable trade-off between the speed and quality of learning.

    Coordinated optimization of visual cortical maps : 2. Numerical studies

    In the juvenile brain, the synaptic architecture of the visual cortex remains in a state of flux for months after the natural onset of vision and the initial emergence of feature selectivity in visual cortical neurons. It is an attractive hypothesis that visual cortical architecture is shaped during this extended period of juvenile plasticity by the coordinated optimization of multiple visual cortical maps such as orientation preference (OP), ocular dominance (OD), spatial frequency, or direction preference. In part (I) of this study we introduced a class of analytically tractable coordinated optimization models and solved representative examples, in which a spatially complex organization of the OP map is induced by interactions between the maps. We found that these solutions near symmetry breaking threshold predict a highly ordered map layout. Here we examine the time course of the convergence towards attractor states and optima of these models. In particular, we determine the timescales on which map optimization takes place and how these timescales can be compared to those of visual cortical development and plasticity. We also assess whether our models exhibit biologically more realistic, spatially irregular solutions at a finite distance from threshold, when the spatial periodicities of the two maps are detuned and when considering more than 2 feature dimensions. We show that, although maps typically undergo substantial rearrangement, no other solutions than pinwheel crystals and stripes dominate in the emerging layouts. Pinwheel crystallization takes place on a rather short timescale and can also occur for detuned wavelengths of different maps. Our numerical results thus support the view that neither minimal energy states nor intermediate transient states of our coordinated optimization models successfully explain the architecture of the visual cortex. We discuss several alternative scenarios that may improve the agreement between model solutions and biological observations.