    Spatial Data Management Challenges in the Simulation Sciences

    Scientists in many disciplines increasingly use simulations to better understand the natural systems they study. Faster hardware and increasingly precise instruments allow the construction and simulation of progressively more advanced models of various systems. Governed by algorithms and equations, the spatial models at the core of simulations are changed and updated at every simulation step through spatial queries that implement massive updates. The efficient execution of these numerous spatial queries is therefore essential. Two reasons render current spatial indexes inadequate for simulation applications. First, to ensure quick access to data, most of the spatial models in simulations are stored in memory. Most spatial access methods, however, have been optimized for use on disk and are not efficient in memory. Second, in every time step of a simulation, almost all spatial elements change their position, which challenges the update mechanisms of spatial indexes. In this paper we discuss how these challenges create opportunities for exciting data management research.

    Accelerating Spatial Range Queries

    It is increasingly common for domain scientists to use computational tools to build and simulate spatial models of the phenomena they are studying. The spatial models they build are increasingly detailed and dense and are consequently difficult to manage with today's tools. A crucial problem when analyzing spatial models of increasing detail is the scalable execution of range queries. State-of-the-art approaches like the R-Tree perform suboptimally on today's models and do not scale to denser future models. The problem is that the amount of overlap in the tree structure increases as a function of the level of detail and density of the model. In this demonstration we showcase ZOOM, a new tool to efficiently execute spatial range queries on increasingly detailed (denser) models. ZOOM is based on FLAT, a novel range query execution approach that effectively decouples the query execution time from the density of the dataset, thereby ensuring efficient query execution. At the core of the demonstration is thus the visualization of the novel query execution strategy of FLAT, which we contrast with a visualization of the query execution of the R-Tree.

    RUBIK: Efficient Threshold Queries on Massive Time Series

    An increasing number of applications in finance, meteorology, science and other domains produce time series as output. The analysis of this vast amount of time series is key to understanding the phenomena studied, particularly in the simulation sciences, where the analysis of time series resulting from simulation allows scientists to refine the simulated model. Existing approaches to querying time series typically keep a compact representation in main memory, use it to answer queries approximately and then access the exact time series data on disk to validate the result. The more precise the in-memory representation, the fewer disk accesses are needed to validate the result. With the massive sizes of today's datasets, however, current in-memory representations often no longer fit into main memory. To make them fit, their precision has to be reduced considerably, resulting in substantial disk access, which impedes query execution today and limits scalability for even bigger datasets in the future. In this paper we develop RUBIK, a novel approach to compressing and indexing time series. RUBIK exploits the fact that time series in many applications, and particularly in the simulation sciences, are similar to each other. It compresses similar time series, i.e., observation values as well as time information, achieving better space efficiency and improved precision. RUBIK translates threshold queries into two-dimensional spatial queries and efficiently executes them on the compressed time series by exploiting the pruning power of a tree structure to find the result, thereby outperforming the state of the art by a factor of between 6 and 23. As our experiments further indicate, exploiting similarity within and between time series is crucial to make query execution scale and to ultimately decouple query execution time from the growth of the data (size and number of time series).
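
    The idea of treating a threshold query as a two-dimensional (time, value) range query can be illustrated with a minimal sketch. This is not RUBIK itself: it omits compression and the tree structure, and the names Segment, build_segments and threshold_query are hypothetical. Each fixed-length segment of a series is summarized by a bounding box in the (time, value) plane, and segments whose box cannot intersect the query region are pruned before any raw values are read.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """Bounding box of one chunk of a series in the (time, value) plane.

    Hypothetical helper type for illustration only.
    """
    t_start: int   # first time step covered (inclusive)
    t_end: int     # last time step covered (inclusive)
    v_min: float
    v_max: float


def build_segments(values: Sequence[float], seg_len: int) -> List[Segment]:
    """Summarize the series as fixed-length segments with value bounds."""
    segments = []
    for start in range(0, len(values), seg_len):
        chunk = values[start:start + seg_len]
        segments.append(
            Segment(start, start + len(chunk) - 1, min(chunk), max(chunk))
        )
    return segments


def threshold_query(values, segments, t0, t1, theta):
    """Return all time steps t in [t0, t1] with values[t] > theta.

    The query is the 2D region [t0, t1] x (theta, +inf). Segments whose
    bounding box misses that region are skipped without touching raw data.
    """
    hits = []
    for seg in segments:
        if seg.t_end < t0 or seg.t_start > t1:
            continue  # no overlap on the time axis
        if seg.v_max <= theta:
            continue  # no overlap on the value axis
        lo, hi = max(seg.t_start, t0), min(seg.t_end, t1)
        hits.extend(t for t in range(lo, hi + 1) if values[t] > theta)
    return hits
```

    In this sketch the precision of the summary is controlled by the assumed parameter seg_len: shorter segments prune more raw accesses but take more memory, which mirrors the trade-off described in the abstract.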

    BLOCK: Efficient Execution of Spatial Range Queries in Main-Memory

    The execution of spatial range queries is at the core of many applications, particularly in the simulation sciences but also in many other domains. Although main memory in desktops and supercomputers alike has grown considerably in recent years, most spatial indexes supporting the efficient execution of range queries are still only optimized for disk access (minimizing disk page reads). Recent research has primarily focused on optimizing known disk-based approaches for memory (through cache alignment etc.) but has not fundamentally revisited index structures for memory. In this paper we develop BLOCK, a novel approach to execute range queries on spatial data featuring volumetric objects in main memory. Our approach is built on the key insight that in-memory approaches need to be optimized to reduce the number of intersection tests (between objects and query but also in the index structure). Our experimental results show that BLOCK outperforms known in-memory indexes as well as in-memory implementations of disk-based spatial indexes by up to a factor of 7. The experiments also show that it is more scalable than competing approaches as the datasets become denser.
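
    To make the cost measure concrete, the following minimal sketch counts object/query intersection tests for a linear scan and for a plain uniform grid over axis-aligned boxes. The uniform grid is only a stand-in chosen for illustration, not a description of BLOCK, and all function names and the cell parameter are assumptions.

```python
from collections import defaultdict
from itertools import product


def intersects(a, b):
    """True if two 3D boxes ((xmin, ymin, zmin), (xmax, ymax, zmax)) overlap."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def cells(box, cell):
    """Yield the integer grid cells that a box overlaps (cell = edge length)."""
    lo = [int(box[0][i] // cell) for i in range(3)]
    hi = [int(box[1][i] // cell) for i in range(3)]
    return product(*(range(lo[i], hi[i] + 1) for i in range(3)))


def build_grid(boxes, cell):
    """Register every box index in each grid cell it overlaps."""
    grid = defaultdict(list)
    for idx, box in enumerate(boxes):
        for c in cells(box, cell):
            grid[c].append(idx)
    return grid


def grid_query(boxes, grid, cell, query):
    """Return (matching indices, number of intersection tests performed)."""
    candidates = set()
    for c in cells(query, cell):
        candidates.update(grid.get(c, ()))
    result = sorted(i for i in candidates if intersects(boxes[i], query))
    return result, len(candidates)


def scan_query(boxes, query):
    """Baseline: test every box against the query."""
    result = [i for i, b in enumerate(boxes) if intersects(b, query)]
    return result, len(boxes)
```

    Both functions return the same result set; only the number of intersection tests differs, which is the quantity the abstract identifies as the main cost in memory.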

    Data-driven Neuroscience: Enabling Breakthroughs Via Innovative Data Management

    Scientists in all disciplines increasingly rely on simulations to develop a better understanding of the subject they are studying. For example, the neuroscientists we collaborate with in the Blue Brain project have started to simulate the brain on a supercomputer. The level of detail of their models is unprecedented as they model details on the subcellular level (e.g., the neurotransmitter). This level of detail, however, also leads to a true data deluge, and the neuroscientists have only a few tools to efficiently analyze the data. This demonstration showcases three innovative spatial data management solutions that have substantial impact on computational neuroscience and other disciplines in that they allow scientists to build, analyze and simulate bigger and more detailed models. In particular, we visualize the novel query execution strategy of FLAT, an index for the scalable and efficient execution of range queries on increasingly detailed spatial models. FLAT is used to build and analyze models of the brain. We furthermore demonstrate how SCOUT uses previous query results to prefetch spatial data with high accuracy and thereby speeds up the analysis of spatial models. Finally, we demonstrate TOUCH, a novel in-memory spatial join that speeds up the model building process.

    SCOUT: Prefetching for Latent Structure Following Queries

    Today's scientists are quickly moving from in vitro to in silico experimentation: they no longer analyze natural phenomena in a petri dish, but instead build models and simulate them. Managing and analyzing the massive amounts of data involved in simulations is a major task, yet scientists lack the tools to work efficiently with data of this size. One problem many scientists share is the analysis of the massive spatial models they build. For several types of analysis they need to interactively follow the structures in the spatial model, e.g., the arterial tree, neuron fibers, etc., and issue range queries along the way. Each query takes a long time to execute, and the total time for executing a sequence of queries significantly delays data analysis. Prefetching the spatial data reduces the response time considerably, but known approaches do not prefetch with high accuracy. We develop SCOUT, a structure-aware method for prefetching data along interactive spatial query sequences. SCOUT uses an approximate graph model of the structures involved in past queries and attempts to identify which particular structure the user follows. Our experiments with neuroscience data show that SCOUT prefetches with an accuracy from 71% to 92%, which translates to a speedup of 4x-15x. SCOUT also improves the prefetching accuracy on datasets from other scientific domains, such as medicine and biology.

    OCTOPUS: Efficient Query Execution on Dynamic Mesh Datasets

    Scientists in many disciplines use spatial mesh models to study physical phenomena. Simulating natural phenomena by changing meshes over time helps to understand and predict the future behavior of the phenomena. The higher the precision of the mesh models, the more insight the scientists gain; they thus continuously increase the detail of the meshes and build them as detailed as their instruments and the simulation hardware allow. In the process, the data volume also increases, which considerably slows down the execution of the spatial range queries needed to monitor the simulation. Indexing speeds up range query execution, but the overhead of maintaining the indexes is considerable because almost the entire mesh changes unpredictably at every simulation step. Using a simple linear scan, on the other hand, requires accessing the entire mesh, and the performance deteriorates as the size of the dataset grows. In this paper we propose OCTOPUS, a strategy for executing range queries on mesh datasets that change unpredictably during simulations. In OCTOPUS we use the key insight that the mesh surface along with the mesh connectivity is sufficient to retrieve accurate query results efficiently. With this novel query execution strategy, OCTOPUS minimizes index maintenance cost and reduces query execution time considerably. Our experiments show that OCTOPUS achieves a speedup between 7.2x and 9.2x compared to the state of the art and that it scales better with increasing mesh dataset size and detail.
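
    The insight that surface plus connectivity can answer a range query without a maintained index can be sketched as follows. This is a simplified illustration under stated assumptions, not the OCTOPUS algorithm: it only handles queries that contain at least one surface vertex, and it only finds vertices reachable from such a vertex through edges that stay inside the query box. The function and parameter names are hypothetical.

```python
from collections import deque


def inside(point, box):
    """True if a point lies within an axis-aligned box (lo corner, hi corner)."""
    lo, hi = box
    return all(lo[i] <= point[i] <= hi[i] for i in range(len(point)))


def crawl_range_query(positions, adjacency, surface, box):
    """Answer a range query using only surface vertices and mesh connectivity.

    positions: dict vertex id -> current coordinates (changes every step)
    adjacency: dict vertex id -> neighbouring vertex ids (assumed stable)
    surface:   ids of vertices on the mesh surface
    box:       query region as (lo corner, hi corner)
    """
    # Step 1: scan only the surface to find entry points into the query.
    seeds = [v for v in surface if inside(positions[v], box)]
    seen = set(seeds)
    queue = deque(seeds)
    # Step 2: breadth-first crawl along mesh edges, staying inside the box.
    while queue:
        v = queue.popleft()
        for n in adjacency[v]:
            if n not in seen and inside(positions[n], box):
                seen.add(n)
                queue.append(n)
    return sorted(seen)
```

    Because the crawl reads the current positions directly, nothing has to be rebuilt when vertices move between simulation steps; only the connectivity is assumed to stay fixed in this sketch.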

    Reconstruction and simulation of neocortical microcircuitry

    We present a first-draft digital reconstruction of the microcircuitry of somatosensory cortex of juvenile rat. The reconstruction uses cellular and synaptic organizing principles to algorithmically reconstruct detailed anatomy and physiology from sparse experimental data. An objective anatomical method defines a neocortical volume of 0.29 ± 0.01 mm³ containing ~31,000 neurons, and patch-clamp studies identify 55 layer-specific morphological and 207 morpho-electrical neuron subtypes. When digitally reconstructed neurons are positioned in the volume and synapse formation is restricted to biological bouton densities and numbers of synapses per connection, their overlapping arbors form ~8 million connections with ~37 million synapses. Simulations reproduce an array of in vitro and in vivo experiments without parameter tuning. Additionally, we find a spectrum of network states with a sharp transition from synchronous to asynchronous activity, modulated by physiological mechanisms. The spectrum of network states, dynamically reconfigured around this transition, supports diverse information processing strategies.