330 research outputs found

    Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

    Get PDF
    2017 Summer.Includes bibliographical references.Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered around query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time series and geospatial aspects), and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks

    IDEAS-1997-2021-Final-Programs

    Get PDF
    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE(1997-2007) or ACM(2008-2021)

    Efficient rendering of large 3-D and 4-D scalar fields

    Get PDF
    Rendering volumetric data, as a compute/communication intensive and highly parallel application, represents the characteristics of future workloads for desktop computers. Interactively rendering volumetric data has been a challenging problem due to its high computational and communication requirements. With the consistent trend toward high resolution data, it has remained a difficult problem despite the continuous increase in processing power, because of the increasing performance gap between computation and communication. On the other hand, the new multi-core architecture trend in computational units in PC, which can be characterized by parallelism and heterogeneity, provides both opportunities and challenges. While the new on-chip parallel architectures offer opportunities for extremely high performance, widespread use of those parallel processors requires extensive changes in previous algorithms to take advantage of the new architectures. In this dissertation, we develop new methods and techniques to support interactive rendering of large volumetric data. In particular, we present a novel method to layout data on disk for efficiently performing an out-of-core axis-aligned slicing of large multidimensional scalar fields. We also present a new method to efficiently build an out-of-core indexing structure for n-dimensional volumetric data. Then, we describe a streaming model for efficiently implementing volume ray casting on a heterogeneous compute resource environment. We describe how we implement the model on SONY/TOSHIBA/IBM Cell Broadband Engine and on NVIDIA CUDA architecture. Our results show that our out-of-core techniques significantly reduce the communication bandwidth requirements and that our streaming model very effectively makes use of the strengths of those heterogeneous parallel compute resource environment for volume rendering. In all cases, we achieve scalability and load balancing, while hiding memory latency

    Time Series Management Systems:A Survey

    Get PDF
    The collection of time series data increases as more monitoring and automation are being deployed. These deployments range in scale from an Internet of things (IoT) device located in a household to enormous distributed Cyber-Physical Systems (CPSs) producing large volumes of data at high velocity. To store and analyze these vast amounts of data, specialized Time Series Management Systems (TSMSs) have been developed to overcome the limitations of general purpose Database Management Systems (DBMSs) for times series management. In this paper, we present a thorough analysis and classification of TSMSs developed through academic or industrial research and documented through publications. Our classification is organized into categories based on the architectures observed during our analysis. In addition, we provide an overview of each system with a focus on the motivational use case that drove the development of the system, the functionality for storage and querying of time series a system implements, the components the system is composed of, and the capabilities of each system with regard to Stream Processing and Approximate Query Processing (AQP). Last, we provide a summary of research directions proposed by other researchers in the field and present our vision for a next generation TSMS.Comment: 20 Pages, 15 Figures, 2 Tables, Accepted for publication in IEEE TKD

    Exploiting spatial and temporal coherence in GPU-based volume rendering

    Full text link
    Effizienz spielt eine wichtige Rolle bei der Darstellung von Volumendaten, selbst wenn leistungsstarke Grafikhardware zur Verfügung steht, da steigende Datensatzgrößen und höhere Anforderungen an Visualisierungstechniken Fortschritte bei Grafikprozessoren ausgleichen. In dieser Dissertation wird untersucht, wie räumliche und zeitliche Kohärenz in Volumendaten zur Optimierung von Volumenrendering genutzt werden kann. Es werden mehrere neue Ansätze für statische und zeitvariante Daten eingeführt, die verschieden Arten von Kohärenz in verschiedenen Stufen der Volumenrendering-Pipeline ausnutzen. Zu den vorgestellten Beschleunigungstechniken gehört Empty Space Skipping mittels Occlusion Frustums, eine auf Slabs basierende Cachestruktur für Raycasting und ein verlustfreies Kompressionsscheme für zeitvariante Daten. Die Algorithmen wurden zur Verwendung mit GPU-basiertem Volumen-Raycasting entworfen und nutzen die Fähigkeiten moderner Grafikprozessoren, insbesondere Stream Processing. Efficiency is a key aspect in volume rendering, even if powerful graphics hardware is employed, since increasing data set sizes and growing demands on visualization techniques outweigh improvements in graphics processor performance. This dissertation examines how spatial and temporal coherence in volume data can be used to optimize volume rendering. Several new approaches for static as well as for time-varying data sets are introduced, which exploit different types of coherence in different stages of the volume rendering pipeline. The presented acceleration algorithms include empty space skipping using occlusion frustums, a slab-based cache structure for raycasting, and a lossless compression scheme for time-varying data. The algorithms were designed for use with GPU-based volume raycasting and to efficiently exploit the features of modern graphics processors, especially stream processing

    Model-Based Time Series Management at Scale

    Get PDF

    Advances in knowledge discovery and data mining Part II

    Get PDF
    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II</p

    Ontology based data warehousing for mining of heterogeneous and multidimensional data sources

    Get PDF
    Heterogeneous and multidimensional big-data sources are virtually prevalent in all business environments. System and data analysts are unable to fast-track and access big-data sources. A robust and versatile data warehousing system is developed, integrating domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion dollar resource businesses worldwide. This work is recognized by Industrial Electronic Society of IEEE and appeared in more than 50 international conference proceedings and journals
    • …
    corecore