5 research outputs found

    Efficient Indexing and Query Processing of Model-View Sensor Data in the Cloud

    Get PDF
    As the number of sensors that pervade our lives increases (e.g., environmental sensors, phone sensors, etc.), the efficient management of massive amount of sensor data is becoming increasingly important. The infinite nature of sensor data poses a serious challenge for query processing even in a cloud infrastructure. Traditional raw sensor data management systems based on relational databases lack scalability to accommodate large-scale sensor data efficiently. Thus, distributed key-value stores in the cloud are becoming a prime tool to manage sensor data. Model-view sensor data management, which stores the sensor data in the form of modeled segments, brings the additional advantages of data compression and value interpolation. However, currently there are no techniques for indexing and/or query optimization of the model-view sensor data in the cloud; full table scan is needed for query processing in the worst case. In this paper, we propose an innovative index for modeled segments in key-value stores, namely KVI-index. KVI-index consists of two interval indices on the time and sensor value dimensions respectively, each of which has an in-memory search tree and a secondary list materialized in the key-value store. Then, we introduce a KVI-index–Scan–MapReduce hybrid approach to perform efficient query processing upon modeled data streams. As proved by a series of experiments at a private cloud infrastructure, our approach outperforms in query-response time and index-updating efficiency both Hadoop-based parallel processing of the raw sensor data and multiple alternative indexing approaches of model-view data

    Distributed Time Series Analytics

    Get PDF
    In recent years time series data has become ubiquitous thanks to affordable sensors and advances in embedded technology. Large amount of time-series data are continuously produced in a wide spectrum of applications, such as sensor networks, medical monitoring and so on. Availability of such large scale time series data highlights the importance of of scalable data management, efï¬cient querying and analysis. Meanwhile, in the online setting time series carries invaluable information and knowledge about the real-time status of involved entities or monitored phenomena, which calls for online time series data mining for serving timely decision making or event detection. In this thesis we aim to address these important issues pertaining to scalable and distributed analytics techniques for massive time series data. Concretely, this thesis is centered around the following three topics: As the number of sensors that pervade our lives signiï¬cantly increases (e.g., environmental sensors, mobile phone sensors, IoT applications, etc.), the efï¬cient management of massive amount of time series from such sensors is becoming increasingly important. The inï¬nite nature of sensor data poses a serious challenge for query processing even in a cloud infrastructure. Traditional raw sensor data management systems based on relational databases lack scalability to accommodate large scale sensor data efï¬ciently. Thus, distributed key-value stores in the cloud are becoming a prime tool to manage sensor data. However, currently there are no techniques for indexing and/or query optimization of the model-view sensor time series data in the cloud. In Chapter 2, we propose an innovative index for modeled segments in key-value stores, namely KVI-index. KVI-index consists of two interval indices on the time and sensor value dimensions respectively, each of which has an in-memory search tree and a secondary list materialized in the key-value store. The dramatic increase in the availability of data streams fuels the development of many distributed real-time computation engines (e.g., Storm, Samza, Spark Streaming, S4 etc.). In Chapter 3, we focus on a fundamental time series mining task in such a new computation paradigm, namely continuously mining dynamic (lagged) correlations in time series via a distributed real-time computation engine. Correlations reveal the hidden and temporal interactions across time series and are widely used in scientiï¬c data analysis, data-driven event detection, ï¬nance markets and so on. We propose the P2H framework consisting of a parallelism-partitioning based data shufï¬ing and a hypercube structure based computation pruning method, so as to enhance both the communication and computation efï¬ciency for mining correlations in the distributed context. In numerous real-world applications large datasets collected from observations and measurements of physical entities are inevitably noisy and contain outliers. The outliers in such large and noisy datasets can dramatically degrade the performance of standard distributed machine learning approaches such as s regression trees. In Chapter 4 we present a novel distributed regression tree approach that utilizes robust regression statistics, statistics that are more robust to outliers, for handling large and noisy datasets. Then we present an adaptive gradient learning method for recurrent neural networks (RNN) to forecast streaming time series in the presence of both outliers and change points

    Object-Relational Indexing for General Interval Relationships

    No full text
    . Intervals represent a fundamental data type for temporal, scientific, and spatial databases where time stamps and point data are extended to time spans and range data, respectively. For OLTP and OLAP applications on large amounts of data, not only intersection queries have to be processed efficiently but also general interval relationships including before, meets, overlaps, starts, finishes, contains, equals, during, startedBy, finishedBy, overlappedBy, metBy, and after. Our new algorithms use the Relational Interval Tree, a purely SQL-based and objectrelationally wrapped index structure. The technique therefore preserves the industrial strength of the underlying RDBMS including stability, transactions, and performance. The efficiency of our approach is demonstrated by an experimental evaluation on a real weblog data set containing one million sessions.

    Proc. 7th Int’l Symposium on Spatial and Temporal Databases (SSTD‘01). Lecture Notes in Computer Science, © Springer Verlag. 2001 Object-Relational Indexing for General Interval Relationships

    No full text
    Abstract. Intervals represent a fundamental data type for temporal, scientific, and spatial databases where time stamps and point data are extended to time spans and range data, respectively. For OLTP and OLAP applications on large amounts of data, not only intersection queries have to be processed efficiently but also general interval relationships including before, meets, overlaps, starts, finishes, contains, equals, during, startedBy, finishedBy, overlappedBy, metBy, and after. Our new algorithms use the Relational Interval Tree, a purely SQL-based and objectrelationally wrapped index structure. The technique therefore preserves the industrial strength of the underlying RDBMS including stability, transactions, and performance. The efficiency of our approach is demonstrated by an experimental evaluation on a real weblog data set containing one million sessions.
    corecore