15 research outputs found

    A Novel Framework for Online Amnesic Trajectory Compression in Resource-constrained Environments

    Full text link
    State-of-the-art trajectory compression methods usually involve high space-time complexity or yield unsatisfactory compression rates, leading to rapid exhaustion of memory, computation, storage and energy resources. Their ability is commonly limited when operating in a resource-constrained environment especially when the data volume (even when compressed) far exceeds the storage limit. Hence we propose a novel online framework for error-bounded trajectory compression and ageing called the Amnesic Bounded Quadrant System (ABQS), whose core is the Bounded Quadrant System (BQS) algorithm family that includes a normal version (BQS), Fast version (FBQS), and a Progressive version (PBQS). ABQS intelligently manages a given storage and compresses the trajectories with different error tolerances subject to their ages. In the experiments, we conduct comprehensive evaluations for the BQS algorithm family and the ABQS framework. Using empirical GPS traces from flying foxes and cars, and synthetic data from simulation, we demonstrate the effectiveness of the standalone BQS algorithms in significantly reducing the time and space complexity of trajectory compression, while greatly improving the compression rates of the state-of-the-art algorithms (up to 45%). We also show that the operational time of the target resource-constrained hardware platform can be prolonged by up to 41%. We then verify that with ABQS, given data volumes that are far greater than storage space, ABQS is able to achieve 15 to 400 times smaller errors than the baselines. We also show that the algorithm is robust to extreme trajectory shapes.Comment: arXiv admin note: substantial text overlap with arXiv:1412.032

    When Things Matter: A Data-Centric View of the Internet of Things

    Full text link
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed

    Multi-scale window specification over streaming trajectories

    Get PDF
    Enormous amounts of positional information are collected by monitoring applications in domains such as fleet management cargo transport wildlife protection etc. With the advent of modern location-based services processing such data mostly focuses on providing real-time response to a variety of user requests in continuous and scalable fashion. An important class of such queries concerns evolving trajectories that continuously trace the streaming locations of moving objects like GPS-equipped vehicles commodities with RFID\u27s people with smartphones etc. In this work we propose an advanced windowing operator that enables online incremental examination of recent motion paths at multiple resolutions for numerous point entities. When applied against incoming positions this window can abstract trajectories at coarser representations towards the past while retaining progressively finer features closer to the present. We explain the semantics of such multi-scale sliding windows through parameterized functions that reflect the sequential nature of trajectories and can effectively capture their spatiotemporal properties. Such window specification goes beyond its usual role for non-blocking processing of multiple concurrent queries. Actually it can offer concrete subsequences from each trajectory thus preserving continuity in time and contiguity in space along the respective segments. Further we suggest language extensions in order to express characteristic spatiotemporal queries using windows. Finally we discuss algorithms for nested maintenance of multi-scale windows and evaluate their efficiency against streaming positional data offering empirical evidence of their benefits to online trajectory processing

    When things matter: A survey on data-centric Internet of Things

    Get PDF
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, but several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy and continuous. This paper reviews the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed

    On-the-fly mobility event detection over aircraft trajectories

    Get PDF
    We present an application framework that consumes streaming positions from a large fleet of flying aircrafts monitored in real time over a wide geographical area. Tailored for aviation surveillance, this online processing scheme only retains locations conveying salient mobility events along each flight, and annotates them as stop, change of speed, heading or altitude, etc. Such evolving trajectory synopses must keep in pace with the incoming raw streams so as to get incrementally annotated with minimal loss in accuracy. We also develop one-pass heuristics to eliminate inherent noise and provide reliable trajectory representations. Our prototype implementation on top of Apache Flink and Kafka has been tested against various real and synthetic datasets offering concrete evidence of its timeliness, scalability, and compression efficiency, with tolerable concessions to the quality of resulting trajectory approximations. K. Patroumpas, N. Pelekis, and Y. Theodoridis: "On-the-fly Mobility Event Detection over Aircraft Trajectories". In proceeding of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2018), November 6 - 9, 2018 Seattle, Washington, USA Document type: Conference objec

    PolyFit: Polynomial-based Indexing Approach for Fast Approximate Range Aggregate Queries

    Get PDF
    Range aggregate queries find frequent application in data analytics. In some use cases, approximate results are preferred over accurate results if they can be computed rapidly and satisfy approximation guarantees. Inspired by a recent indexing approach, we provide means of representing a discrete point data set by continuous functions that can then serve as compact index structures. More specifically, we develop a polynomial-based indexing approach, called PolyFit, for processing approximate range aggregate queries. PolyFit is capable of supporting multiple types of range aggregate queries, including COUNT, SUM, MIN and MAX aggregates, with guaranteed absolute and relative error bounds. Experiment results show that PolyFit is faster and more accurate and compact than existing learned index structures.Comment: 13 page

    A Survey of Model-based Sensor Data Acquisition and Management

    Get PDF
    In recent years, due to the proliferation of sensor networks, there has been a genuine need of researching techniques for sensor data acquisition and management. To this end, a large number of techniques have emerged that advocate model-based sensor data acquisition and management. These techniques use mathematical models for performing various, day-to-day tasks involved in managing sensor data. In this chapter, we survey the state-of-the-art techniques for model-based sensor data acquisition and management. We start by discussing the techniques for acquiring sensor data. We, then, discuss the application of models in sensor data cleaning; followed by a discussion on model-based methods for querying sensor data. Lastly, we survey model-based methods proposed for data compression and synopsis generation

    Doctor of Philosophy

    Get PDF
    dissertationWe are living in an age where data are being generated faster than anyone has previously imagined across a broad application domain, including customer studies, social media, sensor networks, and the sciences, among many others. In some cases, data are generated in massive quantities as terabytes or petabytes. There have been numerous emerging challenges when dealing with massive data, including: (1) the explosion in size of data; (2) data have increasingly more complex structures and rich semantics, such as representing temporal data as a piecewise linear representation; (3) uncertain data are becoming a common occurrence for numerous applications, e.g., scientific measurements or observations such as meteorological measurements; (4) and data are becoming increasingly distributed, e.g., distributed data collected and integrated from distributed locations as well as data stored in a distributed file system within a cluster. Due to the massive nature of modern data, it is oftentimes infeasible for computers to efficiently manage and query them exactly. An attractive alternative is to use data summarization techniques to construct data summaries, where even efficiently constructing data summaries is a challenging task given the enormous size of data. The data summaries we focus on in this thesis include the histogram and ranking operator. Both data summaries enable us to summarize a massive dataset to a more succinct representation which can then be used to make queries orders of magnitude more efficient while still allowing approximation guarantees on query answers. Our study has focused on the critical task of designing efficient algorithms to summarize, query, and manage massive data

    Statistical Models for Querying and Managing Time-Series Data

    Get PDF
    In recent years we are experiencing a dramatic increase in the amount of available time-series data. Primary sources of time-series data are sensor networks, medical monitoring, financial applications, news feeds and social networking applications. Availability of large amount of time-series data calls for scalable data management techniques that enable efficient querying and analysis of such data in real-time and archival settings. Often the time-series data generated from sensors (environmental, RFID, GPS, etc.), are imprecise and uncertain in nature. Thus, it is necessary to characterize this uncertainty for producing clean answers. In this thesis we propose methods that address these important issues pertaining to time-series data. Particularly, this thesis is centered around the following three topics: Computing Statistical Measures on Large Time-Series Datasets. Computing statistical measures for large databases of time series is a fundamental primitive for querying and mining time-series data [31, 81, 97, 111, 132, 137]. This primitive is gaining importance with the increasing number and rapid growth of time-series databases. In Chapter 3, we introduce the Affinity framework for efficient computation of statistical measures by exploiting the concept of affine relationships [113, 114]. Affine relationships can be used to infer a large number of statistical measures for time series, from other related time series, instead of computing them directly; thus, reducing the overall computational cost significantly. Moreover, the Affinity framework proposes an unified approach for computing several statistical measures at once. Creating Probabilistic Databases from Imprecise Data. A large amount of time-series data produced in the real-world has an inherent element of uncertainty, arising due to the various sources of imprecision affecting its sources (like, sensor data, GPS trajectories, environmental monitoring data, etc.). The primary sources of imprecision in such data are: imprecise sensors, limited communication bandwidth, sensor failures, etc. Recently there has been an exponential rise in the number of such imprecise sensors, which has led to an explosion of imprecise data. Standard database techniques cannot be used to provide clean and consistent answers in such scenarios. Therefore, probabilistic databases that factor-in the inherent uncertainty and produce clean answers are required. An important assumption i while using probabilistic databases is that each data point has a probability distribution associated with it. This is not true in practice — the distributions are absent. As a solution to this fundamental limitation, in Chapter 4 we propose methods for inferring such probability distributions and using them for efficiently creating probabilistic databases [116]. Managing Participatory Sensing Data. Community-driven participatory sensing is a rapidly evolving paradigm in mobile geo-sensor networks. Here, sensors of various sorts (e.g., multi-sensor units monitoring air quality, cell phones, thermal watches, thermometers in vehicles, etc.) are carried by the community (public vehicles, private vehicles, or individuals) during their daily activities, collecting various types of data about their surrounding. Data generated by these devices is in large quantity, and geographically and temporally skewed. Therefore, it is important that systems designed for managing such data should be aware of these unique data characteristics. In Chapter 5, we propose the ConDense (Community-driven Sensing of the Environment) framework for managing and querying community-sensed data [5, 19, 115]. ConDense exploits spatial smoothness of environmental parameters (like, ambient pollution [5] or radiation [2]) to construct statistical models of the data. Since the number of constructed models is significantly smaller than the original data, we show that using our approach leads to dramatic increase in query processing efficiency [19, 115] and significantly reduces memory usage
    corecore