6 research outputs found

    TinTiN: Travelling in time (if necessary) to deal with out-of-order data in streaming aggregation

    Cyber-Physical Systems (CPS) rely on data stream processing for high-throughput, low-latency analysis with correctness and accuracy guarantees (building on deterministic execution) for monitoring, safety or security applications. The trade-offs in processing performance and results' accuracy are nonetheless application-dependent. While some applications need strict deterministic execution, others can value fast (but possibly approximated) answers. Despite the existing literature on how to relax and trade strict determinism for efficiency or deadlines, we lack a formal characterization of levels of determinism, needed by industries to assess whether or not such trade-offs are acceptable. To bridge the gap, we introduce the notion of D-bounded eventual determinism, where D is the maximum out-of-order delay of the input data. We design and implement TinTiN, a streaming middleware that can be used in combination with user-defined streaming applications, to provably enforce D-bounded eventual determinism. We evaluate TinTiN with a real-world streaming application for Advanced Metering Infrastructure (AMI) monitoring, showing it provides an order of magnitude improvement in processing performance, while minimizing delays in output generation, compared to a state-of-the-art strictly deterministic solution that waits for time proportional to D, for each input tuple, before generating output that depends on it.
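
    To illustrate the idea behind D-bounded eventual determinism, the sketch below (hypothetical names and interface, not TinTiN's actual API) emits a provisional result for every incoming tuple and keeps only the tuples that a late arrival, delayed by at most D, could still affect, so a late tuple triggers a corrected output instead of forcing every tuple to wait for D.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;
    import java.util.function.Function;

    // Hypothetical sketch, not TinTiN's actual API: emit a provisional result for
    // every tuple right away, keep only tuples that a late arrival (delayed at
    // most D) could still affect, and recompute when such a late tuple shows up.
    final class DBoundedOperator<T> {
        private final long d;                                   // maximum out-of-order delay D
        private final Function<List<T>, String> query;          // user-defined streaming logic
        private final TreeMap<Long, List<T>> buffer = new TreeMap<>(); // timestamp -> tuples

        DBoundedOperator(long d, Function<List<T>, String> query) {
            this.d = d;
            this.query = query;
        }

        /** Process one (possibly late) tuple and return the current, possibly provisional, output. */
        String onTuple(long timestamp, T tuple, long watermark) {
            buffer.computeIfAbsent(timestamp, k -> new ArrayList<>()).add(tuple);
            buffer.headMap(watermark - d).clear();              // nothing older can be affected by late data
            List<T> withinD = new ArrayList<>();
            buffer.values().forEach(withinD::addAll);
            return query.apply(withinD);                        // corrected automatically if late data arrives
        }
    }

    The point of the sketch is the trade-off the abstract describes: output is produced immediately, and determinism is restored eventually, once no tuple older than the watermark minus D can still arrive.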

    Time- and Computation-Efficient Data Localization at Vehicular Networks' Edge

    As Vehicular Networks rely increasingly on sensed data to enhance functionality and safety, efficient and distributed data analysis is needed to effectively leverage new technologies in real-world applications. Considering the tens of GBs per hour sensed by modern connected vehicles, traditional analysis, based on global data accumulation, can rapidly exhaust the capacity of the underlying network, becoming increasingly costly, slow, or even infeasible. Employing the edge processing paradigm, which aims at alleviating this drawback by leveraging vehicles' computational power, we are the first to study how to localize, efficiently and distributively, relevant data in a vehicular fleet for analysis applications. This is achieved by appropriate methods to spread requests across the fleet, while efficiently balancing the time needed to identify relevant vehicles and the computational overhead induced on the Vehicular Network. We evaluate our techniques using two large sets of real-world data in a realistic environment where vehicles join or leave the fleet during the distributed data localization process. As we show, our algorithms are both efficient and configurable, outperforming the baseline algorithms by up to a 40x speedup while reducing computational overhead by up to 3x, and providing good estimates for the fraction of vehicles with relevant data while fairly spreading the workload over the fleet. All code as well as detailed instructions are available at https://github.com/dcs-chalmers/dataloc_vn
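
    As a rough illustration of the request-spreading idea (the vehicle type, predicate and stopping rule below are assumptions, not the paper's actual algorithms), one can probe a random subset of the fleet and stop as soon as enough matching vehicles are found, trading localization time against the computation induced on the vehicles.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;
    import java.util.function.Predicate;

    // Rough illustration only: requests are spread uniformly at random over the
    // fleet and probing stops once enough matches are found, trading the time to
    // identify relevant vehicles against on-board computational overhead.
    final class FleetLocalizer<V> {
        private final Random rnd = new Random();

        /** Probe a random subset of the fleet until `needed` matches or `maxProbes` checks are reached. */
        List<V> localize(List<V> fleet, Predicate<V> hasRelevantData, int needed, int maxProbes) {
            List<V> candidates = new ArrayList<>(fleet);
            Collections.shuffle(candidates, rnd);               // fair spread of the workload
            List<V> matches = new ArrayList<>();
            int probed = 0;
            for (V vehicle : candidates) {
                if (matches.size() >= needed || probed >= maxProbes) break;
                probed++;                                       // each probe costs one on-board check
                if (hasRelevantData.test(vehicle)) matches.add(vehicle);
            }
            // matches.size() / (double) probed estimates the fraction of vehicles with relevant data
            return matches;
        }
    }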

    Distributed and Communication-Efficient Continuous Data Processing in Vehicular Cyber-Physical Systems

    Processing the data produced by modern connected vehicles is of increasing interest for vehicle manufacturers to gain knowledge and develop novel functions and applications for the future of mobility. Connected vehicles form Vehicular Cyber-Physical Systems (VCPSs) that continuously sense increasingly large data volumes from high-bandwidth sensors such as LiDARs (arrays of laser-based distance sensors that create a 3D map of the surroundings). The straightforward attempt of gathering all raw data from a VCPS to a central location for analysis often fails due to limits imposed by the infrastructure on the communication and storage capacities. In this Licentiate thesis, I present the results from my research that investigates techniques aiming at reducing the data volumes that need to be transmitted from vehicles through online compression and adaptive selection of participating vehicles. As explained in this work, the key to reducing the communication volume is in pushing parts of the necessary processing onto the vehicles' on-board computers, thereby favorably leveraging the available distributed processing infrastructure in a VCPS. The findings highlight that existing analysis workflows can be sped up significantly while reducing their data volume footprint and incurring only modest accuracy decreases. At the same time, the adaptive selection of vehicles for analyses proves to provide a sufficiently large subset of vehicles that have compliant data for further analyses, while balancing the time needed for selection and the induced computational load.
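
    A minimal sketch of the on-board data-reduction idea, assuming a simple keep-every-k-th-reading filter followed by DEFLATE compression; the thesis' actual compression and selection schemes are not spelled out in this abstract.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.List;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;

    // Illustrative sketch (not the thesis' actual pipeline): part of the processing
    // is pushed onto the vehicle, which downsamples raw sensor readings on board
    // and compresses the remainder before anything leaves the vehicle.
    final class OnBoardReducer {
        /** Keep every k-th reading, then compress the kept bytes with DEFLATE. */
        static byte[] reduce(List<byte[]> rawReadings, int k) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DeflaterOutputStream out =
                     new DeflaterOutputStream(buf, new Deflater(Deflater.BEST_SPEED))) {
                for (int i = 0; i < rawReadings.size(); i++) {
                    if (i % k == 0) out.write(rawReadings.get(i));   // coarse on-board downsampling
                }
            }
            return buf.toByteArray();                                // what is actually transmitted
        }
    }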

    Data stream processing meets the Advanced Metering Infrastructure: possibilities, challenges and applications

    Distribution of electricity is changing. Energy production is increasingly distributed, weather dependent and located in the distribution network, close to consumers. Energy consumption is increasing throughout society, and the electrification of transportation is driving distribution networks closer to their limits. Operating the networks closer to their limits also increases the risk of faults. Continuous monitoring of the distribution network closest to the customers is needed in order to mitigate this risk. The Advanced Metering Infrastructure introduced smart meters throughout the distribution network. Data stream processing is a computing paradigm that offers low-latency results from analysis of large volumes of data. This thesis investigates the possibilities and challenges for continuous monitoring that are created when the Advanced Metering Infrastructure and data stream processing meet. The challenges addressed in the thesis are efficient processing of unordered (also called out-of-order) data and efficient usage of the computational resources present in the Advanced Metering Infrastructure. Contributions towards more efficient processing of out-of-order data are made with eChIDNA and TinTiN. Both are systems that utilize knowledge about smart meter data to directly produce results where possible, storing only the data that is relevant for late arrivals in order to produce updated results when such late data arrives. eChIDNA is integrated in the streaming query itself, while TinTiN is a streaming middleware that can be applied to streaming queries in order to make them resilient against out-of-order data. Eventual determinism is defined in order to formally investigate the deterministic properties of the output produced by such systems. Contributions towards efficient usage of the computational resources of the Advanced Metering Infrastructure are made with the application LoCoVolt. LoCoVolt implements a monitoring algorithm that can run on equipment located in the communication infrastructure of the Advanced Metering Infrastructure and can take advantage of the overlap between the communication and distribution networks. All contributions are evaluated on hardware that is available in current AMI systems, using large-scale data obtained from a real production AMI.
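
    In the spirit of LoCoVolt's placement of monitoring logic inside the AMI communication infrastructure, the sketch below (meter identifiers, threshold and median rule are illustrative assumptions, not the actual algorithm) lets a data concentrator flag smart-meter voltage readings that deviate from the locally observed median, without shipping raw readings to a central server.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: a concentrator in the AMI communication network keeps
    // the latest reading per meter and flags readings that deviate from the local
    // median by more than a relative threshold, so monitoring runs where the data
    // is collected rather than at a central server.
    final class LocalVoltageMonitor {
        private final double relativeThreshold;                     // e.g. 0.1 = flag >10% deviation
        private final Map<String, Double> latest = new HashMap<>(); // meterId -> last reading (V)

        LocalVoltageMonitor(double relativeThreshold) { this.relativeThreshold = relativeThreshold; }

        /** Returns true if this reading deviates from the local median by more than the threshold. */
        boolean onReading(String meterId, double volts) {
            latest.put(meterId, volts);
            double[] values = latest.values().stream()
                                    .mapToDouble(Double::doubleValue).sorted().toArray();
            double median = values[values.length / 2];
            return Math.abs(volts - median) > relativeThreshold * median;
        }
    }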

    On Design and Applications of Practical Concurrent Data Structures

    The proliferation of multicore processors is having an enormous impact on software design and development. In order to exploit the parallelism available in multicores, there is a need to design and implement abstractions that programmers can use for general-purpose application development. A common abstraction for coordinated access to memory is a concurrent data structure. Concurrent data structures are challenging to design and implement as they are required to be correct, scalable, and practical under various application constraints. In this thesis, we contribute to the design of efficient concurrent data structures, proposing new design techniques and improvements to existing implementations. Additionally, we explore the utilization of concurrent data structures in demanding application contexts such as data stream processing. In the first part of the thesis, we focus on data structures that are difficult to parallelize due to inherent sequential bottlenecks. We present a lock-free vector design that efficiently addresses synchronization bottlenecks by utilizing the combining technique. Typical combining techniques are blocking; our design introduces combining without sacrificing non-blocking progress guarantees. We extend the vector to present a concurrent lock-free unbounded binary heap that implements a priority queue with mutable priorities. In the second part of the thesis, we shift our focus to concurrent search data structures. In order to offer strong progress guarantees, typical implementations of non-blocking search data structures employ a "helping" mechanism. However, helping may result in performance degradation. We propose help-optimality, which expresses optimization in the amortized step complexity of concurrent operations. To describe the concept, we revisit the lock-free designs of a linked list and a binary search tree and present improved algorithms. We design the algorithms without using any language- or platform-specific constructs; we do not use bit-stealing or runtime type introspection of objects. Thus, our algorithms are portable. We further delve into multi-dimensional data and similarity search. We present the first lock-free multi-dimensional data structure and a linearizable nearest neighbor search algorithm. Our algorithm for nearest neighbor search is generic and can be adapted to other data structures. In the last part of the thesis, we explore the utilization of concurrent data structures for deterministic stream processing. We propose solutions to two challenges prevalent in data stream processing: (1) efficient processing on cloud as well as edge devices and (2) deterministic data-parallel processing at high throughput and low latency. As a first step, we present a methodology for customization of streaming aggregation on low-power multicore embedded platforms. Then we introduce Viper, a communication module that can be integrated into stream processing engines for the coordination of threads analyzing data in parallel.
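
    For readers unfamiliar with lock-free designs, the classic Treiber stack below (a textbook example, not one of the structures contributed in the thesis) shows the compare-and-swap retry pattern they build on: a stalled thread never blocks others, which is the non-blocking progress guarantee referred to above.

    import java.util.concurrent.atomic.AtomicReference;

    // Classic Treiber stack: every update is a compare-and-swap retry loop on the
    // top pointer, so the structure is lock-free even under contention.
    final class LockFreeStack<T> {
        private static final class Node<T> {
            final T value; final Node<T> next;
            Node(T value, Node<T> next) { this.value = value; this.next = next; }
        }
        private final AtomicReference<Node<T>> top = new AtomicReference<>();

        void push(T value) {
            Node<T> oldTop;
            Node<T> newTop;
            do {
                oldTop = top.get();
                newTop = new Node<>(value, oldTop);
            } while (!top.compareAndSet(oldTop, newTop));        // retry if another thread interfered
        }

        T pop() {
            Node<T> oldTop;
            do {
                oldTop = top.get();
                if (oldTop == null) return null;                 // empty stack
            } while (!top.compareAndSet(oldTop, oldTop.next));
            return oldTop.value;
        }
    }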

    Understanding the data-processing challenges in Intelligent Vehicular Systems

    Vehicular sensors able to perceive and measure the environment, ranging from in-vehicle sensors to speed cameras, are revolutionizing how technology can interact with our daily lives, enabling Intelligent Vehicular Systems (IVSs). These sensors generate large volumes of data which can reveal useful information for enhancing sustainable development (through improved utilization of resources), as well as the safety and functionality of the system.