7 research outputs found

    Adaptive Stream-based Shifting Bottleneck Detection in IoT-based Computing Architectures

    Cloud computing is revolutionizing the backbone of data analysis applications, including industrial ones. One of its main pillars is the separation of the logic with which data is accessed (e.g., to study the efficiency of a manufacturing system) from the actual hardware (e.g., servers) that maintains and analyses the data. Large distributed cyber-physical systems, enabled by, among other technologies, the Internet of Things (IoT), have nonetheless made clear that 'what to do' with the data and 'where to do it' are not disjoint problems; i.e., cloud computing on its own is not enough. Fog and edge computing have emerged as complementary options that distribute the analysis, helping address these challenges by means of close-to-the-source data analysis. We show, for a key problem in industrial processes, that of shifting bottleneck detection, how to take advantage of such multi-tier computing architectures to perform continuous and configurable analysis of data from Manufacturing Execution Systems. We propose a processing framework, STRATUM, and an algorithm, AMBLE, for continuous data stream processing. STRATUM seamlessly distributes and parallelizes the processing across the tiers, and AMBLE guarantees consistent analysis in spite of timing fluctuations, which are commonly introduced by, e.g., the communication system; it also achieves efficiency through appropriate data structures for in-memory processing. An experimental study on a real-world dataset, taken from a production line over two years and comprising 8.5 million entries, shows the benefits of the proposed solution in enabling configurable and efficient analysis.
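
    As a minimal illustration of the idea behind stream-based shifting bottleneck detection (a sketch, not AMBLE itself, whose windowing, out-of-order handling, and in-memory data structures are not reproduced here), the following Java class tracks each machine's ongoing uninterrupted active period over a stream of Manufacturing Execution System events and reports the machine with the longest such period as the momentary bottleneck; all names are illustrative assumptions.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of active-period-based momentary bottleneck detection.
    public class ActivePeriodBottleneck {

        // Start timestamp of the current uninterrupted active period per machine;
        // a machine that is idle, blocked, or starved has no entry.
        private final Map<String, Long> activeSince = new HashMap<>();

        /** Feed one event: machine id, event time, and whether the machine is active. */
        public void onEvent(String machine, long timestamp, boolean active) {
            if (active) {
                activeSince.putIfAbsent(machine, timestamp);  // active period starts or continues
            } else {
                activeSince.remove(machine);                  // active period ends
            }
        }

        /** The momentary bottleneck: the machine with the longest ongoing active period. */
        public String currentBottleneck(long now) {
            String bottleneck = null;
            long longest = -1;
            for (Map.Entry<String, Long> e : activeSince.entrySet()) {
                long duration = now - e.getValue();
                if (duration > longest) {
                    longest = duration;
                    bottleneck = e.getKey();
                }
            }
            return bottleneck;
        }
    }

    Tracking how this momentary bottleneck changes over time is what reveals the shifting of the bottleneck between machines.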

    Efficient Data Streaming Multiway Aggregation through Concurrent Algorithmic Designs and New Abstract Data Types

    Data streaming relies on continuous queries to process unbounded streams of data in a real-time fashion. It commonly demands high computational capacity, given that the relevant applications involve very large volumes of data. Data structures act as articulation points that maintain the state of data streaming operators, potentially supporting high parallelism and balancing the work among them. Prompted by this fact, in this work we study and analyze the parallelization needs of these articulation points, focusing on the problem of streaming multiway aggregation, where large data volumes are received from multiple input streams. The analysis of these parallelization needs, as well as of the use and limitations of existing aggregate designs and their data structures, leads us to identify the need for shared objects that can achieve low-latency and high-throughput multiway aggregation. We present the requirements of such objects as abstract data types and we provide efficient lock-free linearizable algorithmic implementations of them, along with new multiway aggregate algorithmic designs that leverage them, supporting both deterministic order-sensitive and order-insensitive aggregate functions. Furthermore, we point out future directions that open up through these contributions. The article includes an extensive experimental study, based on a variety of continuous aggregation queries on two large datasets extracted from SoundCloud, a music social network, and from a Smart Grid network. In all the experiments, the proposed data structures and the enhanced aggregate operators improved processing performance significantly, by up to one order of magnitude in terms of both throughput and latency, over commonly used techniques based on queues.
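
    The following Java sketch illustrates one ingredient of deterministic multiway aggregation, namely timestamp-ordered merging of multiple input streams into tumbling windows; it uses plain single-threaded queues instead of the lock-free shared objects proposed in the article, and all class and field names are illustrative assumptions.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Queue;

    // Sketch: merge tuples from several input streams in timestamp order and
    // aggregate (sum) them per tumbling window. A tuple is only consumed while
    // every input has at least one pending tuple, so the merge order, and hence
    // the aggregation result, is deterministic.
    public class MultiwayAggregate {

        public static class Tuple {
            final long ts;
            final double value;
            public Tuple(long ts, double value) { this.ts = ts; this.value = value; }
        }

        private final List<Queue<Tuple>> inputs;            // one FIFO per input stream
        private final long windowSize;
        private final Map<Long, Double> windowSums = new HashMap<>();

        public MultiwayAggregate(List<Queue<Tuple>> inputs, long windowSize) {
            this.inputs = inputs;
            this.windowSize = windowSize;
        }

        /** Consume all tuples that can currently be merged in timestamp order. */
        public void step() {
            while (true) {
                Queue<Tuple> next = null;
                for (Queue<Tuple> q : inputs) {
                    if (q.isEmpty()) return;                 // cannot advance deterministically yet
                    if (next == null || q.peek().ts < next.peek().ts) next = q;
                }
                if (next == null) return;                    // no input streams configured
                Tuple t = next.poll();
                windowSums.merge(t.ts / windowSize, t.value, Double::sum);
            }
        }
    }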

    GeneaLog: Fine-Grained Data Streaming Provenance at the Edge

    Fine-grained data provenance in data streaming allows linking each result tuple back to the source data that contributed to it, something beneficial for many applications (e.g., to find the conditions triggering a security- or safety-related alert). Further, when data transmission or storage has to be minimized, as in edge computing and cyber-physical systems, it can help in identifying the source data to be prioritized. The memory and processing costs of fine-grained data provenance, possibly affordable for high-end servers, can be prohibitive for the resource-constrained devices deployed in edge computing and cyber-physical systems. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant-size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard data streaming operators. This is particularly useful for distributed streaming applications, since the provenance processing can be executed at separate nodes, orthogonal to the data processing. We evaluate an implementation of GeneaLog using vehicular and smart grid applications, confirming that it efficiently captures fine-grained provenance data with minimal overhead.
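
    As a sketch of the general idea of fine-grained provenance (not of GeneaLog's constant-size, cross-layer encoding), the hypothetical Java class below lets every derived tuple carry references to the tuples that contributed to it, so the original source tuples of any result can be recovered by walking those links.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Sketch: each tuple records its contributing parents; source tuples have none.
    public class ProvenanceTuple {

        final Object payload;
        final List<ProvenanceTuple> contributors;

        public ProvenanceTuple(Object payload, List<ProvenanceTuple> contributors) {
            this.payload = payload;
            this.contributors = contributors;
        }

        /** Follow the contribution links back to the original source tuples. */
        public List<ProvenanceTuple> sources() {
            List<ProvenanceTuple> result = new ArrayList<>();
            Deque<ProvenanceTuple> stack = new ArrayDeque<>();
            stack.push(this);
            while (!stack.isEmpty()) {
                ProvenanceTuple t = stack.pop();
                if (t.contributors.isEmpty()) result.add(t);   // reached a source tuple
                else t.contributors.forEach(stack::push);
            }
            return result;
        }
    }

    GeneaLog's contribution, in contrast to this naive sketch, lies in keeping the per-tuple bookkeeping at a constant size, which is what makes it affordable on resource-constrained edge devices.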

    Distributed and Communication-Efficient Continuous Data Processing in Vehicular Cyber-Physical Systems

    Processing the data produced by modern connected vehicles is of increasing interest for vehicle manufacturers, who aim to gain knowledge and develop novel functions and applications for the future of mobility. Connected vehicles form Vehicular Cyber-Physical Systems (VCPSs) that continuously sense increasingly large data volumes from high-bandwidth sensors such as LiDARs (arrays of laser-based distance sensors that create a 3D map of the surroundings). The straightforward attempt of gathering all raw data from a VCPS at a central location for analysis often fails due to the limits imposed by the infrastructure on communication and storage capacities. In this Licentiate thesis, I present results from my research, which investigates techniques for reducing the data volumes that need to be transmitted from vehicles through online compression and adaptive selection of participating vehicles. As explained in this work, the key to reducing the communication volume lies in pushing parts of the necessary processing onto the vehicles' on-board computers, thereby favorably leveraging the distributed processing infrastructure available in a VCPS. The findings highlight that existing analysis workflows can be sped up significantly while reducing their data volume footprint and incurring only modest accuracy decreases. At the same time, the adaptive selection of vehicles for analyses proves to provide a sufficiently large subset of vehicles with compliant data for further analyses, while balancing the time needed for selection against the induced computational load.
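
    The thesis' actual compression and vehicle-selection schemes are not reproduced here; purely as an illustration of on-board data reduction, the hypothetical Java filter below implements a simple send-on-delta policy, transmitting a sensor sample only when it deviates from the last transmitted value by more than a configurable threshold.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: send-on-delta filtering on the vehicle's on-board computer.
    public class SendOnDelta {

        private final double threshold;
        private Double lastSent = null;

        public SendOnDelta(double threshold) { this.threshold = threshold; }

        /** Returns true if the sample should be transmitted to the backend. */
        public boolean shouldSend(double sample) {
            if (lastSent == null || Math.abs(sample - lastSent) > threshold) {
                lastSent = sample;
                return true;
            }
            return false;
        }

        public static void main(String[] args) {
            SendOnDelta filter = new SendOnDelta(0.5);
            List<Double> sent = new ArrayList<>();
            for (double s : new double[] {1.0, 1.1, 1.2, 2.0, 2.1, 3.5}) {
                if (filter.shouldSend(s)) sent.add(s);
            }
            System.out.println(sent);   // prints [1.0, 2.0, 3.5]
        }
    }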

    On Design and Applications of Practical Concurrent Data Structures

    The proliferation of multicore processors is having an enormous impact on software design and development. In order to exploit the parallelism available in multicores, there is a need to design and implement abstractions that programmers can use for general-purpose application development. A common abstraction for coordinated access to memory is a concurrent data structure. Concurrent data structures are challenging to design and implement, as they are required to be correct, scalable, and practical under various application constraints. In this thesis, we contribute to the design of efficient concurrent data structures, and propose new design techniques and improvements to existing implementations. Additionally, we explore the utilization of concurrent data structures in demanding application contexts such as data stream processing. In the first part of the thesis, we focus on data structures that are difficult to parallelize due to inherent sequential bottlenecks. We present a lock-free vector design that efficiently addresses synchronization bottlenecks by utilizing the combining technique. Typical combining techniques are blocking; our design introduces combining without sacrificing non-blocking progress guarantees. We extend the vector to present a concurrent lock-free unbounded binary heap that implements a priority queue with mutable priorities. In the second part of the thesis, we shift our focus to concurrent search data structures. In order to offer strong progress guarantees, typical implementations of non-blocking search data structures employ a "helping" mechanism; however, helping may result in performance degradation. We propose help-optimality, which expresses optimization of the amortized step complexity of concurrent operations. To describe the concept, we revisit the lock-free designs of a linked list and a binary search tree and present improved algorithms. We design the algorithms without using any language- or platform-specific constructs; we do not use bit-stealing or runtime type introspection of objects, so our algorithms are portable. We further delve into multi-dimensional data and similarity search, presenting the first lock-free multi-dimensional data structure and a linearizable nearest neighbor search algorithm. Our algorithm for nearest neighbor search is generic and can be adapted to other data structures. In the last part of the thesis, we explore the utilization of concurrent data structures for deterministic stream processing. We propose solutions to two challenges prevalent in data stream processing: (1) efficient processing on cloud as well as edge devices, and (2) deterministic data-parallel processing at high throughput and low latency. As a first step, we present a methodology for customizing streaming aggregation on low-power multicore embedded platforms. Then we introduce Viper, a communication module that can be integrated into stream processing engines for the coordination of threads analyzing data in parallel.
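
    For readers unfamiliar with lock-free designs, the Java sketch below shows the basic pattern that such data structures build on: read the shared state, prepare an update locally, and publish it with a single compare-and-swap, retrying on contention. It is a textbook Treiber-style stack, not one of the thesis' contributed structures (vector, heap, list, tree, or multi-dimensional index).

    import java.util.concurrent.atomic.AtomicReference;

    // Sketch: a lock-free stack built on a single compare-and-swap retry loop.
    public class LockFreeStack<T> {

        private static final class Node<T> {
            final T value;
            final Node<T> next;
            Node(T value, Node<T> next) { this.value = value; this.next = next; }
        }

        private final AtomicReference<Node<T>> top = new AtomicReference<>();

        public void push(T value) {
            Node<T> oldTop;
            Node<T> newTop;
            do {
                oldTop = top.get();
                newTop = new Node<>(value, oldTop);
            } while (!top.compareAndSet(oldTop, newTop));   // retry if another thread won
        }

        public T pop() {
            Node<T> oldTop;
            do {
                oldTop = top.get();
                if (oldTop == null) return null;            // the stack is empty
            } while (!top.compareAndSet(oldTop, oldTop.next));
            return oldTop.value;
        }
    }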