    Discriminative fine-grained mixing for adaptive compression of data streams

    Cataloged from PDF version of article. This paper introduces an adaptive compression algorithm for the transfer of data streams across operators in stream processing systems. The algorithm is adaptive in the sense that it can adjust the amount of compression applied based on bandwidth, CPU, and workload availability. It is discriminative in the sense that it can judiciously apply partial compression by selecting a subset of attributes that provides a good reduction in the used bandwidth at a low cost. The algorithm relies on the significant differences that exist among stream attributes with respect to their relative sizes, compression ratios, compression costs, and amenability to custom compressors. As part of this study, we present a model of uniform and discriminative mixing, and provide various greedy algorithms and associated metrics to locate an effective setting when model parameters are available at run-time. Furthermore, we provide online and adaptive algorithms for real-world systems in which the system parameters that can be measured at run-time are limited. We present a detailed experimental study that illustrates the superiority of discriminative mixing over uniform mixing.
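
The idea of selecting a subset of attributes to compress can be sketched as a greedy ranking by bandwidth saved per unit of CPU cost. This is only an illustration under assumed inputs, not the algorithm from the paper: the `Attribute` fields, the `select_attributes` function, and the single `cpu_budget` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    size: float   # average uncompressed bytes per tuple (assumed known)
    ratio: float  # compressed size / uncompressed size, in (0, 1]
    cost: float   # CPU cost to compress per tuple; must be > 0


def select_attributes(attrs, cpu_budget):
    """Greedily choose attributes to compress within a CPU budget.

    Attributes are ranked by bytes saved per unit of CPU cost, and each
    one is taken if it still fits in the remaining budget.
    """
    def savings(a):
        return a.size * (1.0 - a.ratio)

    ranked = sorted(attrs, key=lambda a: savings(a) / a.cost, reverse=True)
    chosen, spent = [], 0.0
    for a in ranked:
        if savings(a) <= 0:
            continue  # compressing this attribute saves nothing
        if spent + a.cost <= cpu_budget:
            chosen.append(a.name)
            spent += a.cost
    return chosen
```

With a large, highly compressible payload and a small identifier field, such a ranking spends the budget on the payload first and leaves the identifier uncompressed unless budget remains.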

    Auto-tuning similarity search algorithms on multi-core architectures

    In recent times, large high-dimensional datasets have become ubiquitous. Video and image repositories, financial data, and sensor data are just a few examples of such datasets in practice. Many applications that use such datasets require the retrieval of data items similar to a given query item, or the nearest neighbors (NN or k-NN) of a given item. Another common query is the retrieval of multiple sets of nearest neighbors, i.e., multi k-NN, for different query items on the same data. With commodity multi-core CPUs becoming more widespread at lower costs, developing parallel algorithms for these search problems has become increasingly important. While the core nearest neighbor search problem is relatively easy to parallelize, it is challenging to tune it for optimality. This is because the various performance-specific algorithmic parameters, or “tuning knobs”, are inter-related and also depend on the data and query workloads. In this paper, we present (1) a detailed study of the various tuning knobs and their contributions to increasing the query throughput for parallelized versions of the two most common classes of high-dimensional multi-NN search algorithms: linear scan and tree traversal, and (2) an offline auto-tuner for setting these knobs by iteratively measuring actual query execution times for a given workload and dataset. We show experimentally that our auto-tuner reaches near-optimal performance and significantly outperforms un-tuned versions of parallel multi-NN algorithms for real video repository data on a variety of multi-core platforms. © Springer Science+Business Media New York 201
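
For reference, the linear-scan class of multi k-NN search mentioned above can be sketched in a few lines. This is a sequential, untuned baseline with hypothetical function names (`knn_linear_scan`, `multi_knn`). It uses squared Euclidean distance as an assumed metric and does not reflect the parallelization or tuning knobs studied in the paper.

```python
import heapq


def knn_linear_scan(data, query, k):
    """Return the indices of the k points in `data` closest to `query`."""
    def sqdist(point):
        # Squared Euclidean distance; preserves the nearest-neighbor order.
        return sum((a - b) ** 2 for a, b in zip(point, query))

    return heapq.nsmallest(k, range(len(data)), key=lambda i: sqdist(data[i]))


def multi_knn(data, queries, k):
    """Answer several k-NN queries over the same dataset, one scan each."""
    return [knn_linear_scan(data, q, k) for q in queries]
```

Each query scans the whole dataset, so the queries are independent of each other, which is why this formulation is comparatively easy to parallelize.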

    Partitioning functions for stateful data parallelism in stream processing

    In this paper we study partitioning functions for stream processing systems that employ stateful data parallelism to improve application throughput. In particular, we develop partitioning functions that are effective under workloads where the domain of the partitioning key is large and its value distribution is skewed. We define various desirable properties for partitioning functions, ranging from balance properties such as memory, processing, and communication balance, through structural properties such as compactness and fast lookup, to adaptation properties such as fast computation and minimal migration. We introduce a partitioning function structure that is compact and develop several associated heuristic construction techniques that exhibit good balance and low migration cost under skewed workloads. We provide experimental results that compare our partitioning functions to more traditional approaches such as uniform and consistent hashing, under different workload and application characteristics, and show superior performance.
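
One way to picture a compact partitioning function for skewed keys is a small explicit map for the most frequent keys combined with a plain hash for all other keys. The sketch below is an assumption made for illustration and is not claimed to be the structure or the construction heuristics from the paper. The names `hash_partition`, `build_explicit_map`, `hybrid_partition`, and `imbalance` are hypothetical, and key frequencies are assumed to be known in advance.

```python
import zlib


def hash_partition(key, n):
    """Uniform hashing baseline: map a string key to one of n channels."""
    return zlib.crc32(key.encode("utf-8")) % n


def build_explicit_map(freqs, n, top_k):
    """Place the top_k most frequent keys on the least-loaded channels.

    Tail keys stay on their hash positions; heavy keys are then assigned
    in decreasing frequency order to whichever channel has the least load.
    """
    heavy = sorted(freqs, key=freqs.get, reverse=True)[:top_k]
    heavy_set = set(heavy)
    load = [0] * n
    for key, f in freqs.items():
        if key not in heavy_set:
            load[hash_partition(key, n)] += f
    explicit = {}
    for key in heavy:
        target = min(range(n), key=lambda i: load[i])
        explicit[key] = target
        load[target] += freqs[key]
    return explicit


def hybrid_partition(key, n, explicit):
    """Look up heavy keys in the explicit map, hash everything else."""
    return explicit.get(key, hash_partition(key, n))


def imbalance(freqs, n, assign):
    """Ratio of the maximum channel load to the average channel load."""
    load = [0] * n
    for key, f in freqs.items():
        load[assign(key)] += f
    return max(load) / (sum(load) / n)
```

The explicit map stays small because only `top_k` keys are stored, while lookups remain a dictionary probe followed by a hash. An imbalance of 1.0 means perfectly even load.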

    Generic windowing support for extensible stream processing systems

    Stream processing applications process high-volume, continuous feeds from live data sources, employ data-in-motion analytics to analyze these feeds, and produce near real-time insights with low latency. One of the fundamental characteristics of such applications is the on-the-fly nature of the computation, which does not require access to disk-resident data. Stream processing applications store the most recent history of streams in memory and use it to perform the necessary modeling and analysis tasks. This recent history is often managed using windows. All data stream management systems provide some form of windowing functionality. Windowing makes it possible to implement streaming versions of the traditionally blocking relational operators, such as streaming aggregations, joins, and sorts, as well as any other analytic operator that requires keeping the most recent tuples as state, such as time series analysis operators and signal processing operators. In this paper, we provide a categorization of different window types and policies employed in stream processing applications and give detailed operational semantics for various window configurations. We describe an extensibility mechanism that makes it possible to integrate windowing support into user-defined operators, enabling consistent syntax and semantics across system-provided and third-party toolkits of streaming operators. We describe the design and implementation of a runtime windowing library that significantly simplifies the construction of window-based operators by decoupling the handling of window policies from operator logic. We present our experience using the windowing library to implement a relational operators toolkit and compare the efficacy of the solution to an earlier implementation that did not employ a common windowing library. Copyright (c) 2013 John Wiley & Sons, Ltd.
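
The decoupling of window policy from operator logic can be illustrated with two count-based windows that hand their contents to a callback. This is a minimal sketch with hypothetical class names (`TumblingCountWindow`, `SlidingCountWindow`) and a hypothetical helper (`windowed_sums`); it is not the API of the library described in the abstract and covers only count-based policies.

```python
from collections import deque


class TumblingCountWindow:
    """Collects `size` tuples, hands them to the callback, then empties."""

    def __init__(self, size, on_flush):
        self.size = size
        self.on_flush = on_flush
        self.buffer = []

    def insert(self, tup):
        self.buffer.append(tup)
        if len(self.buffer) == self.size:
            self.on_flush(list(self.buffer))
            self.buffer.clear()


class SlidingCountWindow:
    """Keeps the last `size` tuples and triggers on every insertion."""

    def __init__(self, size, on_trigger):
        self.buffer = deque(maxlen=size)  # oldest tuple is evicted when full
        self.on_trigger = on_trigger

    def insert(self, tup):
        self.buffer.append(tup)
        self.on_trigger(list(self.buffer))


def windowed_sums(window_cls, size, stream):
    """Operator logic (a sum) written once, independent of window policy."""
    results = []
    window = window_cls(size, lambda contents: results.append(sum(contents)))
    for tup in stream:
        window.insert(tup)
    return results
```

The same aggregation runs unchanged under either policy: `windowed_sums(TumblingCountWindow, 3, stream)` emits one sum per full batch, while `windowed_sums(SlidingCountWindow, 3, stream)` emits a sum after every tuple.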

    Absence of Magnetic Fluctuations in the Ferromagnetic/Topological Heterostructure EuS/Bi$_2$Se$_3$

    Heterostructures of topological insulators and ferromagnets offer new opportunities in spintronics and a route to novel anomalous Hall states. In one such structure, EuS/Bi$_2$Se$_3$, a dramatic enhancement of the Curie temperature was recently observed. We performed Raman spectroscopy on a similar set of thin films to investigate the magnetic and lattice excitations. Interfacial strain was monitored through its effects on the Bi$_2$Se$_3$ phonon modes, while the magnetic system was probed through the EuS Raman mode. Although this mode appears in bare EuS, the heterostructures lack the corresponding EuS Raman signal. Through numerical calculations we rule out the possibility of Fabry-Perot interference suppressing the mode. We attribute the absence of a magnetic signal in EuS to a large charge transfer with the Bi$_2$Se$_3$. This could provide an additional pathway for manipulating the magnetic, optical, or electronic response of topological heterostructures. Comment: 6 pages, 3 figures

    Elastic scaling for data stream processing

    This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitioning, in order to achieve scale. To make auto-parallelization effective in practice, the profitability question needs to be answered: How many parallel channels provide the best throughput? The answer to this question changes depending on the workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that can dynamically adjust the number of channels used to achieve high throughput without unnecessarily wasting resources. Most importantly, our solution can handle partitioned stateful operators via run-time state migration, which is fully transparent to the application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution.

    A catalog of stream processing optimizations

    Various research communities have independently arrived at stream processing as a programming model for efficient and parallel computing. These communities include digital signal processing, databases, operating systems, and complex event processing. Since each community faces applications with challenging performance requirements, each of them has developed some of the same optimizations, but often with conflicting terminology and unstated assumptions. This article presents a survey of optimizations for stream processing. It is aimed both at users who need to understand and guide the system's optimizer and at implementers who need to make engineering tradeoffs. To consolidate terminology, this article is organized as a catalog, in a style similar to catalogs of design patterns or refactorings. To make assumptions explicit and help understand tradeoffs, each optimization is presented with its safety constraints (when does it preserve correctness?) and a profitability experiment (when does it improve performance?). We hope that this survey will help future streaming system builders to stand on the shoulders of giants from not just their own community. © 2014 ACM

    4D visualization of embryonic, structural crystallization by single-pulse microscopy

    In many physical and biological systems the transition from an amorphous to an ordered native structure involves complex energy landscapes, and understanding such transformations requires not only their thermodynamics but also the structural dynamics during the process. Here, we extend our 4D visualization method with electron imaging to include the study of irreversible processes with a single pulse in the same ultrafast electron microscope (UEM) as used before in the single-electron mode for the study of reversible processes. With this augmentation, we report on the transformation of amorphous to crystalline structure with silicon as an example. A single heating pulse was used to initiate crystallization from the amorphous phase while a single packet of electrons imaged selectively in space the transformation as the structure continuously changes with time. From the evolution of crystallinity in real time and the changes in morphology, for nanosecond and femtosecond pulse heating, we describe two types of processes, one that occurs at early time and involves a nondiffusive motion and another that takes place on a longer time scale. Similar mechanisms with two distinct time scales may be important in biomolecular folding.

    RailwayDB: adaptive storage of interaction graphs

    We are living in an ever more connected world, where data recording the interactions between people, software systems, and the physical world is becoming increasingly prevalent. These data often take the form of a temporally evolving graph, where entities are the vertices and the interactions between them are the edges. We call such graphs interaction graphs. Various domains, including telecommunications, transportation, and social media, depend on analytics performed on interaction graphs. The ability to efficiently support historical analysis over interaction graphs requires effective solutions for the problem of data layout on disk. This paper presents an adaptive disk layout called the railway layout for optimizing disk block storage for interaction graphs. The key idea is to divide blocks into one or more sub-blocks. Each sub-block contains the entire graph structure, but only a subset of the attributes. This improves query I/O, at the cost of increased storage overhead. We introduce optimal integer linear program (ILP) formulations for partitioning disk blocks into sub-blocks with overlapping and nonoverlapping attributes. Additionally, we present greedy heuristics that scale better than the ILP alternatives, yet achieve close to optimal query I/O. We provide an implementation of the railway layout as part of RailwayDB, an open-source graph database we have developed. To demonstrate the benefits of the railway layout, we provide an extensive experimental evaluation, including model-based as well as empirical results comparing our approach to baseline alternatives. © 2015, Springer-Verlag Berlin Heidelberg
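
The trade-off between query I/O and storage overhead can be made concrete with a toy cost model. The sketch assumes nonoverlapping sub-blocks and assumes that a query reads every sub-block holding at least one attribute it needs. The function names (`sub_block_size`, `query_io`, `storage`) and the byte sizes in the usage note are hypothetical and are not taken from the paper or from RailwayDB.

```python
def sub_block_size(structure_size, attr_sizes, attrs):
    """A sub-block stores the full graph structure plus its attributes."""
    return structure_size + sum(attr_sizes[a] for a in attrs)


def query_io(structure_size, attr_sizes, sub_blocks, query_attrs):
    """Bytes read by a query that needs the attributes in `query_attrs`.

    Assumed model: the query reads each sub-block that contains at least
    one needed attribute. A structure-only query reads the smallest one.
    """
    touched = [sb for sb in sub_blocks if sb & query_attrs]
    if not touched:
        touched = [min(
            sub_blocks,
            key=lambda sb: sub_block_size(structure_size, attr_sizes, sb),
        )]
    return sum(sub_block_size(structure_size, attr_sizes, sb) for sb in touched)


def storage(structure_size, attr_sizes, sub_blocks):
    """Total bytes on disk; the structure is repeated in every sub-block."""
    return sum(sub_block_size(structure_size, attr_sizes, sb) for sb in sub_blocks)
```

With a 100-byte structure and attributes `a`, `b`, `c` of 50, 200, and 400 bytes, splitting the block into `{a}` and `{b, c}` lowers a query on `a` alone from 750 to 150 bytes read, while total storage grows from 750 to 850 bytes. A query on both `a` and `b` now reads 850 bytes, which is why the choice of partitioning depends on the query workload.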

    Autopipelining for data stream processing

    Stream processing applications use online analytics to ingest high-rate data sources, process them on-the-fly, and generate live results in a timely manner. The data flow graph representation of these applications makes stream computing tasks easy to specify, and also lends itself to runtime exploitation of parallelism on multicore processors. While data flow graphs naturally contain a rich set of parallelization opportunities, exploiting them is challenging due to the combinatorial number of possible configurations. Furthermore, the best configuration is dynamic in nature; it can differ across multiple runs of the application, and even during different phases of the same run. In this paper, we propose an autopipelining solution that can take advantage of multicore processors to improve the throughput of streaming applications in an effective and transparent way. The solution is effective in the sense that it provides good utilization of resources by dynamically finding and exploiting sources of pipeline parallelism in streaming applications. It is transparent in the sense that it does not require any hints from the application developers. As a part of our solution, we describe a light-weight runtime profiling scheme to learn the resource usage of the operators comprising the application, an optimization algorithm to locate the best places in the data flow graph to explore additional parallelism, and an adaptive control scheme to find the right level of parallelism. We have implemented our solution in an industrial-strength stream processing system. Our experimental evaluation based on microbenchmarks, synthetic workloads, and real-world applications confirms that our design is effective in optimizing the throughput of stream processing applications without requiring any changes to the application code. © 1990-2012 IEEE