340,395 research outputs found

    Benchmarking Distributed Stream Data Processing Systems

    Full text link
    The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear gap of detailed analyses of the systems' performance characteristics. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations, which are the basic type of operations in stream analytics. For this benchmark, we design workloads based on real-life, industrial use-cases inspired by the online gaming industry. The contribution of our work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we carefully separate the system under test and driver, in order to correctly represent the open world model of typical stream processing deployments and can, therefore, measure system performance under realistic conditions. Third, we build the first benchmarking framework to define and test the sustainable performance of streaming systems. Our detailed evaluation highlights the individual characteristics and use-cases of each system.Comment: Published at ICDE 201

    MIMO throughput effectiveness for basic MIMO OTA compliance testing

    Get PDF
    During the March 2011 meeting of the CTIA MIMO OTA Subgroup (MOSG), the members agreed that the subgroup should first determine “what” aspects of a MIMO-capable device require evaluation; then the group should determine “how” to go about making these measurements. In subsequent meetings of MOSG, new yet-unnamed figures of merit were asked for in order to provide a solution to the carriers' requirements for LTE MIMO OTA evaluation. Furthermore, the December 2011 3GPP RAN4 status report on LTE MIMO OTA listed the evaluation of the use of statistical performance analysis in order to minimize test time and help ensure accurate performance assessment as an open issue. This contribution addresses these petitions by providing four new figures of merit, which could serve the purpose of evaluating the operators' top priorities for MIMO OTA compliance testing. The new figures of merit are MIMO Throughput Effectiveness (MTE), MIMO Device Throughput Effectiveness (MDTE), MIMO Throughput Gain (MTG), and MIMO Device Throughput Gain (MDTG). In this paper, MTE is evaluated using the recently available LTE MIMO OTA RR data from 3GPP

    Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID

    Full text link
    Evaluating the performance of scientific data processing systems is a difficult task considering the plethora of application-specific solutions available in this landscape and the lack of a generally-accepted benchmark. The dual structure of scientific data coupled with the complex nature of processing complicate the evaluation procedure further. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. It fails to draw sufficient attention though because of the ambiguous plain language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we give a reference implementation and present benchmark results in EXTASCID, a novel system for scientific data processing. EXTASCID is complete in providing native support both for array and relational data and extensible in executing any user code inside the system by the means of a configurable metaoperator. These features result in an order of magnitude improvement over SciDB at data loading, extracting derived data, and operations over derived data.Comment: 32 pages, 3 figure

    Building Efficient Query Engines in a High-Level Language

    Get PDF
    Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming allows to easily implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show that: (a) With all optimizations enabled, LegoBase significantly outperforms a commercial database and an existing query compiler. (b) Programmers need to provide just a few hundred lines of high-level code for implementing the optimizations, instead of complicated low-level code that is required by existing query compilation approaches. (c) The compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for compiling query engines

    Target oriented relational model finding

    Get PDF
    Lecture Notes in Computer Science 8411, 2014Model finders are becoming useful in many software engineering problems. Kodkod is one of the most popular, due to its support for relational logic (a combination of first order logic with relational algebra operators and transitive closure), allowing a simpler specification of constraints, and support for partial instances, allowing the specification of a priori (exact, but potentially partial) knowledge about a problem's solution. However, in some software engineering problems, such as model repair or bidirectional model transformation, knowledge about the solution is not exact, but instead there is a known target that the solution should approximate. In this paper we extend Kodkod's partial instances to allow the specification of such targets, and show how its model finding procedure can be adapted to support them (using both PMax-SAT solvers or SAT solvers with cardinality constraints). Two case studies are also presented, including a careful performance evaluation to assess the effectiveness of the proposed extension.(undefined

    BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

    Full text link
    We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch and bound based approach with three heuristics to resolve the resulting nontrivial optimization problem. The experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multi-core architectures when processing different types of workloads.Comment: To appear in SIGMOD'1
    corecore