Benchmarking Distributed Stream Data Processing Systems
The need for scalable and efficient stream analysis has led to the
development of many open-source streaming data processing systems (SDPSs) with
highly diverging capabilities and performance characteristics. While first
initiatives have tried to compare the systems for simple workloads, there is a clear
lack of detailed analyses of the systems' performance characteristics. In this
paper, we propose a framework for benchmarking distributed stream processing
engines. We use our suite to evaluate the performance of three widely used
SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our
evaluation focuses in particular on measuring the throughput and latency of
windowed operations, which are the basic type of operation in stream
analytics. For this benchmark, we design workloads based on real-life,
industrial use-cases inspired by the online gaming industry. The contribution
of our work is threefold. First, we give a definition of latency and throughput
for stateful operators. Second, we carefully separate the system under test from the
driver, in order to correctly represent the open-world model of typical stream
processing deployments and, therefore, measure system performance under
realistic conditions. Third, we build the first benchmarking framework to
define and test the sustainable performance of streaming systems.
Our detailed evaluation highlights the individual characteristics and
use-cases of each system.
Comment: Published at ICDE 201
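As a rough illustration of the two definitions the suite depends on, one might compute the event-time latency of a windowed result and probe for sustainable throughput as below. The exact metric definitions are the paper's contribution; these are plausible simplifications, and both function names are invented:

```python
def window_latency(window_events, emission_ts):
    """Event-time latency of a windowed result: time between the newest
    event contributing to the window and emission of the aggregate.
    One plausible definition; the paper's exact metric may differ."""
    newest = max(ts for ts, _ in window_events)
    return emission_ts - newest

def sustainable_throughput(offered_rates, observed_rates, tol=0.01):
    """Highest offered rate the system keeps up with: observed throughput
    stays within `tol` of the offered rate (hypothetical criterion)."""
    sustained = 0
    for offered, observed in zip(offered_rates, observed_rates):
        if observed >= offered * (1 - tol):
            sustained = max(sustained, offered)
    return sustained

events = [(100.0, "a"), (100.4, "b"), (100.9, "c")]  # (event_ts, payload)
print(window_latency(events, emission_ts=101.5))      # ~0.6 s
print(sustainable_throughput([1000, 2000, 4000], [1000, 1995, 3100]))  # 2000
```

Separating the driver from the system under test matters here: the driver keeps offering events at the configured rate even when the system falls behind, which is what makes the sustainability check meaningful.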
MIMO throughput effectiveness for basic MIMO OTA compliance testing
During the March 2011 meeting of the CTIA MIMO OTA Subgroup (MOSG), the members agreed that the subgroup should first determine “what” aspects of a MIMO-capable device require evaluation; then the group should determine “how” to go about making these measurements. In subsequent meetings of MOSG, new, yet-unnamed figures of merit were requested in order to provide a solution to the carriers' requirements for LTE MIMO OTA evaluation. Furthermore, the December 2011 3GPP RAN4 status report on LTE MIMO OTA listed as an open issue the use of statistical performance analysis to minimize test time and help ensure accurate performance assessment. This contribution addresses these requests by providing four new figures of merit, which could serve the purpose of evaluating the operators' top priorities for MIMO OTA compliance testing. The new figures of merit are MIMO Throughput Effectiveness (MTE), MIMO Device Throughput Effectiveness (MDTE), MIMO Throughput Gain (MTG), and MIMO Device Throughput Gain (MDTG). In this paper, MTE is evaluated using the recently available LTE MIMO OTA RR data from 3GPP.
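The abstract does not spell out the formulas, but "effectiveness" figures of merit of this family are typically ratios of achieved to reference throughput. A minimal sketch under that assumption (the definition, function name, and numbers here are hypothetical, not the paper's):

```python
def mimo_throughput_effectiveness(measured_tput, reference_tput):
    """Hypothetical reading of MTE: the fraction of a reference (e.g.
    theoretical peak) throughput that a device actually achieves over
    the air. The paper's formal definition may differ; this only
    illustrates the ratio-style shape of such a metric."""
    return measured_tput / reference_tput

# Example: a device sustaining 45 Mbps against a 60 Mbps reference
print(f"MTE = {mimo_throughput_effectiveness(45.0, 60.0):.2f}")  # MTE = 0.75
```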
Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID
Evaluating the performance of scientific data processing systems is a
difficult task considering the plethora of application-specific solutions
available in this landscape and the lack of a generally-accepted benchmark. The
dual structure of scientific data coupled with the complex nature of processing
complicate the evaluation procedure further. SS-DB is the first attempt to
define a general benchmark for complex scientific processing over raw and
derived data. It has failed to draw sufficient attention, though, because of its
ambiguous plain-language specification and the extraordinary SciDB results. In
this paper, we remedy the shortcomings of the original SS-DB specification by
providing a formal representation in terms of ArrayQL algebra operators and
ArrayQL/SciQL constructs. These are the first formal representations of the
SS-DB benchmark. Starting from the formal representation, we give a reference
implementation and present benchmark results in EXTASCID, a novel system for
scientific data processing. EXTASCID is complete in providing native support
both for array and relational data and extensible in executing any user code
inside the system by means of a configurable metaoperator. These features
result in an order-of-magnitude improvement over SciDB at data loading,
extracting derived data, and operations over derived data.
Comment: 32 pages, 3 figures
Accuracy and interpretability trade-offs in machine learning applied to safer gambling
Responsible gambling is an area of research and industry which seeks to understand the pathways to harm from gambling and implement programmes to reduce or prevent harm that gambling might cause. There is a growing body of research that has used gambling behavioural data to model and predict harmful gambling, and the industry is showing increasing interest in technologies that can help gambling operators to better predict harm and prevent it through appropriate interventions. However, industry surveys and feedback clearly indicate that, in order to enable wider adoption of such data-driven methods, industry and policy makers require a greater understanding of how machine learning methods make these predictions. In this paper, we make use of the TREPAN algorithm for extracting decision trees from neural networks and random forests. We present the first comparative evaluation of predictive performance and tree properties for extracted trees, which is also the first comparative evaluation of knowledge extraction for safer gambling. Results indicate that TREPAN extracts better-performing trees than direct learning of decision trees from the data. Overall, trees extracted with TREPAN from different models offer a good compromise between prediction accuracy and interpretability. TREPAN can produce decision trees with extended test rules of different forms, so that interpretability depends on multiple factors. We present detailed results and a discussion of the trade-offs with regard to performance, interpretability, and use in the gambling industry.
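TREPAN's core move is to treat the trained model as an oracle: it queries the black box on (possibly synthetic) inputs and fits tree nodes to the oracle's answers rather than to the raw data. A toy sketch of that oracle-querying idea, reduced to a one-level tree in plain Python (not the actual TREPAN algorithm, which grows deeper trees with m-of-n tests and statistical stopping criteria; all names and numbers here are illustrative):

```python
import random

def extract_stump(oracle, n_queries=2000, seed=0):
    """Query a black-box model on generated inputs over [0, 2)^2 and fit a
    one-level decision tree (stump) to its answers. Fidelity to the oracle,
    not accuracy on real labels, is what gets maximised."""
    rng = random.Random(seed)
    xs = [(rng.random() * 2, rng.random() * 2) for _ in range(n_queries)]
    ys = [oracle(x) for x in xs]
    best = (None, None, -1.0)  # (dimension, threshold, fidelity)
    for dim in (0, 1):
        for thr in [i * 0.1 for i in range(1, 20)]:
            preds = [1 if x[dim] > thr else 0 for x in xs]
            fid = sum(p == y for p, y in zip(preds, ys)) / n_queries
            fid = max(fid, 1 - fid)  # allow flipped leaf labels
            if fid > best[2]:
                best = (dim, thr, fid)
    return best

def blackbox(x):
    """Stand-in for a trained random forest or neural network."""
    return 1 if x[0] + 0.5 * x[1] > 1.5 else 0

dim, thr, fid = extract_stump(blackbox)
print(dim, thr, round(fid, 2))  # the stump's fidelity to the black box
```

The accuracy/interpretability trade-off the paper studies is visible even here: the stump is trivially readable but can only approximate the oblique decision boundary of the black box.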
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming allows us to easily implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines.
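The generative step can be pictured in miniature: instead of interpreting a filter-aggregate plan row by row, emit specialized source code for that exact plan and compile it. This is a toy Python analogue of the idea (LegoBase itself lowers high-level Scala to C; the function names here are invented):

```python
def compile_query(column, predicate_const):
    """Generate specialized source for `SUM(column) WHERE column > const`
    and compile it, removing all plan interpretation overhead. A toy
    sketch of generative programming, not LegoBase's actual pipeline."""
    src = (
        f"def run(rows):\n"
        f"    total = 0\n"
        f"    for row in rows:\n"
        f"        if row[{column!r}] > {predicate_const}:\n"
        f"            total += row[{column!r}]\n"
        f"    return total\n"
    )
    ns = {}
    exec(src, ns)  # 'compile' the generated source into a function
    return ns["run"]

run = compile_query("price", 10)
rows = [{"price": 5}, {"price": 12}, {"price": 30}]
print(run(rows))  # 42
```

Because the predicate and column name are baked into the generated source, there is no per-row dispatch on the plan; optimizations like switching the data layout would simply change what source gets emitted.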
A GA-based technique for the scheduling of storage tanks
This paper proposes the application of a genetic algorithm (GA) based methodology for the scheduling of storage tanks. The proposed approach is an integration of GA and heuristic rule-based techniques, which decomposes the complex mixed-integer optimisation problem into integer and real-number sub-problems. The GA string encodes the integer problem, and the heuristic approach solves the real-number problems within the GA framework. The algorithm is demonstrated for a test problem related to a water treatment facility at a port, and has been found to give a significantly better schedule than those generated using a heuristic-based approach.
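The decomposition can be sketched as follows: the GA evolves only the integer part (here, a visiting order over tanks), and a heuristic rule solves the real-valued part (here, fill quantities) inside every fitness evaluation. The tank demands, capacity, and fill rule below are illustrative assumptions, not the paper's model:

```python
import random

def heuristic_fill(order, demands, capacity):
    """Inner heuristic (real-number sub-problem): given an integer tank
    order from the GA, fill each tank completely if it still fits,
    otherwise skip it. Returns total demand served."""
    remaining, served = capacity, 0.0
    for i in order:
        if demands[i] <= remaining:
            served += demands[i]
            remaining -= demands[i]
    return served

def ga_schedule(demands, capacity, pop=30, gens=40, seed=1):
    """Outer GA (integer sub-problem): evolve permutations of tanks;
    each string is scored by the heuristic above."""
    rng = random.Random(seed)
    n = len(demands)
    population = [rng.sample(range(n), n) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda o: -heuristic_fill(o, demands, capacity))
        survivors = population[: pop // 2]          # elitist selection
        children = []
        for p in survivors:
            c = p[:]
            i, j = rng.randrange(n), rng.randrange(n)
            c[i], c[j] = c[j], c[i]                 # swap mutation keeps a valid permutation
            children.append(c)
        population = survivors + children
    best = max(population, key=lambda o: heuristic_fill(o, demands, capacity))
    return best, heuristic_fill(best, demands, capacity)

order, served = ga_schedule([4.0, 9.0, 2.5, 7.0], capacity=12.0)
print(order, served)  # best found order and total demand served
```

Keeping the real-valued decisions out of the chromosome is the point of the decomposition: the GA searches a small discrete space while the heuristic handles feasibility.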
Target oriented relational model finding
Lecture Notes in Computer Science 8411, 2014.
Model finders are becoming useful in many software engineering problems. Kodkod is one of the most popular, due to its support for relational logic (a combination of first-order logic with relational algebra operators and transitive closure), allowing a simpler specification of constraints, and for partial instances, allowing the specification of a priori (exact, but potentially partial) knowledge about a problem's solution. However, in some software engineering problems, such as model repair or bidirectional model transformation, knowledge about the solution is not exact; instead, there is a known target that the solution should approximate. In this paper we extend Kodkod's partial instances to allow the specification of such targets, and show how its model finding procedure can be adapted to support them (using either PMax-SAT solvers or SAT solvers with cardinality constraints). Two case studies are also presented, including a careful performance evaluation to assess the effectiveness of the proposed extension.
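The target mechanism can be pictured with a brute-force miniature: hard constraints must hold, while each target literal acts like a soft clause whose violations are counted, so the solver returns a model at minimum Hamming distance from the target. This stand-in replaces the PMax-SAT solver with exhaustive search over tiny boolean problems (all names and the example constraint are illustrative):

```python
from itertools import product

def target_oriented_find(n_vars, hard, target):
    """Among all models of the hard constraints, return one minimising
    the Hamming distance to a target valuation. In the Kodkod extension
    this job is delegated to a PMax-SAT solver (each target literal is a
    soft clause); here we simply enumerate all 2^n assignments."""
    best, best_dist = None, n_vars + 1
    for model in product([False, True], repeat=n_vars):
        if not all(c(model) for c in hard):
            continue                              # hard constraints are inviolable
        dist = sum(m != t for m, t in zip(model, target))
        if dist < best_dist:
            best, best_dist = model, dist
    return best, best_dist

# Hard constraint: x0 or x1 must hold; the target wants everything False.
hard = [lambda m: m[0] or m[1]]
model, dist = target_oriented_find(3, hard, target=(False, False, False))
print(model, dist)  # closest model flips exactly one variable from the target
```

This is exactly the model-repair shape the abstract mentions: the broken model is the target, and the finder returns the nearest valid model.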
BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures
We introduce BriskStream, an in-memory data stream processing system (DSPS)
specifically designed for modern shared-memory multicore architectures.
BriskStream's key contribution is an execution plan optimization paradigm,
namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair
of producer-consumer operators into consideration. We propose a branch and
bound based approach with three heuristics to resolve the resulting nontrivial
optimization problem. The experimental evaluations demonstrate that BriskStream
yields much higher throughput and better scalability than existing DSPSs on
multi-core architectures when processing different types of workloads.
Comment: To appear in SIGMOD'1
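A branch-and-bound placement in that spirit might look like the sketch below: assign each operator to a NUMA socket, charge every producer-consumer edge its stream rate times the NUMA distance between the chosen sockets, and prune any partial assignment already as costly as the best complete plan. The operator rates, distance matrix, and bounding rule are simplified assumptions, not BriskStream's actual RLAS formulation:

```python
def place_operators(edges, n_sockets, numa_cost):
    """Branch-and-bound placement of streaming operators onto NUMA sockets.
    edges: (producer, consumer, rate) triples; numa_cost[a][b]: per-unit
    cost of moving data from socket a to socket b."""
    n_ops = 1 + max(max(u, v) for u, v, _ in edges)
    best = {"cost": float("inf"), "plan": None}

    def cost(assign):
        # Cross-socket traffic for edges whose endpoints are both placed.
        return sum(rate * numa_cost[assign[u]][assign[v]]
                   for u, v, rate in edges
                   if u < len(assign) and v < len(assign))

    def branch(assign):
        partial = cost(assign)
        if partial >= best["cost"]:      # bound: prune dominated branches
            return
        if len(assign) == n_ops:
            best["cost"], best["plan"] = partial, tuple(assign)
            return
        for s in range(n_sockets):       # branch: try each socket next
            branch(assign + [s])

    branch([])
    return best["plan"], best["cost"]

# 3 operators in a chain: 0→1 is a heavy stream, 1→2 a light one; 2 sockets.
numa_cost = [[0, 2], [2, 0]]             # intra-socket free, remote costs 2/unit
plan, c = place_operators([(0, 1, 10), (1, 2, 1)], 2, numa_cost)
print(plan, c)  # co-locating all three operators gives zero cross-socket cost
```

Real plans must also respect per-socket CPU and memory capacity, which is what makes the optimization problem nontrivial and motivates the paper's additional heuristics.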