Search CORE

1,961,345 research outputs found

Benchmarking Distributed Stream Data Processing Systems

Author: Heiskanen Henri
Karimov Jeyhun
Katsifodimos Asterios
Markl Volker
Rabl Tilmann
Samarev Roman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/06/2019
Field of study

The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear gap of detailed analyses of the systems' performance characteristics. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations, which are the basic type of operations in stream analytics. For this benchmark, we design workloads based on real-life, industrial use-cases inspired by the online gaming industry. The contribution of our work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we carefully separate the system under test and driver, in order to correctly represent the open world model of typical stream processing deployments and can, therefore, measure system performance under realistic conditions. Third, we build the first benchmarking framework to define and test the sustainable performance of streaming systems. Our detailed evaluation highlights the individual characteristics and use-cases of each system.Comment: Published at ICDE 201

arXiv.org e-Print Archive

Crossref

Using Java for distributed computing in the Gaia satellite data processing

Author: Hernandez Jose
Hoar John
Lammers Uwe
Luri Xavier
O'Mullane William
Parsons Paul
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/07/2011
Field of study

In recent years Java has matured to a stable easy-to-use language with the flexibility of an interpreter (for reflection etc.) but the performance and type checking of a compiled language. When we started using Java for astronomical applications around 1999 they were the first of their kind in astronomy. Now a great deal of astronomy software is written in Java as are many business applications. We discuss the current environment and trends concerning the language and present an actual example of scientific use of Java for high-performance distributed computing: ESA's mission Gaia. The Gaia scanning satellite will perform a galactic census of about 1000 million objects in our galaxy. The Gaia community has chosen to write its processing software in Java. We explore the manifold reasons for choosing Java for this large science collaboration. Gaia processing is numerically complex but highly distributable, some parts being embarrassingly parallel. We describe the Gaia processing architecture and its realisation in Java. We delve into the astrometric solution which is the most advanced and most complex part of the processing. The Gaia simulator is also written in Java and is the most mature code in the system. This has been successfully running since about 2005 on the supercomputer "Marenostrum" in Barcelona. We relate experiences of using Java on a large shared machine. Finally we discuss Java, including some of its problems, for scientific computing.Comment: Experimental Astronomy, August 201

arXiv.org e-Print Archive

Crossref

Diposit Digital de la Universitat de Barcelona

Fragmentation of confidential objects for data processing security in distributed systems

Author: Fabre Jean-Charles
Pérennou Tanguy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995
Field of study

This paper discusses how object orientation in application design enables confidentiality aspects to be handled more easily than in conventional approaches. The idea, based on object fragmentation at design time, is to reduce processing in confidential objects; the more non confidential objects can be produced at design-time, the more application objects can be processed on untrusted shared computers. Still confidential objects must be processed on non shared trusted workstations. Rules and limits of object fragmentation are discussed together with some criteria evaluating trade-offs between fragmentation and performance

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL-INSA Toulouse

Query processing in distributed data bases

Author
Publication venue: Massachusetts Institute of Technology, Laboratory for Information and Decision Systems
Publication date: 01/01/1981
Field of study

Bibliography: p. 22."July, 1981."Office of Naval Research contract ONR/N 00014-77-C-0532 (NR 041-519)by Victor O.K. Li

DSpace@MIT