Benchmarking Distributed Stream Data Processing Systems
The need for scalable and efficient stream analysis has led to the
development of many open-source streaming data processing systems (SDPSs) with
highly diverging capabilities and performance characteristics. While initial
efforts compare the systems on simple workloads, there is a clear
lack of detailed analyses of the systems' performance characteristics. In this
paper, we propose a framework for benchmarking distributed stream processing
engines. We use our suite to evaluate the performance of three widely used
SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our
evaluation focuses in particular on measuring the throughput and latency of
windowed operations, which are the basic type of operations in stream
analytics. For this benchmark, we design workloads based on real-life,
industrial use-cases inspired by the online gaming industry. The contribution
of our work is threefold. First, we give a definition of latency and throughput
for stateful operators. Second, we carefully separate the system under test from
the driver in order to correctly represent the open-world model of typical stream
processing deployments, and can therefore measure system performance under
realistic conditions. Third, we build the first benchmarking framework to
define and test the sustainable performance of streaming systems.
Our detailed evaluation highlights the individual characteristics and
use-cases of each system.
Comment: Published at ICDE 201
Using Java for distributed computing in the Gaia satellite data processing
In recent years Java has matured into a stable, easy-to-use language with the
flexibility of an interpreter (for reflection etc.) but the performance and
type checking of a compiled language. When we started using Java for
astronomical applications around 1999 they were the first of their kind in
astronomy. Now a great deal of astronomy software is written in Java as are
many business applications.
We discuss the current environment and trends concerning the language and
present an actual example of scientific use of Java for high-performance
distributed computing: ESA's mission Gaia. The Gaia scanning satellite will
perform a galactic census of about 1000 million objects in our galaxy. The Gaia
community has chosen to write its processing software in Java. We explore the
manifold reasons for choosing Java for this large science collaboration.
Gaia processing is numerically complex but highly distributable, some parts
being embarrassingly parallel. We describe the Gaia processing architecture and
its realisation in Java. We delve into the astrometric solution which is the
most advanced and most complex part of the processing. The Gaia simulator is
also written in Java and is the most mature code in the system. This has been
successfully running since about 2005 on the supercomputer "Marenostrum" in
Barcelona. We relate experiences of using Java on a large shared machine.
Finally we discuss Java, including some of its problems, for scientific
computing.
Comment: Experimental Astronomy, August 201
Fragmentation of confidential objects for data processing security in distributed systems
This paper discusses how object orientation in application design enables confidentiality aspects to be handled more easily than in conventional approaches. The idea, based on object fragmentation at design time, is to reduce processing in confidential objects: the more non-confidential objects can be produced at design time, the more application objects can be processed on untrusted shared computers. Confidential objects must still be processed on non-shared trusted workstations. Rules and limits of object fragmentation are discussed, together with some criteria for evaluating trade-offs between fragmentation and performance.
Query processing in distributed data bases
By Victor O.K. Li. "July, 1981." Bibliography: p. 22. Office of Naval Research contract ONR/N 00014-77-C-0532 (NR 041-519).
