25,671 research outputs found
Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform
Advances in detectors and computational technologies provide new
opportunities for applied research and the fundamental sciences. Concurrently,
dramatic increases in the three Vs (Volume, Velocity, and Variety) of
experimental data and the scale of computational tasks produced the demand for
new real-time processing systems at experimental facilities. Recently, this
demand was addressed by the Spark-MPI approach connecting the Spark
data-intensive platform with the MPI high-performance framework. In contrast
with existing data management and analytics systems, Spark introduced a new
middleware based on resilient distributed datasets (RDDs), which decoupled
various data sources from high-level processing algorithms. The RDD middleware
significantly advanced the scope of data-intensive applications, spreading from
SQL queries to machine learning to graph processing. Spark-MPI further extended
the Spark ecosystem with the MPI applications using the Process Management
Interface. The paper explores this integrated platform within the context of
online ptychographic and tomographic reconstruction pipelines.Comment: New York Scientific Data Summit, August 6-9, 201
Data Provenance and Management in Radio Astronomy: A Stream Computing Approach
New approaches for data provenance and data management (DPDM) are required
for mega science projects like the Square Kilometer Array, characterized by
extremely large data volume and intense data rates, therefore demanding
innovative and highly efficient computational paradigms. In this context, we
explore a stream-computing approach with the emphasis on the use of
accelerators. In particular, we make use of a new generation of high
performance stream-based parallelization middleware known as InfoSphere
Streams. Its viability for managing and ensuring interoperability and integrity
of signal processing data pipelines is demonstrated in radio astronomy. IBM
InfoSphere Streams embraces the stream-computing paradigm. It is a shift from
conventional data mining techniques (involving analysis of existing data from
databases) towards real-time analytic processing. We discuss using InfoSphere
Streams for effective DPDM in radio astronomy and propose a way in which
InfoSphere Streams can be utilized for large antennae arrays. We present a
case-study: the InfoSphere Streams implementation of an autocorrelating
spectrometer, and using this example we discuss the advantages of the
stream-computing approach and the utilization of hardware accelerators
- …