Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform
Advances in detectors and computational technologies provide new
opportunities for applied research and the fundamental sciences. Concurrently,
dramatic increases in the three Vs (Volume, Velocity, and Variety) of
experimental data and the scale of computational tasks have created demand for
new real-time processing systems at experimental facilities. Recently, this
demand was addressed by the Spark-MPI approach connecting the Spark
data-intensive platform with the MPI high-performance framework. In contrast
with existing data management and analytics systems, Spark introduced a new
middleware based on resilient distributed datasets (RDDs), which decoupled
various data sources from high-level processing algorithms. The RDD middleware
significantly advanced the scope of data-intensive applications, spreading from
SQL queries to machine learning to graph processing. Spark-MPI further extended
the Spark ecosystem with MPI applications using the Process Management
Interface. The paper explores this integrated platform within the context of
online ptychographic and tomographic reconstruction pipelines. Comment: New York Scientific Data Summit, August 6-9, 201
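The decoupling that RDDs provide (data sources on one side, processing algorithms on the other) can be illustrated with a toy partitioned dataset. The sketch below is a plain-Python assumption for illustration only: `ToyDataset` is hypothetical and is neither Spark's RDD API nor anything MPI-related. Any iterable feeds the partitions, and the same `map`/`reduce` code runs regardless of where the records came from.

```python
import functools
from typing import Any, Callable, Iterable, List


class ToyDataset:
    """Hypothetical stand-in for an RDD: an in-memory list of partitions.

    Not Spark's API; it only shows how a partitioned dataset keeps the
    processing code (map/reduce) independent of the data source.
    """

    def __init__(self, partitions: List[List[Any]]) -> None:
        self.partitions = partitions

    @classmethod
    def from_source(cls, records: Iterable[Any], num_partitions: int) -> "ToyDataset":
        # Any iterable (file reader, detector frame queue, ...) can feed the dataset.
        parts: List[List[Any]] = [[] for _ in range(num_partitions)]
        for i, rec in enumerate(records):
            parts[i % num_partitions].append(rec)  # round-robin placement
        return cls(parts)

    def map(self, fn: Callable[[Any], Any]) -> "ToyDataset":
        # fn runs independently inside each partition, so partitions could be
        # handed to different workers.
        return ToyDataset([[fn(r) for r in part] for part in self.partitions])

    def reduce(self, fn: Callable[[Any, Any], Any]) -> Any:
        # Reduce each non-empty partition locally, then combine the partial results.
        partials = [functools.reduce(fn, part) for part in self.partitions if part]
        return functools.reduce(fn, partials)
```

For example, `ToyDataset.from_source(range(10), 3).map(lambda x: x * x).reduce(lambda a, b: a + b)` sums the squares of 0 to 9 across three partitions.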
MOSDEN: An Internet of Things Middleware for Resource Constrained Mobile Devices
The Internet of Things (IoT) is part of the Future Internet and will comprise
many billions of Internet Connected Objects (ICOs) or 'things' that can
sense, communicate, compute and potentially actuate as well as have
intelligence, multi-modal interfaces, physical/ virtual identities and
attributes. Collecting data from these objects is an important task as it
allows software systems to understand the environment better. Many different
hardware devices may be involved in the process of collecting and uploading sensor
data to the cloud, where complex processing can occur. Further, we cannot expect
all these objects to be connected to computers, for technical and
economic reasons. Therefore, we should be able to use resource-constrained
devices to collect data from these ICOs. On the other hand, it is
critical to process the collected sensor data before sending it to the cloud
to ensure the sustainability of the infrastructure under energy
constraints. This requires moving sensor data processing tasks towards
resource-constrained computational devices (e.g., mobile phones). In this paper,
we propose the Mobile Sensor Data Processing Engine (MOSDEN), a plug-in-based IoT
middleware for mobile devices that allows users to collect and process sensor data
without programming effort. Our architecture also supports the sensing-as-a-service
model. We present evaluation results that demonstrate its
suitability for real-world deployments. Our proposed middleware is built on
the Android platform.
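The plug-in idea and the "process before upload" argument can be sketched in a few lines. This is a hypothetical illustration, not MOSDEN's Android implementation: `SensorPlugin`, `ReplayPlugin`, and `MiniEngine` are invented names, and the summary (count, mean, max) is an assumed example of on-device processing.

```python
from statistics import mean
from typing import Dict, List


class SensorPlugin:
    """Hypothetical plug-in interface: one plug-in wraps one data source."""
    name = "base"

    def read(self) -> float:
        raise NotImplementedError


class ReplayPlugin(SensorPlugin):
    """Toy plug-in that replays recorded samples instead of touching hardware."""

    def __init__(self, name: str, samples: List[float]) -> None:
        self.name = name
        self._samples = iter(samples)

    def read(self) -> float:
        return next(self._samples)


class MiniEngine:
    """Hypothetical engine: registers plug-ins and summarises readings on-device."""

    def __init__(self) -> None:
        self.plugins: Dict[str, SensorPlugin] = {}

    def register(self, plugin: SensorPlugin) -> None:
        # Supporting a new sensor means registering a plug-in, not changing the engine.
        self.plugins[plugin.name] = plugin

    def summarise(self, name: str, n: int) -> Dict[str, float]:
        # Collapse n raw readings into a few numbers before anything is uploaded,
        # trading a little local computation for less network traffic.
        values = [self.plugins[name].read() for _ in range(n)]
        return {"count": float(len(values)), "mean": mean(values), "max": max(values)}
```

Only the engine knows about plug-ins, so application code asks for a summary by sensor name without any sensor-specific programming.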
Marrying Big Data with Smart Data in Sensor Stream Processing
Widespread deployments of spatially distributed sensors are continuously generating data that require advanced analytical processing and interpretation by machines. Devising machine-interpretable descriptions of sensor data is a key issue in building a semantic stream processing engine. This paper proposes a semantic sensor stream processing pipeline using Apache Kafka to publish and subscribe semantic data streams in a scalable way. We use the Kafka Consumer API to annotate the sensor data using the Semantic Sensor Network ontology, then store the annotated output in an RDF triplestore for further reasoning or semantic integration with legacy information systems. We follow a Design Science approach addressing a Smart Airport scenario with geolocated audio sensors to evaluate the viability of the proposed pipeline under various Kafka-based configurations. Our experimental evaluations show that the multi-broker Kafka cluster setup supports read scalability, thus facilitating the parallelization of the semantic enrichment of the sensor data.
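The annotation step can be sketched as a function that turns one raw reading into RDF triples using terms from SOSA, the core module of the W3C Semantic Sensor Network ontology. This is an assumption-laden illustration, not the paper's code: the input dictionary layout and the `http://example.org/airport/` namespace are hypothetical, and a plain Python iterable stands in for a Kafka consumer so the block runs without a broker.

```python
from typing import Dict, Iterable, List

SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/airport/"  # hypothetical namespace for the scenario


def annotate(reading: Dict) -> List[str]:
    """Turn one raw reading (hypothetical keys) into SOSA N-Triples lines."""
    obs = f"<{BASE}observation/{reading['id']}>"
    sensor = f"<{BASE}sensor/{reading['sensor']}>"
    prop = f"<{BASE}property/{reading['property']}>"
    return [
        f"{obs} <{RDF_TYPE}> <{SOSA}Observation> .",
        f"{obs} <{SOSA}madeBySensor> {sensor} .",
        f"{obs} <{SOSA}observedProperty> {prop} .",
        f'{obs} <{SOSA}hasSimpleResult> "{reading["value"]}"^^<{XSD}double> .',
        f'{obs} <{SOSA}resultTime> "{reading["time"]}"^^<{XSD}dateTime> .',
    ]


def enrich(stream: Iterable[Dict]) -> List[str]:
    # Stand-in for a Kafka consumer poll loop. Each message is annotated
    # independently, which is what lets several consumers share the work.
    triples: List[str] = []
    for msg in stream:
        triples.extend(annotate(msg))
    return triples
```

Because `annotate` has no shared state, messages from different Kafka partitions could be enriched by different consumers in parallel, which is the property the multi-broker read-scalability result relies on.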
Ztreamy: a middleware for publishing semantic streams on the web
In order to make the semantic sensor Web a reality, middleware for efficiently publishing semantically annotated data streams on the Web is needed. Such middleware should be designed to allow third parties to reuse and mash up data coming from streams. These third parties should even be able to publish their own value-added streams derived from other streams and static data. In this work we present Ztreamy, a scalable middleware platform for the distribution of semantic data streams through HTTP. The platform provides an API for both publishing and consuming streams, as well as built-in filtering services based on data semantics. A key contribution of our proposal with respect to other related systems in the state of the art is its scalability. Our experiments with Ztreamy show that a single server is able, in some configurations, to publish a real-time stream to up to 40,000 simultaneous clients with delivery delays of just a few seconds, largely outperforming other systems in the state of the art.
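Publish/subscribe with semantic filtering can be sketched as a hub that evaluates each subscriber's triple-pattern filter before delivery. This is a hypothetical in-process sketch, not Ztreamy's HTTP API: `StreamHub` and `pattern_filter` are invented names, and events are assumed to be lists of (subject, predicate, object) string triples.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Event = List[Triple]
Filter = Callable[[Event], bool]


def pattern_filter(s: Optional[str] = None, p: Optional[str] = None,
                   o: Optional[str] = None) -> Filter:
    """Accept an event if any of its triples matches (s, p, o); None is a wildcard."""
    def accepts(event: Event) -> bool:
        return any((s is None or t[0] == s) and (p is None or t[1] == p)
                   and (o is None or t[2] == o) for t in event)
    return accepts


class StreamHub:
    """Hypothetical in-process hub standing in for an HTTP stream server."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[Filter, List[Event]]] = []

    def subscribe(self, accepts: Filter) -> List[Event]:
        # Each subscriber gets its own inbox; the filter is evaluated at the hub.
        inbox: List[Event] = []
        self._subscribers.append((accepts, inbox))
        return inbox

    def publish(self, event: Event) -> None:
        # Deliver the event only to subscribers whose filter accepts it.
        for accepts, inbox in self._subscribers:
            if accepts(event):
                inbox.append(event)
```

Filtering at the hub means a client interested only in, say, temperature triples never receives the rest of the stream; a third party could likewise subscribe, transform events, and republish them as a derived stream.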
A framework for P2P application development
Although Peer-to-Peer (P2P) computing has become increasingly popular over recent years, only a very small number of application domains have exploited it on a large scale. This can be attributed to a number of reasons, including the rapid evolution of P2P technologies coupled with their often complex nature. This paper describes an implemented abstraction framework that seeks to aid developers in building P2P applications. A selection of example P2P applications developed using this framework is also presented.
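The general idea of an abstraction framework, where applications depend on an interface rather than on a particular P2P technology, can be sketched as follows. Everything here is hypothetical and not the framework described in the paper: `PeerNetwork`, `InMemoryNetwork`, and `share_file` are invented names, and the in-memory back end uses naive flooding lookup purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class PeerNetwork(ABC):
    """Hypothetical abstraction: applications code against this interface
    instead of a specific P2P technology."""

    @abstractmethod
    def join(self, peer_id: str) -> None: ...

    @abstractmethod
    def publish(self, peer_id: str, key: str, value: str) -> None: ...

    @abstractmethod
    def lookup(self, key: str) -> Optional[str]: ...


class InMemoryNetwork(PeerNetwork):
    """Toy back end for testing; a real one might wrap a DHT or gossip overlay."""

    def __init__(self) -> None:
        self.peers: Dict[str, Dict[str, str]] = {}

    def join(self, peer_id: str) -> None:
        self.peers.setdefault(peer_id, {})

    def publish(self, peer_id: str, key: str, value: str) -> None:
        self.peers[peer_id][key] = value

    def lookup(self, key: str) -> Optional[str]:
        # Naive flooding: ask every peer until one has the key.
        for store in self.peers.values():
            if key in store:
                return store[key]
        return None


def share_file(net: PeerNetwork, owner: str, name: str, location: str) -> None:
    # Application logic depends only on the abstract interface.
    net.join(owner)
    net.publish(owner, name, location)
```

Swapping `InMemoryNetwork` for another `PeerNetwork` implementation would leave `share_file` unchanged, which is the kind of insulation from rapidly evolving P2P technologies the abstract motivates.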
An IoT Cloud and Big Data Architecture for the Maintenance of Home Appliances
Billions of interconnected Internet of Things (IoT) sensors and devices
collect tremendous amounts of data from real-world scenarios. Big data is
generating increasing interest in a wide range of industries. Once data is
analyzed through compute-intensive Machine Learning (ML) methods, it can derive
critical business value for organizations. Powerful platforms are essential to
handle and process such massive collections of information cost-effectively and
conveniently. This work introduces a distributed and scalable platform
architecture that can be deployed for efficient real-world big data collection
and analytics. The proposed system was tested with a case study for Predictive
Maintenance of Home Appliances, where current and vibration sensors with high
acquisition frequency were connected to washing machines and refrigerators. The
introduced platform was used to collect, store, and analyze the data. The
experimental results demonstrated that the presented system could be
advantageous for tackling real-world IoT scenarios with a cost-effective,
local approach. Comment: 6 pages, 6 figures, IECON 202
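A typical first analysis step on high-frequency current or vibration signals is to reduce them to per-window features and flag unusual windows. The sketch below assumes a windowed RMS feature and a simple "exceeds k times baseline" rule; both the feature and the threshold rule are hypothetical illustrations, not the ML methods used in the paper.

```python
import math
from typing import List, Sequence


def window_rms(signal: Sequence[float], window: int) -> List[float]:
    """RMS of consecutive non-overlapping windows (a trailing partial window is dropped)."""
    out: List[float] = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        out.append(math.sqrt(sum(x * x for x in chunk) / window))
    return out


def flag_anomalies(rms: Sequence[float], baseline: float, factor: float = 2.0) -> List[int]:
    """Indices of windows whose RMS exceeds factor * baseline (hypothetical rule)."""
    return [i for i, v in enumerate(rms) if v > factor * baseline]
```

For a signal that oscillates between ±1 for two windows and then between ±3, `window_rms(signal, 4)` gives `[1.0, 1.0, 3.0]` and only the last window is flagged against a baseline of 1.0. Reducing raw samples to one number per window is also what keeps storage and transfer costs manageable at high acquisition frequencies.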
Data Provenance and Management in Radio Astronomy: A Stream Computing Approach
New approaches for data provenance and data management (DPDM) are required
for mega-science projects like the Square Kilometre Array, characterized by
extremely large data volume and intense data rates, therefore demanding
innovative and highly efficient computational paradigms. In this context, we
explore a stream-computing approach with the emphasis on the use of
accelerators. In particular, we make use of a new generation of high
performance stream-based parallelization middleware known as InfoSphere
Streams. Its viability for managing and ensuring interoperability and integrity
of signal processing data pipelines is demonstrated in radio astronomy. IBM
InfoSphere Streams embraces the stream-computing paradigm, a shift from
conventional data mining techniques (involving analysis of existing data from
databases) towards real-time analytic processing. We discuss using InfoSphere
Streams for effective DPDM in radio astronomy and propose a way in which
InfoSphere Streams can be utilized for large antenna arrays. We present a
case study, the InfoSphere Streams implementation of an autocorrelating
spectrometer, and using this example we discuss the advantages of the
stream-computing approach and the utilization of hardware accelerators.
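An autocorrelating spectrometer estimates the autocorrelation of the sampled signal over a range of lags and then Fourier-transforms it to obtain the power spectrum (the Wiener-Khinchin relation). The sketch below shows that principle in plain Python; it is not the InfoSphere Streams implementation from the case study, and the biased estimator and lag count are assumptions chosen for clarity.

```python
import cmath
import math
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased estimate r[k] = (1/N) * sum_n x[n] * x[n+k] for k = 0..max_lag-1."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_lag)]


def power_spectrum(r: Sequence[float]) -> List[float]:
    """Wiener-Khinchin: DFT of the two-sided lag sequence r[-(L-1)..L-1]."""
    lags = len(r)
    m = 2 * lags - 1  # number of points in the two-sided lag sequence
    # Circular order: r[0..L-1] followed by r[-(L-1)..-1], using r[-k] = r[k].
    two_sided = list(r) + list(reversed(r[1:]))
    spectrum: List[float] = []
    for f in range(lags):  # real, symmetric input: non-negative frequencies suffice
        s = sum(two_sided[k] * cmath.exp(-2j * math.pi * f * k / m) for k in range(m))
        spectrum.append(s.real)  # imaginary part vanishes for symmetric input
    return spectrum
```

For a cosine at 5/31 cycles per sample and 16 lags (31 spectral points), the largest channel of `power_spectrum(autocorrelation(x, 16))` is index 5. In a streaming deployment the lag products are accumulated continuously as samples arrive, which is the part that benefits from hardware accelerators.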
- …