
    Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform

    Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three Vs (Volume, Velocity, and Variety) of experimental data and in the scale of computational tasks have produced demand for new real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach, which connects the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark introduced a new middleware based on resilient distributed datasets (RDDs), which decoupled various data sources from high-level processing algorithms. The RDD middleware significantly advanced the scope of data-intensive applications, ranging from SQL queries to machine learning to graph processing. Spark-MPI further extended the Spark ecosystem to MPI applications via the Process Management Interface. The paper explores this integrated platform within the context of online ptychographic and tomographic reconstruction pipelines.
    Comment: New York Scientific Data Summit, August 6-9, 201
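
    As an illustrative aside, the following is a minimal PySpark sketch of the RDD decoupling described above, assuming a local Spark installation; `reconstruct` is a hypothetical stand-in for an MPI-backed reconstruction kernel, not the paper's code:

        # Minimal sketch of RDD-based decoupling, assuming a local Spark
        # installation; `reconstruct` is a hypothetical stand-in for an
        # MPI-backed ptychography/tomography kernel.
        from pyspark import SparkContext

        def reconstruct(frames):
            # Placeholder per-partition step: in Spark-MPI, this is where
            # a partition would be handed to MPI ranks via the Process
            # Management Interface.
            return [sum(frame) for frame in frames]

        sc = SparkContext("local[2]", "spark-mpi-sketch")
        frames = sc.parallelize([[1, 2, 3], [4, 5, 6]], numSlices=2)
        spectra = frames.mapPartitions(lambda part: reconstruct(list(part)))
        print(spectra.collect())  # [6, 15]: one result per input frame
        sc.stop()

    The point of the sketch is that the ingestion step (parallelize, or any other RDD source) and the processing step (mapPartitions) are attached independently, which is the decoupling the RDD middleware provides.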

    MOSDEN: An Internet of Things Middleware for Resource Constrained Mobile Devices

    The Internet of Things (IoT) is part of the Future Internet and will comprise many billions of Internet Connected Objects (ICOs) or `things' that can sense, communicate, compute, and potentially actuate, as well as have intelligence, multi-modal interfaces, and physical/virtual identities and attributes. Collecting data from these objects is an important task, as it allows software systems to understand the environment better. Many different hardware devices may be involved in collecting and uploading sensor data to the cloud, where complex processing can occur. Further, we cannot expect all these objects to be connected to computers, for technical and economic reasons. Therefore, we should be able to utilize resource-constrained devices to collect data from these ICOs. On the other hand, given energy constraints, it is critical to process the collected sensor data before sending it to the cloud, in order to ensure the sustainability of the infrastructure. This requires moving sensor data processing tasks towards resource-constrained computational devices (e.g., mobile phones). In this paper, we propose the Mobile Sensor Data Processing Engine (MOSDEN), a plug-in-based IoT middleware for mobile devices that allows sensor data to be collected and processed without programming effort. Our architecture also supports the sensing-as-a-service model. We present the results of evaluations that demonstrate its suitability for real-world deployments. Our proposed middleware is built on the Android platform.
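
    A rough sketch of the plug-in idea follows; MOSDEN itself runs on Android, and all class and method names below are illustrative, not its actual API:

        # Hedged sketch of a plug-in-based collection engine; MOSDEN is an
        # Android middleware, and these names are illustrative, not its API.
        from typing import Callable, Dict

        class SensorPlugin:
            """A plug-in wraps one data source behind a uniform interface."""
            def __init__(self, name: str, read: Callable[[], float]):
                self.name = name
                self.read = read

        class CollectionEngine:
            def __init__(self):
                self.plugins: Dict[str, SensorPlugin] = {}

            def register(self, plugin: SensorPlugin) -> None:
                # New sensors are added by registration, not reprogramming.
                self.plugins[plugin.name] = plugin

            def collect(self) -> Dict[str, float]:
                # Local pre-processing could filter or aggregate here,
                # before anything is uploaded, to save energy and bandwidth.
                return {name: p.read() for name, p in self.plugins.items()}

        engine = CollectionEngine()
        engine.register(SensorPlugin("temperature", lambda: 21.5))
        engine.register(SensorPlugin("humidity", lambda: 0.43))
        print(engine.collect())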

    Marrying Big Data with Smart Data in Sensor Stream Processing

    Widespread deployments of spatially distributed sensors are continuously generating data that require advanced analytical processing and interpretation by machines. Devising machine-interpretable descriptions of sensor data is a key issue in building a semantic stream processing engine. This paper proposes a semantic sensor stream processing pipeline using Apache Kafka to publish and subscribe to semantic data streams in a scalable way. We use the Kafka Consumer API to annotate the sensor data with the Semantic Sensor Network (SSN) ontology, then store the annotated output in an RDF triplestore for further reasoning or semantic integration with legacy information systems. We follow a Design Science approach, addressing a Smart Airport scenario with geolocated audio sensors to evaluate the viability of the proposed pipeline under various Kafka-based configurations. Our experimental evaluations show that the multi-broker Kafka cluster setup supports read scalability, thus facilitating the parallelization of the semantic enrichment of the sensor data.
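
    A minimal sketch of one annotation stage of such a pipeline, assuming the kafka-python and rdflib libraries; the topic name, broker address, and message layout are hypothetical, and a real deployment would write into a triplestore rather than an in-memory graph:

        # Sketch of the annotation stage, assuming kafka-python and rdflib;
        # topic name, broker address, and message layout are hypothetical.
        import json
        from kafka import KafkaConsumer
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF, XSD

        SOSA = Namespace("http://www.w3.org/ns/sosa/")
        g = Graph()

        consumer = KafkaConsumer(
            "audio-sensors",
            bootstrap_servers="localhost:9092",
            value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        )

        for msg in consumer:
            reading = msg.value  # e.g. {"sensor": "mic-42", "db": 71.3}
            obs = URIRef(f"urn:obs:{msg.offset}")
            g.add((obs, RDF.type, SOSA.Observation))
            g.add((obs, SOSA.madeBySensor,
                   URIRef(f"urn:sensor:{reading['sensor']}")))
            g.add((obs, SOSA.hasSimpleResult,
                   Literal(reading["db"], datatype=XSD.double)))
            # A real pipeline would push the graph to a triplestore here.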

    Ztreamy: a middleware for publishing semantic streams on the web

    In order to make the semantic sensor Web a reality, middleware for efficiently publishing semantically annotated data streams on the Web is needed. Such middleware should be designed to allow third parties to reuse and mash up data coming from streams. These third parties should even be able to publish their own value-added streams derived from other streams and static data. In this work we present Ztreamy, a scalable middleware platform for the distribution of semantic data streams through HTTP. The platform provides an API for both publishing and consuming streams, as well as built-in filtering services based on data semantics. A key contribution of our proposal with respect to other related systems in the state of the art is its scalability. Our experiments with Ztreamy show that a single server is able, in some configurations, to publish a real-time stream to up to 40,000 simultaneous clients with delivery delays of just a few seconds, largely outperforming other systems in the state of the art.
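
    A sketch of consuming such an HTTP-delivered stream, assuming the requests library; the endpoint URL is hypothetical, and this is a generic long-lived HTTP client illustrating the delivery model, not Ztreamy's own API:

        # Generic long-lived HTTP stream consumer, assuming the requests
        # library; the URL is hypothetical and this is not Ztreamy's API.
        import requests

        STREAM_URL = "http://localhost:9000/events/stream"  # hypothetical

        with requests.get(STREAM_URL, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            # Events arrive over one persistent HTTP response, so many
            # clients can be served without per-event connection setup.
            for line in resp.iter_lines(decode_unicode=True):
                if line:
                    print(line)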

    A framework for P2P application development

    Although Peer-to-Peer (P2P) computing has become increasingly popular over recent years, only a very small number of application domains have exploited it on a large scale. This can be attributed to a number of reasons, including the rapid evolution of P2P technologies coupled with their often complex nature. This paper describes an implemented abstraction framework that seeks to aid developers in building P2P applications. A selection of example P2P applications developed using this framework is also presented.
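
    As a toy illustration of the kind of boilerplate such an abstraction framework hides, the sketch below shows a peer acting as both server and client; all of it is hypothetical and unrelated to the paper's own framework:

        # Toy illustration of P2P symmetry: every peer both listens and
        # connects. Hypothetical sketch, not the paper's framework.
        import socket
        import threading
        import time

        def serve(port: int) -> None:
            srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind(("127.0.0.1", port))
            srv.listen()
            conn, _ = srv.accept()
            print("received:", conn.recv(1024).decode())
            conn.close()
            srv.close()

        # Peer A listens; peer B connects and sends. Symmetric roles are
        # what a framework would wrap behind a single peer abstraction.
        t = threading.Thread(target=serve, args=(9901,))
        t.start()
        time.sleep(0.2)  # crude wait for the listener to come up
        cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        cli.connect(("127.0.0.1", 9901))
        cli.sendall(b"hello from a peer")
        cli.close()
        t.join()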

    An IoT Cloud and Big Data Architecture for the Maintenance of Home Appliances

    Billions of interconnected Internet of Things (IoT) sensors and devices collect tremendous amounts of data from real-world scenarios. Big data is generating increasing interest in a wide range of industries. Once analyzed through compute-intensive Machine Learning (ML) methods, this data can derive critical business value for organizations. Powerful platforms are essential to handle and process such massive collections of information cost-effectively and conveniently. This work introduces a distributed and scalable platform architecture that can be deployed for efficient real-world big data collection and analytics. The proposed system was tested with a case study on Predictive Maintenance of Home Appliances, in which current and vibration sensors with high acquisition frequency were connected to washing machines and refrigerators. The introduced platform was used to collect, store, and analyze the data. The experimental results demonstrated that the presented system can be advantageous for tackling real-world IoT scenarios in a cost-effective, local manner.
    Comment: 6 pages, 6 figures, IECON 202
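
    A small sketch of a typical first analytics step for such vibration data, assuming NumPy; the signal, sampling rate, and chosen features are illustrative and not taken from the paper:

        # Illustrative feature extraction for a vibration signal, assuming
        # NumPy; signal, sampling rate, and features are examples only.
        import numpy as np

        fs = 4000                      # Hz, hypothetical acquisition rate
        t = np.arange(0, 1.0, 1 / fs)
        signal = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)

        rms = np.sqrt(np.mean(signal ** 2))           # vibration level
        spectrum = np.abs(np.fft.rfft(signal))
        freqs = np.fft.rfftfreq(signal.size, 1 / fs)
        dominant = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

        # Features like these would feed the ML models used for predictive
        # maintenance (e.g. shifted harmonics hinting at bearing wear).
        print(f"RMS={rms:.3f}, dominant frequency={dominant:.1f} Hz")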

    Data Provenance and Management in Radio Astronomy: A Stream Computing Approach

    New approaches to data provenance and data management (DPDM) are required for mega-science projects like the Square Kilometre Array, which are characterized by extremely large data volumes and intense data rates and therefore demand innovative and highly efficient computational paradigms. In this context, we explore a stream-computing approach, with an emphasis on the use of accelerators. In particular, we make use of a new generation of high-performance stream-based parallelization middleware known as InfoSphere Streams. Its viability for managing signal processing data pipelines, and for ensuring their interoperability and integrity, is demonstrated in radio astronomy. IBM InfoSphere Streams embraces the stream-computing paradigm: a shift from conventional data mining techniques (analysis of existing data in databases) towards real-time analytic processing. We discuss using InfoSphere Streams for effective DPDM in radio astronomy and propose a way in which InfoSphere Streams can be utilized for large antenna arrays. We present a case study, the InfoSphere Streams implementation of an autocorrelating spectrometer, and use this example to discuss the advantages of the stream-computing approach and the utilization of hardware accelerators.
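
    A minimal NumPy sketch of the signal-processing core of an autocorrelating spectrometer, using the Wiener-Khinchin relation (the power spectrum is the Fourier transform of the autocorrelation); this illustrates the computation only, not the InfoSphere Streams implementation:

        # Core math of an autocorrelating spectrometer, assuming NumPy;
        # a sketch of the computation, not the InfoSphere Streams code.
        import numpy as np

        fs = 1024                      # Hz, illustrative sampling rate
        t = np.arange(0, 1.0, 1 / fs)
        x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.random.randn(t.size)

        # Autocorrelation for non-negative lags.
        n = x.size
        acf = np.correlate(x, x, mode="full")[n - 1:] / n

        # Wiener-Khinchin: the Fourier transform of the autocorrelation
        # is the power spectrum of the signal.
        power = np.abs(np.fft.rfft(acf))
        freqs = np.fft.rfftfreq(acf.size, 1 / fs)
        print(f"peak at {freqs[np.argmax(power)]:.1f} Hz")  # ~100 Hz

    In a streaming implementation, each operator (lag products, accumulation, transform) becomes a pipeline stage, which is what makes the design a natural fit for stream middleware and hardware accelerators.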