
    Data Provenance and Management in Radio Astronomy: A Stream Computing Approach

    New approaches for data provenance and data management (DPDM) are required for mega science projects like the Square Kilometer Array, which are characterized by extremely large data volumes and intense data rates and therefore demand innovative and highly efficient computational paradigms. In this context, we explore a stream-computing approach with an emphasis on the use of accelerators. In particular, we make use of a new generation of high-performance, stream-based parallelization middleware known as InfoSphere Streams. We demonstrate its viability for managing signal processing data pipelines in radio astronomy and for ensuring their interoperability and integrity. IBM InfoSphere Streams embraces the stream-computing paradigm, which is a shift from conventional data mining techniques (involving analysis of existing data from databases) towards real-time analytic processing. We discuss using InfoSphere Streams for effective DPDM in radio astronomy and propose a way in which InfoSphere Streams can be utilized for large antenna arrays. We present a case study, the InfoSphere Streams implementation of an autocorrelating spectrometer, and use this example to discuss the advantages of the stream-computing approach and the utilization of hardware accelerators.
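As a rough illustration of the autocorrelating spectrometer mentioned in this abstract, the following minimal Python sketch processes a sample stream block by block, accumulates the autocorrelation, and derives a power spectrum via the Wiener-Khinchin theorem. It does not use InfoSphere Streams; the class `StreamingSpectrometer`, the helper `tone_blocks`, and all parameter values are hypothetical names and choices made for this example only.

```python
import math


def autocorrelate(block, max_lag):
    """Biased autocorrelation estimate r[0..max_lag-1] of one block of real samples."""
    n = len(block)
    return [
        sum(block[i] * block[i + lag] for i in range(n - lag)) / n
        for lag in range(max_lag)
    ]


class StreamingSpectrometer:
    """Hypothetical helper (not an InfoSphere Streams API): folds blocks into a running sum."""

    def __init__(self, max_lag):
        self.max_lag = max_lag
        self.acf_sum = [0.0] * max_lag
        self.blocks_seen = 0

    def process(self, block):
        """Consume one block from the stream and add its autocorrelation to the sum."""
        for lag, value in enumerate(autocorrelate(block, self.max_lag)):
            self.acf_sum[lag] += value
        self.blocks_seen += 1

    def spectrum(self):
        """Power spectrum of the averaged autocorrelation (Wiener-Khinchin theorem).

        Bin k corresponds to k / (2 * max_lag) cycles per sample.
        """
        m = self.max_lag
        acf = [s / self.blocks_seen for s in self.acf_sum]
        return [
            acf[0] + 2.0 * sum(acf[lag] * math.cos(math.pi * k * lag / m)
                               for lag in range(1, m))
            for k in range(m)
        ]


def tone_blocks(freq, block_len, n_blocks):
    """Yield blocks of a test tone at `freq` cycles per sample (stand-in for antenna data)."""
    for b in range(n_blocks):
        start = b * block_len
        yield [math.cos(2.0 * math.pi * freq * (start + i)) for i in range(block_len)]


spectrometer = StreamingSpectrometer(max_lag=16)
for block in tone_blocks(freq=0.125, block_len=256, n_blocks=4):
    spectrometer.process(block)
spec = spectrometer.spectrum()
peak_bin = max(range(len(spec)), key=spec.__getitem__)
print(peak_bin)
```

Because only the running sum of lag products is kept, each block can be discarded once processed, which is the property that makes the computation a natural fit for a streaming pipeline. With 16 lags, a tone at 0.125 cycles per sample should peak in bin 4.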

    GPUs as Storage System Accelerators

    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, peak performance one order of magnitude higher than that of traditional CPUs. This drop in the cost of computation, like any order-of-magnitude drop in the cost per unit of performance for a class of system components, creates an opportunity to redesign systems and to explore new ways of engineering them that recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file, and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications. Comment: IEEE Transactions on Parallel and Distributed Systems, 201
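The content addressable configuration described in this abstract can be pictured with a small CPU-only sketch in Python. It assumes fixed-size chunks and SHA-256, neither of which is stated in the abstract, and the class `ContentAddressableStore` and the constant `CHUNK_SIZE` are hypothetical names for this example. The per-chunk hashing is the kind of primitive the prototype offloads to the GPU.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size; the abstract does not specify one


class ContentAddressableStore:
    """Hypothetical in-memory store that keys each chunk by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}

    def put(self, data, chunk_size=CHUNK_SIZE):
        """Store `data`; return (recipe, number of chunks that were actually new)."""
        recipe = []
        new_chunks = 0
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blocks:  # identical content is stored only once
                self.blocks[digest] = chunk
                new_chunks += 1
            recipe.append(digest)
        return recipe, new_chunks

    def get(self, recipe):
        """Rebuild a file from its recipe, verifying each chunk against its digest."""
        parts = []
        for digest in recipe:
            chunk = self.blocks[digest]
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise ValueError("chunk failed integrity check: " + digest)
            parts.append(chunk)
        return b"".join(parts)


store = ContentAddressableStore()
version1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
version2 = version1[:CHUNK_SIZE] + b"\xff" * CHUNK_SIZE + version1[2 * CHUNK_SIZE:]

recipe1, new1 = store.put(version1)
recipe2, new2 = store.put(version2)
shared = len(set(recipe1) & set(recipe2))
print(new1, new2, shared)
```

The number of digests shared between two recipes gives a simple similarity measure between successive versions, and the digest check in `get` corresponds to the integrity-preserving use of hashing.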

    On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework

    The rapid development of diverse hardware platforms is twofold: on one side, the pursuit of exascale performance for big data processing and management; on the other side, mobile and embedded devices for data collection and human-machine interaction. This has driven a highly hierarchical evolution of programming models. GVirtuS is a general virtualization system, developed in 2009 and first introduced in 2010, that provides a completely transparent layer between GPUs and VMs. This paper presents the latest achievements and developments of GVirtuS, which now supports CUDA 6.5, memory management and scheduling. Thanks to its new and improved remoting capabilities, GVirtuS now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs on local workstations, computing clusters and distributed cloud appliances.

    Customizing Data-plane Processing in Edge Routers

    While OpenFlow enables the customization of the control plane of a router, currently no solutions are available for the customization of the data plane. This paper presents a prototype that offers to third parties (even end-users) the possibility to install their own applications on the data plane of a router, particularly the ones operating at the edge of the network. This paper presents the motivation of the idea, the reason why we use OpenFlow even if it does not seem appropriate for the data plane, the architecture and the implementation of our prototype, and a first characterization of the system running in our la

    A novel network architecture for train-to-wayside communication with quality of service over heterogeneous wireless networks

    In the railway industry, several actors now want to send data from the wayside to an onboard device or vice versa. These actors include, for example, the Train Operation Company, the Train Constructing Company, and Content Providers. This requires a communication module on each train and at the wayside. These modules interact with each other over heterogeneous wireless links. This system is referred to as the Train-to-Wayside Communication System (TWCS). While many deployments already use a TWCS, quality of service, performance enhancing proxies (PEP) and network mobility functions have not yet been fully integrated into a TWCS. Therefore, we propose a novel and modular IPv6-enabled TWCS architecture in this article. It jointly tackles these functions and considers their mutual dependencies and relationships. DiffServ is used to differentiate between service classes and priorities. Virtual local area networks are used to differentiate between service level agreements. In the PEP, we propose to use a distributed TCP accelerator to optimize bandwidth usage. Concerning network mobility, we propose to use the SCTP protocol (with Dynamic Address Reconfiguration and PR-SCTP extensions) to create a tunnel per wireless link, in order to support the reliable transmission of data between the accelerators. We have analyzed different design choices, pinpointed the main implementation challenges and identified candidate solutions for the different modules in the TWCS. As such, we present an elaborated framework that can be used for prototyping a fully featured TWCS.
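To make the DiffServ part of this abstract concrete, here is a minimal Python sketch of how a sender might map service classes to DSCP code points and mark a socket. The code points themselves are the standard ones (RFC 2474, RFC 2597, RFC 3246), but the `SERVICE_CLASSES` mapping and the function names are assumptions made for illustration and do not come from the article. For simplicity the sketch marks IPv4 sockets through `IP_TOS`, whereas the proposed architecture is IPv6-enabled.

```python
import socket

DSCP_BEST_EFFORT = 0   # default per-hop behaviour
DSCP_EF = 46           # Expedited Forwarding (RFC 3246)


def af_dscp(af_class, drop_precedence):
    """DSCP value of Assured Forwarding class AFxy (RFC 2597): 8 * x + 2 * y."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return 8 * af_class + 2 * drop_precedence


def dscp_to_tos(dscp):
    """Place the 6-bit DSCP in the upper bits of the former IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


# Hypothetical service classes for a train-to-wayside link (not from the article).
SERVICE_CLASSES = {
    "train_control": DSCP_EF,
    "passenger_info": af_dscp(4, 1),
    "bulk_logs": DSCP_BEST_EFFORT,
}


def mark_socket(sock, service):
    """Ask the OS to mark outgoing IPv4 packets of `sock` with the class's DSCP."""
    tos = dscp_to_tos(SERVICE_CLASSES[service])
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return tos


for name, dscp in SERVICE_CLASSES.items():
    print(name, dscp, hex(dscp_to_tos(dscp)))
```

Routers along the path can then apply a per-hop behaviour to each class. Whether the marking is honoured depends on the operating system and on the network's DiffServ policy.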

    A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel 4th Gen Xeon Scalable Processors

    As semiconductor power density no longer stays constant as process technology scales down, modern CPUs are integrating capable data accelerators on chip, aiming to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA) introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets data movement operations in memory that are common sources of overhead in datacenter workloads and infrastructure. In addition, it is made much more versatile by its support for a wider range of operations on streaming data, such as CRC32 calculations, delta record creation/merging, and data integrity field (DIF) operations. This paper introduces the latest features supported by DSA, examines its versatility in depth, and analyzes its throughput benefits through a comprehensive evaluation. Based on this analysis of its characteristics and on the rich software ecosystem of DSA, we summarize several insights and guidelines that help programmers make the most of DSA, and use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application.
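Two of the operations named in this abstract, CRC32 calculation and delta record creation/merging, can be mimicked in software to show what is being offloaded. The Python sketch below is a CPU-only analogue and does not use the DSA hardware or any Intel library. The 8-byte comparison unit and the function names `create_delta` and `apply_delta` are assumptions of this example.

```python
import zlib

UNIT = 8  # comparison granularity in bytes, assumed for this sketch


def crc32_stream(chunks):
    """CRC32 over a sequence of chunks, carrying the running value between calls."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc


def create_delta(original, modified):
    """List (offset, new_bytes) for every UNIT-sized block that differs."""
    if len(original) != len(modified) or len(original) % UNIT:
        raise ValueError("buffers must have equal length, a multiple of UNIT")
    return [
        (off, bytes(modified[off:off + UNIT]))
        for off in range(0, len(original), UNIT)
        if original[off:off + UNIT] != modified[off:off + UNIT]
    ]


def apply_delta(original, delta):
    """Merge a delta record into a copy of `original` to rebuild the modified buffer."""
    result = bytearray(original)
    for off, data in delta:
        result[off:off + UNIT] = data
    return bytes(result)


original = bytes(64)
modified = bytearray(original)
modified[9] = 1
modified[40] = 7
modified = bytes(modified)

delta = create_delta(original, modified)
rebuilt = apply_delta(original, delta)
print([off for off, _ in delta], rebuilt == modified)
print(crc32_stream([modified[:16], modified[16:]]) == zlib.crc32(modified))
```

The delta record is far smaller than the buffer when few blocks change. On a CPU, each of these loops costs cycles in proportion to the data size, and that cost is what an on-chip accelerator is meant to take over.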