14,216 research outputs found

    GPUs as Storage System Accelerators

    Full text link
    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Online Pattern Recognition for the ALICE High Level Trigger

    Full text link
    The ALICE High Level Trigger has to process data online, in order to select interesting (sub)events, or to compress data efficiently by modeling techniques.Focusing on the main data source, the Time Projection Chamber (TPC), we present two pattern recognition methods under investigation: a sequential approach "cluster finder" and "track follower") and an iterative approach ("track candidate finder" and "cluster deconvoluter"). We show, that the former is suited for pp and low multiplicity PbPb collisions, whereas the latter might be applicable for high multiplicity PbPb collisions, if it turns out, that more than 8000 charged particles would have to be reconstructed inside the TPC. Based on the developed tracking schemes we show, that using modeling techniques a compression factor of around 10 might be achievableComment: Realtime Conference 2003, Montreal, Canada to be published in IEEE Transactions on Nuclear Science (TNS), 6 pages, 8 figure

    MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    Get PDF
    With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library

    ART Neural Networks: Distributed Coding and ARTMAP Applications

    Full text link
    ART (Adaptive Resonance Theory) neural networks for fast, stable learning and prediction have been applied in a variety of areas. Applications include airplane design and manufacturing, automatic target recognition, financial forecasting, machine tool monitoring, digital circuit design, chemical analysis, and robot vision. Supervised ART architectures, called ARTMAP systems, feature internal control mechanisms that create stable recognition categories of optimal size by maximizing code compression while minimizing predictive error in an on-line setting. Special-purpose requirements of various application domains have led to a number of ARTMAP variants, including fuzzy ARTMAP, ART-EMAP, Gaussian ARTMAP, and distributed ARTMAP. ARTMAP has been used for a variety of applications, including computer-assisted medical diagnosis. Medical databases present many of the challenges found in general information management settings where speed, efficiency, ease of use, and accuracy are at a premium. A direct goal of improved computer-assisted medicine is to help deliver quality emergency care in situations that may be less than ideal. Working with these problems has stimulated a number of ART architecture developments, including ARTMAP-IC [1]. This paper describes a recent collaborative effort, using a new cardiac care database for system development, has brought together medical statisticians and clinicians at the New England Medical Center with researchers developing expert systems and neural networks, in order to create a hybrid method for medical diagnosis. The paper also considers new neural network architectures, including distributed ART {dART), a real-time model of parallel distributed pattern learning that permits fast as well as slow adaptation, without catastrophic forgetting. Local synaptic computations in the dART model quantitatively match the paradoxical phenomenon of Markram-Tsodyks [2] redistribution of synaptic efficacy, as a consequence of global system hypotheses.Office of Naval Research (N00014-95-1-0409, N00014-95-1-0657
    • …
    corecore