62,936 research outputs found

    GPUs as Storage System Accelerators

    Full text link
    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201

    Parallelization of an object-oriented FEM dynamics code: influence of the strategies on the Speedup

    Get PDF
    This paper presents an implementation in C++ of an explicit parallel finite element code dedicated to the simulation of impacts. We first present a brief overview of the kinematics and the explicit integration scheme with details concerning some particular points. Then we present the OpenMP parallelization toolkit used in order to parallelize our FEM code, and we focus on how the parallelization of the DynELA FEM code has been conducted for a shared memory system using OpenMP. Some examples are then presented to demonstrate the efficiency and accuracy of the proposed implementations concerning the Speedup of the code. Finally, an impact simulation application is presented and results are compared with the ones obtained by the commercial Abaqus explicit FEM code

    Parallel detrended fluctuation analysis for fast event detection on massive PMU data

    Get PDF
    ("(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")Phasor measurement units (PMUs) are being rapidly deployed in power grids due to their high sampling rates and synchronized measurements. The devices high data reporting rates present major computational challenges in the requirement to process potentially massive volumes of data, in addition to new issues surrounding data storage. Fast algorithms capable of processing massive volumes of data are now required in the field of power systems. This paper presents a novel parallel detrended fluctuation analysis (PDFA) approach for fast event detection on massive volumes of PMU data, taking advantage of a cluster computing platform. The PDFA algorithm is evaluated using data from installed PMUs on the transmission system of Great Britain from the aspects of speedup, scalability, and accuracy. The speedup of the PDFA in computation is initially analyzed through Amdahl's Law. A revision to the law is then proposed, suggesting enhancements to its capability to analyze the performance gain in computation when parallelizing data intensive applications in a cluster computing environment

    A review of parallel computing for large-scale remote sensing image mosaicking

    Get PDF
    Interest in image mosaicking has been spurred by a wide variety of research and management needs. However, for large-scale applications, remote sensing image mosaicking usually requires significant computational capabilities. Several studies have attempted to apply parallel computing to improve image mosaicking algorithms and to speed up calculation process. The state of the art of this field has not yet been summarized, which is, however, essential for a better understanding and for further research of image mosaicking parallelism on a large scale. This paper provides a perspective on the current state of image mosaicking parallelization for large scale applications. We firstly introduce the motivation of image mosaicking parallel for large scale application, and analyze the difficulty and problem of parallel image mosaicking at large scale such as scheduling with huge number of dependent tasks, programming with multiple-step procedure, dealing with frequent I/O operation. Then we summarize the existing studies of parallel computing in image mosaicking for large scale applications with respect to problem decomposition and parallel strategy, parallel architecture, task schedule strategy and implementation of image mosaicking parallelization. Finally, the key problems and future potential research directions for image mosaicking are addressed

    An ontology enhanced parallel SVM for scalable spam filter training

    Get PDF
    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart

    Efficient and Reasonable Object-Oriented Concurrency

    Full text link
    Making threaded programs safe and easy to reason about is one of the chief difficulties in modern programming. This work provides an efficient execution model for SCOOP, a concurrency approach that provides not only data race freedom but also pre/postcondition reasoning guarantees between threads. The extensions we propose influence both the underlying semantics to increase the amount of concurrent execution that is possible, exclude certain classes of deadlocks, and enable greater performance. These extensions are used as the basis an efficient runtime and optimization pass that improve performance 15x over a baseline implementation. This new implementation of SCOOP is also 2x faster than other well-known safe concurrent languages. The measurements are based on both coordination-intensive and data-manipulation-intensive benchmarks designed to offer a mixture of workloads.Comment: Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '15). ACM, 201
    corecore