929 research outputs found

    The Parallelism Motifs of Genomic Data Analysis

    Get PDF
    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

    An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor

    Full text link
    Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are considered, density and Fock matrices. All implementations are benchmarked on a super-computer of 3,000 Intel Xeon Phi processors. With 64 cores per processor, scaling numbers are reported on up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory footprint by approximately 200 times compared to the legacy code. The MPI/OpenMP code was shown to run up to six times faster than the original for a range of molecular system sizes.Comment: SC17 conference paper, 12 pages, 7 figure

    GraphM : an efficient storage system for high throughput of concurrent graph processing

    Get PDF
    With the rapidly growing demand of graph processing in the real world, a large number of iterative graph processing jobs run concurrently on the same underlying graph. However, the storage engines of existing graph processing frameworks are mainly designed for running an individual job. Our studies show that they are inefficient when running concurrent jobs due to the redundant data storage and access overhead. To cope with this issue, we develop an efficient storage system, called GraphM. It can be integrated into the existing graph processing systems to efficiently support concurrent iterative graph processing jobs for higher throughput by fully exploiting the similarities of the data accesses between these concurrent jobs. GraphM regularizes the traversing order of the graph partitions for concurrent graph processing jobs by streaming the partitions into the main memory and the Last-Level Cache (LLC) in a common order, and then processes the related jobs concurrently in a novel fine-grained synchronization. In this way, the concurrent jobs share the same graph structure data in the LLC/memory and also the data accesses to the graph, so as to amortize the storage consumption and the data access overhead. To demonstrate the efficiency of GraphM, we plug it into state-of-the-art graph processing systems, including GridGraph, GraphChi, PowerGraph, and Chaos. Experiments results show that GraphM improves the throughput by 1.73~13 times

    Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies

    Get PDF
    The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to billions of galaxies. We present Galactos, a high-performance implementation of a novel, O(N^2) algorithm that uses a load-balanced k-d tree and spherical harmonic expansions to compute the anisotropic 3PCF. Our implementation is optimized for the Intel Xeon Phi architecture, exploiting SIMD parallelism, instruction and thread concurrency, and significant L1 and L2 cache reuse, reaching 39% of peak performance on a single node. Galactos scales to the full Cori system, achieving 9.8PF (peak) and 5.06PF (sustained) across 9636 nodes, making the 3PCF easily computable for all galaxies in the observable universe.Comment: 11 pages, 7 figures, accepted to SuperComputing 201

    GekkoFS: A temporary distributed file system for HPC applications

    Get PDF
    We present GekkoFS, a temporary, highly-scalable burst buffer file system which has been specifically optimized for new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, only offering features which are actually required by most (not all) applications. It is able to provide scalable I/O performance and reaches millions of metadata operations already for a small number of nodes, significantly outperforming the capabilities of general-purpose parallel file systems.The work has been funded by the German Research Foundation (DFG) through the ADA-FS project as part of the Priority Programme 1648. It is also supported by the Spanish Ministry of Science and Innovation (TIN2015–65316), the Generalitat de Catalunya (2014–SGR–1051), as well as the European Union’s Horizon 2020 Research and Innovation Programme (NEXTGenIO, 671951) and the European Comission’s BigStorage project (H2020-MSCA-ITN-2014-642963). This research was conducted using the supercomputer MOGON II and services offered by the Johannes Gutenberg University Mainz.Peer ReviewedPostprint (author's final draft

    Simurgh: a fully decentralized and secure NVMM user space file system

    Get PDF
    The availability of non-volatile main memory (NVMM) has started a new era for storage systems and NVMM specific file systems can support extremely high data and metadata rates, which are required by many HPC and data-intensive applications. Scaling metadata performance within NVMM file systems is nevertheless often restricted by the Linux kernel storage stack, while simply moving metadata management to the user space can compromise security or flexibility. This paper introduces Simurgh, a hardware-assisted user space file system with decentralized metadata management that allows secure metadata updates from within user space. Simurgh guarantees consistency, durability, and ordering of updates without sacrificing scalability. Security is enforced by only allowing NVMM access from protected user space functions, which can be implemented through two proposed instructions. Comparisons with other NVMM file systems show that Simurgh improves metadata performance up to 18x and application performance up to 89% compared to the second-fastest file system.This work has been supported by the European Comission’s BigStorage project H2020-MSCA-ITN2014-642963. It is also supported by the Big Data in Atmospheric Physics (BINARY) project, funded by the Carl Zeiss Foundation under Grant No.: P2018-02-003.Peer ReviewedPostprint (author's final draft
    • …
    corecore