
    Resource-aware Data Parallel Array Processing


    QuEST and High Performance Simulation of Quantum Computers

    We introduce QuEST, the Quantum Exact Simulation Toolkit, and compare it to ProjectQ, qHipster and a recent distributed implementation of Quantum++. QuEST is the first open-source, OpenMP and MPI hybridised, GPU-accelerated simulator of universal quantum circuits. Embodied as a C library, it is designed so that a user's code can be deployed seamlessly to any platform, from a laptop to a supercomputer. QuEST can simulate generic quantum circuits of general single-qubit gates and multi-qubit controlled gates, on pure and mixed states, represented as state-vectors and density matrices, and in the presence of decoherence. Using the ARCUS Phase-B and ARCHER supercomputers, we benchmark QuEST's simulation of random circuits of up to 38 qubits, distributed over up to 2048 compute nodes, each with up to 24 cores. We directly compare QuEST's performance to ProjectQ's on single machines, and discuss the differences in the distribution strategies of QuEST, qHipster and Quantum++. QuEST shows excellent scaling, both strong and weak, on multicore and distributed architectures.
    Comment: 8 pages, 8 figures; fixed typos; updated QuEST URL and fixed a typo in the Fig. 4 caption where ProjectQ and QuEST were swapped in the speedup subplot explanation; added an explanation of the simulation algorithm and updated the bibliography; stressed the technical novelty of QuEST; mentioned new density matrix support
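
    The inner loop of a state-vector simulator of this kind is compact. The following C++ sketch (illustrative only; this is not QuEST's API, and all names are hypothetical) applies a general single-qubit gate to an n-qubit state vector using the standard textbook amplitude-pairing scheme:

        #include <complex>
        #include <cstddef>
        #include <vector>

        using amp = std::complex<double>;

        // Apply a general single-qubit gate [[a, b], [c, d]] to qubit q of an
        // n-qubit state vector of 2^n amplitudes. Amplitudes whose indices
        // differ only in bit q are paired and mixed by the 2x2 matrix.
        void applyGate(std::vector<amp>& state, int q, amp a, amp b, amp c, amp d) {
            const std::size_t stride = std::size_t{1} << q;
            for (std::size_t lo = 0; lo < state.size(); lo += 2 * stride) {
                for (std::size_t i = lo; i < lo + stride; ++i) {
                    const amp x = state[i];          // amplitude with bit q = 0
                    const amp y = state[i + stride]; // amplitude with bit q = 1
                    state[i]          = a * x + b * y;
                    state[i + stride] = c * x + d * y;
                }
            }
        }

    Distribution then amounts to partitioning the 2^n amplitudes across nodes: for qubits whose stride exceeds the local partition, paired amplitudes live on different ranks and must be exchanged, which is where the distribution strategies compared in the paper differ.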

    Using the High Productivity Language Chapel to Target GPGPU Architectures

    It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside of these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we present novel methods and compiler transformations that increase productivity by enabling users to easily program GPGPU architectures using the high-productivity programming language Chapel. Rather than resorting to different parallel libraries or annotations for a given parallel platform, we leverage a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of being portable across distinct classes of parallel architectures, including desktop multicores, distributed-memory clusters, large-scale shared memory, and now CPU-GPU hybrids. We present experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA.
    Funding: NSF CCF 0702260; Cray Inc. Cray-SRA-2010-01696; 2010-2011 Nvidia Research Fellowship. Unpublished; not peer reviewed.
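
    Chapel expresses such computations as high-level data-parallel loops (e.g. `forall i in 1..n do y[i] = a*x[i] + y[i];`) that the compiler maps to GPU kernels. As a rough analogue in C++ (used for all sketches here; this is OpenMP target offload, not Chapel and not the paper's compiler transformations), the same loop-level intent looks like:

        #include <cstddef>

        // SAXPY expressed as a single annotated loop. A directive-based
        // compiler maps the loop body to a GPU kernel and performs the
        // host-device data movement declared in the map clauses.
        // Requires a compiler with OpenMP 4.5+ device-offload support.
        void saxpy(float a, const float* x, float* y, std::size_t n) {
            #pragma omp target teams distribute parallel for \
                    map(to: x[0:n]) map(tofrom: y[0:n])
            for (std::size_t i = 0; i < n; ++i)
                y[i] = a * x[i] + y[i];
        }

    In both cases the programmer states only the parallel loop; kernel generation and data transfer are left to the compiler, which is the productivity argument the paper makes.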

    hepaccelerate: Fast Analysis of Columnar Collider Data

    At HEP experiments, processing terabytes of structured numerical event data down to a few statistical summaries is a common task. This step involves selecting events and objects within the event, reconstructing high-level variables, evaluating multivariate classifiers with up to hundreds of variations, and creating thousands of low-dimensional histograms. Currently, this is done using multi-step workflows and batch jobs. Based on the CMS search for H(μμ), we demonstrate that it is possible to carry out significant parts of a real collider analysis at a rate of up to a million events per second on a single multicore server with optional GPU acceleration. This is achieved by representing HEP event data as memory-mappable sparse arrays, and by expressing common analysis operations as kernels that can be parallelized across the data using multithreading. We find that only a small number of relatively simple kernels are needed to implement significant parts of this Higgs analysis. Therefore, analysis of real collider datasets of billions of events could be done within minutes to a few hours using simple multithreaded code, reducing the need to manage distributed workflows in the exploratory phase. This approach could speed up the cycle for delivering physics results at HEP experiments. We release the hepaccelerate prototype library as a demonstrator of such accelerated computational kernels, and we look forward to discussion, further development and use of efficient and easy-to-use software for terabyte-scale high-level data analysis in the physical sciences.
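
    The two key ingredients described above, a flat columnar layout for variable-length per-event data and embarrassingly parallel kernels over it, can be sketched as follows (a minimal C++ illustration, not the library's own code; the offsets-plus-content layout follows the abstract's description, and all names are hypothetical):

        #include <algorithm>
        #include <cstddef>
        #include <vector>

        // Jagged ("sparse") columnar layout: the muon pTs of all events are
        // stored contiguously; offsets[i]..offsets[i+1] delimit event i.
        struct JaggedColumn {
            std::vector<std::size_t> offsets; // size: numEvents + 1
            std::vector<float> content;       // all values, event-major
        };

        // One analysis kernel: leading-muon pT per event. Events are
        // independent, so the outer loop parallelizes trivially.
        std::vector<float> leadingPt(const JaggedColumn& muonPt) {
            const std::ptrdiff_t n = muonPt.offsets.size() - 1;
            std::vector<float> out(n, 0.0f);
            #pragma omp parallel for
            for (std::ptrdiff_t i = 0; i < n; ++i)
                for (std::size_t j = muonPt.offsets[i]; j < muonPt.offsets[i + 1]; ++j)
                    out[i] = std::max(out[i], muonPt.content[j]);
            return out;
        }

    Because the column is a single contiguous (and memory-mappable) buffer, kernels like this stream through the data with no per-event object construction, which is what makes million-events-per-second rates plausible on one server.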

    Automatic Sequential to Parallel Code Conversion

    The way software programs are written has been redefined since the introduction of multicore processors. Software developers have started writing parallel programs that are robust and scalable, ensuring use of the processing power made available in the form of multiple cores. Though this trend is growing, there are legacy applications that have been developed over the past few decades, and most of these applications are inherently sequential, making no use of multithreading or parallel programming. If such applications are ported to multicore hardware as they are, optimal usage of all cores is not guaranteed: such an application would typically utilize only one core while the other cores remain idle, unless the operating system provides some parallelism while scheduling. Hence there is a need to convert legacy sequential codes to parallel versions so that multicore hardware is exploited to the fullest. In this paper we present a tool that we have developed to automatically convert sequential C code to parallel code. This Sequential to Parallel (S2P) tool is still in the development phase. We also discuss other parallelization tools available today, compare them with the S2P tool, and present our performance analysis results on different kinds of multicore hardware.
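
    The flavor of transformation such a tool automates can be shown with a small example (illustrative only; the abstract does not specify S2P's output, and the use of OpenMP as the threading substrate here is an assumption). A loop whose iterations are proven independent by dependence analysis is annotated for parallel execution:

        #include <cstddef>
        #include <vector>

        // Before: a plain sequential loop from a legacy code base.
        // After dependence analysis proves the iterations independent, a
        // source-to-source parallelizer can annotate it, e.g. with OpenMP:
        void scale(std::vector<double>& v, double k) {
            #pragma omp parallel for
            for (std::size_t i = 0; i < v.size(); ++i)
                v[i] *= k; // no loop-carried dependence: safe to parallelize
        }

    The hard part of automatic parallelization is the analysis that licenses the annotation, not the annotation itself; loops with loop-carried dependences must be left sequential or restructured first.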

    Algorithmic and infrastructural software development for cryo electron tomography

    Many cryo electron microscopy (cryoEM) software packages have accumulated significant technical debt over the years, resulting in overcomplicated codebases that are costly to maintain and that slow down development. In this thesis, we advocate for the development of open-source cryoEM core libraries as a solution to this debt, with the ultimate goal of improving the developer and user experience. First, a brief summary of cryoEM is presented, with an emphasis on projection algorithms and tomography. Second, the requirements of modern and future cryoEM image processing are discussed. Third, a new experimental cryoEM core library written in modern C++ is introduced. This library prioritises performance and code reusability, and is designed around a few core functions which offer an efficient model for manipulating multidimensional arrays at an index-wise and element-wise level. C++ template metaprogramming allowed us to develop modular and transparent compute backends that provide great CPU and GPU performance, unified behind an easy-to-use interface. Fourth, new projection algorithms are described, notably a grid-driven approach to accurately insert and sample central slices in 3-dimensional (3d) Fourier space. A Fourier-based fused backward-forward projection, further improving the computational efficiency and accuracy of reprojections, is also presented. Fifth, as part of our efforts to test and showcase the library, we have started to implement a tilt-series alignment package that gathers existing and new techniques into an automated pipeline. The current program first estimates the per-tilt translations and specimen stage rotation using a coarse alignment based on cosine stretching. It then fits the Thon rings of each tilt image as part of a global optimization to estimate the specimen inclination. Finally, we use our Fourier-based fused reprojection to efficiently refine the per-tilt translations, and are starting to explore ways to refine the per-tilt stage rotations.
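
    The core-function design with template-selected backends might look roughly like this C++ sketch (hypothetical names and a CPU-only backend; this is not the thesis's actual interface): the backend is a compile-time parameter, so one element-wise entry point can dispatch to CPU or GPU implementations behind the same interface.

        #include <cstddef>
        #include <vector>

        // A compute backend exposes the low-level loop; other backends
        // (e.g. a CUDA one) would provide the same static interface.
        struct CpuBackend {
            template <typename T, typename Op>
            static void ewise(T* data, std::size_t n, Op op) {
                for (std::size_t i = 0; i < n; ++i)
                    data[i] = op(data[i]);
            }
        };

        // Core element-wise function: the backend is chosen at compile time,
        // keeping dispatch transparent and free of runtime overhead.
        template <typename Backend, typename T, typename Op>
        void ewise(std::vector<T>& array, Op op) {
            Backend::ewise(array.data(), array.size(), op);
        }

        // usage: ewise<CpuBackend>(volume, [](float v) { return 2.0f * v; });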

    Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics

    High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, a rapidly increasing fraction of the floating-point computing power in leadership-class computing facilities and traditional data centers is coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues. The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration.
    Comment: 18 pages, 9 figures, 2 tables
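
    Of the portability layers named above, std::execution::parallel is the one already in the C++ standard library: an algorithm is handed an execution policy and the implementation decides how to run it, with compilers such as NVIDIA's nvc++ able to offload parallel algorithms to GPUs. A minimal self-contained example (the kernel itself is illustrative, not taken from the paper's use cases):

        #include <algorithm>
        #include <cmath>
        #include <execution>
        #include <vector>

        // Compute E = sqrt(p^2 + m^2) for every particle momentum. The same
        // source runs serially, multithreaded, or offloaded to an accelerator,
        // depending on the execution policy and the compiler/runtime in use.
        std::vector<float> energies(const std::vector<float>& p, float m) {
            std::vector<float> E(p.size());
            std::transform(std::execution::par_unseq, p.begin(), p.end(), E.begin(),
                           [m](float pi) { return std::sqrt(pi * pi + m * m); });
            return E;
        }

    The appeal, and the question the paper's evaluation probes, is whether a single expression of parallelism like this can deliver acceptable performance across all of the target architectures.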
