
    Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors

    In this work, we consider the reformulation of hierarchical ($\mathcal{H}$) matrix algorithms for many-core processors, with a model implementation on graphics processing units (GPUs). $\mathcal{H}$ matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of $\mathcal{H}$ matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing $\mathcal{H}$ matrix CPU implementations with many-core processors, we here aim to rely entirely on that processor type. As our main contribution, we introduce the parallel algorithmic patterns needed to map the full $\mathcal{H}$ matrix construction and the fast matrix-vector product to many-core hardware. The crucial ingredients are space-filling curves, parallel tree traversal, and batching of linear algebra operations. The resulting model GPU implementation, hmglib, is, to the best of the authors' knowledge, the first entirely GPU-based open-source $\mathcal{H}$ matrix library of this kind. We conclude this work with an in-depth performance analysis and a comparative performance study against a standard $\mathcal{H}$ matrix library, highlighting profound speedups of our many-core parallel approach.

    A sparse octree gravitational N-body code that runs entirely on the GPU processor

    We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test their performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which runs completely on the GPU. (The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traversal algorithms are portable to many-core devices that support the CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree construction and shows a performance improvement of more than a factor of 20 overall, resulting in a processing rate of more than 2.8 million particles per second. Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single column

    High-threshold fault-tolerant quantum computation with analog quantum error correction

    To implement fault-tolerant quantum computation with continuous variables, the Gottesman-Kitaev-Preskill (GKP) qubit has been recognized as an important technological element. However, it is still challenging to experimentally generate the GKP qubit with the squeezing level of 14.8 dB required by existing fault-tolerant quantum computation. To reduce this requirement, we propose a high-threshold fault-tolerant quantum computation with GKP qubits using topologically protected measurement-based quantum computation with the surface code. By harnessing analog information contained in the GKP qubits, we apply analog quantum error correction to the surface code. Furthermore, we develop a method to prevent the squeezing level from decreasing during the construction of the large-scale cluster states for the topologically protected measurement-based quantum computation. We numerically show that the required squeezing level can be relaxed to less than 10 dB, which is within the reach of current experimental technology. Hence, this work can considerably alleviate this experimental requirement and take a step closer to the realization of large-scale quantum computation. Comment: 14 pages, 7 figures

    Low carbon housing: lessons from Elm Tree Mews

    This report sets out the findings from a low carbon housing trial at Elm Tree Mews, York, and discusses the technical and policy issues that arise from it. The Government has set an ambitious target for all new housing to be zero carbon by 2016. With the application of good insulation, improved efficiencies and renewable energy, this is theoretically possible. However, there is growing concern that, in practice, even existing carbon standards are not being achieved and that this performance gap has the potential to undermine zero carbon housing policy. The report seeks to address these concerns through the detailed evaluation of a low carbon development at Elm Tree Mews. The report:
    * evaluates the energy/carbon performance of the dwellings prior to occupation and in use;
    * analyses the procurement, design and construction processes that give rise to the performance achieved;
    * explores the resident experience;
    * draws out lessons for the development of zero carbon housing and the implications for government policy; and
    * proposes a programme for change, designed to close the performance gap.

    A Very Fast and Momentum-Conserving Tree Code

    The tree code for the approximate evaluation of gravitational forces is extended and substantially accelerated by including mutual cell-cell interactions. These are computed by a Taylor series in Cartesian coordinates and in a completely symmetric fashion, such that Newton's third law is satisfied by construction and hence momentum is exactly conserved. The computational effort is further reduced by exploiting the mutual symmetry of the interactions. For typical astrophysical problems with N=10^5 and at the same level of accuracy, the new code is about four times faster than the tree code. For large N, the computational costs are found to scale almost linearly with N, which can also be supported by a theoretical argument, and the advantage over the tree code increases with ever larger N. Comment: revised version (accepted by ApJ Letters), 5 pages LaTeX, 3 figures

    Partitioned List Decoding of Polar Codes: Analysis and Improvement of Finite Length Performance

    Polar codes represent one of the major recent breakthroughs in coding theory and, because of their attractive features, they have been selected for the upcoming 5G standard. As such, a lot of attention has been devoted to the development of decoding algorithms with good error performance and efficient hardware implementation. One of the leading candidates in this regard is successive-cancellation list (SCL) decoding. However, its hardware implementation requires a large amount of memory. Recently, a partitioned SCL (PSCL) decoder has been proposed to significantly reduce the memory consumption. In this paper, we examine the paradigm of PSCL decoding from both theoretical and practical standpoints: (i) by changing the construction of the code, we are able to improve the performance at no additional computational, latency or memory cost, (ii) we present an optimal scheme to allocate cyclic redundancy checks (CRCs), and (iii) we provide an upper bound on the list size that allows MAP performance. Comment: 2017 IEEE Global Communications Conference (GLOBECOM)