Algorithmic patterns for ℋ-matrices on many-core processors
In this work, we consider the reformulation of hierarchical matrix (ℋ-matrix)
algorithms for many-core processors, with a model implementation on
graphics processing units (GPUs). ℋ-matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of ℋ-matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing ℋ-matrix CPU implementations with many-core
processors, we here aim at relying entirely on that processor type. As our main
contribution, we introduce the parallel algorithmic patterns necessary
to map the full ℋ-matrix construction and the fast matrix-vector
product to many-core hardware. Crucial ingredients are space-filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation, hmglib, is, to the best of the authors'
knowledge, the first entirely GPU-based open-source ℋ-matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard ℋ-matrix library,
highlighting profound speedups of our many-core parallel approach.
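The space-filling-curve ingredient can be illustrated with a small sketch (our own illustration, not code from hmglib): interleaving the bits of quantized coordinates yields a Morton (Z-order) key, and sorting points by this key places spatially nearby points close together in memory, which is what makes batched, tree-based GPU work possible.

```python
def part_bits(v: int) -> int:
    """Spread the lower 16 bits of v so a zero bit sits between each pair."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton2d(x: int, y: int) -> int:
    """Interleave x and y bits into one key: ... y1 x1 y0 x0."""
    return (part_bits(y) << 1) | part_bits(x)

# Sorting by the Morton key clusters spatially nearby points together.
points = [(3, 5), (0, 0), (7, 7), (2, 1)]
order = sorted(points, key=lambda p: morton2d(*p))
```

On a GPU the same key computation runs once per point in parallel, followed by a device-side radix sort; the serial sort here only stands in for that step.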
The Capital Improvement Plan Environmental Assessment Process
This report contains one explicit mention of Waller Creek and how a Convention Center 66" Water Transmission Line Relocation resulted in a recommendation for the restoration of Waller Creek banks. This report outlines the current requirements for Environmental Assessments (EAs) performed for compliance with the City of Austin Land Development Code (LDC) as they are applied in City Capital Improvement Plan (CIP) projects. Much of this information is not currently documented in either the Environmental Criteria Manual (ECM) or other material readily available to Public Works Project Managers. An overview of the Environmental Assessment process is provided along with the goals for CIP assessments, methods for review and completion of assessments, and recommendations for improving the City processes. Attachments to this report include pertinent LDC citations, the form in use for project identification, a suggested process for conducting and reviewing assessments, a scope of work for staff or consultants performing assessments, and photographic summaries of critical environmental features to be protected in accordance with the LDC in City as well as private projects. Also, a flowchart of the EA review process and a brief summary of assessments of past projects are included in the attachments. The information is provided as a precursor to the expansion of the current ECM section on Environmental Assessments in Section 1.3.0 and for consideration by the Public Works Department and other Project Managers for early review of environmental impacts, leading to better CIP projects.

Waller Creek Working Group
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present parallel algorithms for constructing and traversing sparse octrees
on graphics processing units (GPUs). The algorithms are based on parallel-scan
and sort methods. To test the performance and feasibility, we implemented them
in CUDA in the form of a gravitational tree-code which runs completely on the
GPU. (The code is publicly available at
http://castle.strw.leidenuniv.nl/software.html) The tree construction and
traversal algorithms are portable to many-core devices which have support for
the CUDA or OpenCL programming languages. The gravitational tree-code
outperforms tuned CPU code during the tree construction and shows an overall
performance improvement of more than a factor of 20, resulting in a processing
rate of more than 2.8 million particles per second.

Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single column
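The scan-and-compact pattern the abstract names can be sketched as follows (an illustration under our own assumptions, not the paper's code): with particle keys sorted along a space-filling curve, a flag marks each position where the key prefix at the current tree level changes, and an exclusive prefix sum over the flags assigns every new node a compacted output slot.

```python
def exclusive_scan(flags):
    """Serial stand-in for the parallel exclusive prefix sum used on a GPU."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out

def nodes_at_level(sorted_keys, level_bits):
    """One (slot, start_index, prefix) triple per tree node at this level."""
    prefixes = [k >> level_bits for k in sorted_keys]
    flags = [1] + [int(prefixes[i] != prefixes[i - 1])
                   for i in range(1, len(prefixes))]
    slots = exclusive_scan(flags)          # compacted output position per node
    return [(slots[i], i, prefixes[i]) for i, f in enumerate(flags) if f]

keys = [0b000000, 0b000001, 0b001010, 0b001011, 0b110000]  # already sorted
nodes = nodes_at_level(keys, 3)  # group keys by everything above the low 3 bits
```

Repeating this per level, with the flags and scan evaluated in parallel across all keys, builds the tree level by level without any serial pointer chasing.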
High-threshold fault-tolerant quantum computation with analog quantum error correction
To implement fault-tolerant quantum computation with continuous variables,
the Gottesman-Kitaev-Preskill (GKP) qubit has been recognized as an important
technological element. However, it is still challenging to experimentally
generate the GKP qubit with the squeezing level, 14.8 dB, required by
existing fault-tolerant quantum computation schemes. To reduce this
requirement, we propose a high-threshold fault-tolerant quantum computation
scheme with GKP qubits using topologically protected measurement-based
quantum computation with the surface code. By harnessing analog information
contained in the GKP qubits, we apply analog quantum error correction to the
surface code. Furthermore, we develop a method to prevent the squeezing level
from decreasing during the construction of the large-scale cluster states for
the topologically protected measurement-based quantum computation. We
numerically show that the required squeezing level can be relaxed to less than
10 dB, which is within the reach of current experimental technology. Hence,
this work considerably alleviates the experimental requirement and takes a
step closer to the realization of large-scale quantum computation.

Comment: 14 pages, 7 figures
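Why squeezing matters here can be seen in a toy single-peak Gaussian model (our simplification; the paper's thresholds come from full surface-code simulations): a GKP qubit stores information in a grid of spacing sqrt(pi), a measured quadrature deviation beyond sqrt(pi)/2 from the nearest grid point is misidentified, and a squeezing level of s dB is commonly taken to mean a deviation variance of 0.5 * 10**(-s/10).

```python
import math

def misid_probability(squeezing_db: float) -> float:
    """P(|deviation| > sqrt(pi)/2) for a zero-mean Gaussian deviation
    whose variance corresponds to the given squeezing level in dB."""
    sigma2 = 0.5 * 10 ** (-squeezing_db / 10)   # assumed dB-to-variance map
    half_spacing = math.sqrt(math.pi) / 2
    return math.erfc(half_spacing / math.sqrt(2 * sigma2))

# The misidentification rate falls steeply with squeezing, which is why a few
# dB of relaxation (14.8 dB down to ~10 dB) is experimentally significant.
p10 = misid_probability(10.0)
p14_8 = misid_probability(14.8)
```

In this toy model the analog information is the size of the deviation itself: a result that lands near a decision boundary is less trustworthy, and weighting syndrome decoding by that likelihood is the idea behind analog quantum error correction.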
Low carbon housing: lessons from Elm Tree Mews
This report sets out the findings from a low carbon housing trial at Elm Tree Mews, York, and discusses the technical and policy issues that arise from it. The Government has set an ambitious target for all new housing to be zero carbon by 2016. With the application of good insulation, improved efficiencies and renewable energy, this is theoretically possible. However, there is growing concern that, in practice, even existing carbon standards are not being achieved and that this performance gap has the potential to undermine zero carbon housing policy. The report seeks to address these concerns through the detailed evaluation of a low carbon development at Elm Tree Mews. The report:

* evaluates the energy/carbon performance of the dwellings prior to occupation and in use;
* analyses the procurement, design and construction processes that give rise to the performance achieved;
* explores the resident experience;
* draws out lessons for the development of zero carbon housing and the implications for government policy; and
* proposes a programme for change, designed to close the performance gap.
A Very Fast and Momentum-Conserving Tree Code
The tree code for the approximate evaluation of gravitational forces is
extended and substantially accelerated by including mutual cell-cell
interactions. These are computed by a Taylor series in Cartesian coordinates
and in a completely symmetric fashion, such that Newton's third law is
satisfied by construction and hence momentum is exactly conserved. The
computational effort is further reduced by exploiting the mutual symmetry of
the interactions. For typical astrophysical problems with N = 10^5 and at the
same level of accuracy, the new code is about four times faster than the tree
code. For large N, the computational costs are found to scale almost linearly
with N, which can also be supported by a theoretical argument, and the
advantage over the tree code increases with ever larger N.

Comment: revised version (accepted by ApJ Letters), 5 pages LaTeX, 3 figures
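The symmetry argument can be made concrete with a minimal direct-sum sketch (our own illustration, not the paper's cell-cell code): visiting each pair once and applying equal-and-opposite contributions makes total momentum conservation hold by construction, up to floating-point roundoff, regardless of how accurate the force approximation is.

```python
def mutual_forces(pos, masses, G=1.0, eps=1e-3):
    """Softened direct-sum gravity with each pair visited once; the
    action/reaction pair uses the identical floating-point value, so
    F_ji = -F_ij holds exactly."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps      # softened distance^2
            f = G * masses[i] * masses[j] / r2 ** 1.5
            for k in range(3):
                forces[i][k] += f * dx[k]                # action ...
                forces[j][k] -= f * dx[k]                # ... and reaction
    return forces

pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
f = mutual_forces(pos, [1.0, 2.0, 3.0])
net = [sum(fi[k] for fi in f) for k in range(3)]  # ~0 in every component
```

The abstract's cell-cell interactions apply the same idea one level up: a single symmetric Taylor expansion per cell pair yields both cells' contributions, halving the work while preserving momentum.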
Partitioned List Decoding of Polar Codes: Analysis and Improvement of Finite Length Performance
Polar codes represent one of the major recent breakthroughs in coding theory
and, because of their attractive features, they have been selected for the
upcoming 5G standard. As such, a lot of attention has been devoted to the
development of decoding algorithms with good error performance and efficient
hardware implementation. One of the leading candidates in this regard is
successive-cancellation list (SCL) decoding. However, its hardware
implementation requires a large amount of memory. Recently, a partitioned SCL
(PSCL) decoder has been proposed to significantly reduce the memory
consumption. In this paper, we examine the paradigm of PSCL decoding from
both theoretical and practical standpoints: (i) by changing the construction
of the code, we are able to improve the performance at no additional
computational, latency or memory cost; (ii) we present an optimal scheme to
allocate cyclic redundancy checks (CRCs); and (iii) we provide an upper bound
on the list size that allows the decoder to achieve MAP performance.

Comment: 2017 IEEE Global Communications Conference (GLOBECOM)
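The transform underlying SC and SCL decoding is compact enough to sketch (a hedged illustration; the frozen-bit positions below are chosen only for the example, not by any real code construction): the encoder applies the Arikan kernel G_2 = [[1, 0], [1, 1]] recursively, so a length-N codeword (N a power of two) is x = u · G_2^{⊗log2 N} over GF(2).

```python
def polar_encode(u):
    """Recursive polar transform over GF(2); len(u) must be a power of two."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    # Upper branch carries u1 XOR u2, lower branch carries u2 (Arikan kernel).
    left = polar_encode([u[i] ^ u[i + half] for i in range(half)])
    right = polar_encode(u[half:])
    return left + right

# N = 4 toy example: positions 0-2 are pinned ("frozen") to 0 and one
# information bit rides in position 3, giving the repetition-like codeword.
x = polar_encode([0, 0, 0, 1])
```

An SC decoder walks this same recursion in reverse, deciding one u-bit at a time; SCL keeps a list of candidate paths, and the PSCL decoder of the abstract limits how much of that list state must be held in memory at once.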