Algorithmic patterns for ℋ-matrices on many-core processors
In this work, we consider the reformulation of hierarchical matrix (ℋ-matrix)
algorithms for many-core processors, with a model implementation on
graphics processing units (GPUs). ℋ-matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of ℋ-matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing ℋ-matrix CPU implementations with many-core
processors, we here aim at relying entirely on that processor type. As our main
contribution, we introduce the parallel algorithmic patterns necessary
to map the full ℋ-matrix construction and the fast matrix-vector
product to many-core hardware. Crucial ingredients are space-filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation, hmglib, is, to the best of the authors'
knowledge, the first entirely GPU-based open-source ℋ-matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard ℋ-matrix library,
highlighting profound speedups of our many-core parallel approach.
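The space-filling-curve ingredient can be illustrated with a small sketch (our own illustration, not code from hmglib): interleaving the bits of quantized coordinates yields a Morton (Z-order) key, and sorting points by this key places spatially nearby points close together in memory, which is what makes batched, tree-based GPU work possible.

```python
def part_bits(v: int) -> int:
    """Spread the lower 16 bits of v so a zero bit sits between each pair."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v

def morton2d(x: int, y: int) -> int:
    """Interleave x and y bits into one key: ... y1 x1 y0 x0."""
    return (part_bits(y) << 1) | part_bits(x)

# Sorting by the Morton key clusters spatially nearby points together.
points = [(3, 5), (0, 0), (7, 7), (2, 1)]
order = sorted(points, key=lambda p: morton2d(*p))
```

On a GPU the same key computation runs once per point in parallel, followed by a device-side radix sort; the serial sort here only stands in for that step.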
The Capital Improvement Plan Environmental Assessment Process
This report contains one explicit mention of Waller Creek and how a Convention Center 66" Water Transmission Line Relocation resulted in a recommendation for the restoration of Waller Creek banks. This report outlines the current requirements for Environmental Assessments (EAs) performed for compliance with the City of Austin Land Development Code (LDC) as they are applied in City Capital Improvement Plan (CIP) projects. Much of this information is not currently documented in either the Environmental Criteria Manual (ECM) or other material readily available to Public Works Project Managers. An overview of the Environmental Assessment process is provided along with the goals for CIP assessments, methods for review and completion of assessments, and recommendations for improving the City processes. Attachments to this report include pertinent LDC citations, the form in use for project identification, a suggested process for conducting and reviewing assessments, a scope of work for staff or consultants performing assessments, and photographic summaries of critical environmental features to be protected in accordance with the LDC in City as well as private projects. Also, a flowchart of the EA review process and a brief summary of assessments of past projects are included in the attachments. The information is provided as a precursor to the expansion of the current ECM section on Environmental Assessments in Section 1.3.0 and for consideration by the Public Works Department and other Project Managers for early review of environmental impacts, leading to better CIP projects.

Waller Creek Working Group
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present parallel algorithms for constructing and traversing sparse octrees
on graphics processing units (GPUs). The algorithms are based on parallel-scan
and sort methods. To test the performance and feasibility, we implemented them
in CUDA in the form of a gravitational tree-code which runs completely on the
GPU. (The code is publicly available at
http://castle.strw.leidenuniv.nl/software.html) The tree construction and
traversal algorithms are portable to many-core devices which have support for
the CUDA or OpenCL programming languages. The gravitational tree-code
outperforms tuned CPU code during the tree construction and shows an overall
performance improvement of more than a factor of 20, resulting in a processing
rate of more than 2.8 million particles per second.

Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single column
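The scan-and-compact pattern the abstract names can be sketched as follows (an illustration under our own assumptions, not the paper's code): with particle keys sorted along a space-filling curve, a flag marks each position where the key prefix at the current tree level changes, and an exclusive prefix sum over the flags assigns every new node a compacted output slot.

```python
def exclusive_scan(flags):
    """Serial stand-in for the parallel exclusive prefix sum used on a GPU."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out

def nodes_at_level(sorted_keys, level_bits):
    """One (slot, start_index, prefix) triple per tree node at this level."""
    prefixes = [k >> level_bits for k in sorted_keys]
    flags = [1] + [int(prefixes[i] != prefixes[i - 1])
                   for i in range(1, len(prefixes))]
    slots = exclusive_scan(flags)          # compacted output position per node
    return [(slots[i], i, prefixes[i]) for i, f in enumerate(flags) if f]

keys = [0b000000, 0b000001, 0b001010, 0b001011, 0b110000]  # already sorted
nodes = nodes_at_level(keys, 3)  # group keys by everything above the low 3 bits
```

Repeating this per level, with the flags and scan evaluated in parallel across all keys, builds the tree level by level without any serial pointer chasing.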
High-threshold fault-tolerant quantum computation with analog quantum error correction
To implement fault-tolerant quantum computation with continuous variables,
the Gottesman-Kitaev-Preskill (GKP) qubit has been recognized as an important
technological element. However, it is still challenging to experimentally
generate the GKP qubit with the squeezing level, 14.8 dB, required by
existing fault-tolerant quantum computation schemes. To reduce this
requirement, we propose a high-threshold fault-tolerant quantum computation
scheme with GKP qubits using topologically protected measurement-based
quantum computation with the surface code. By harnessing analog information
contained in the GKP qubits, we apply analog quantum error correction to the
surface code. Furthermore, we develop a method to prevent the squeezing level
from decreasing during the construction of the large-scale cluster states for
the topologically protected measurement-based quantum computation. We
numerically show that the required squeezing level can be relaxed to less than
10 dB, which is within the reach of current experimental technology. Hence,
this work considerably alleviates the experimental requirement and takes a
step closer to the realization of large-scale quantum computation.

Comment: 14 pages, 7 figures
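Why squeezing matters here can be seen in a toy single-peak Gaussian model (our simplification; the paper's thresholds come from full surface-code simulations): a GKP qubit stores information in a grid of spacing sqrt(pi), a measured quadrature deviation beyond sqrt(pi)/2 from the nearest grid point is misidentified, and a squeezing level of s dB is commonly taken to mean a deviation variance of 0.5 * 10**(-s/10).

```python
import math

def misid_probability(squeezing_db: float) -> float:
    """P(|deviation| > sqrt(pi)/2) for a zero-mean Gaussian deviation
    whose variance corresponds to the given squeezing level in dB."""
    sigma2 = 0.5 * 10 ** (-squeezing_db / 10)   # assumed dB-to-variance map
    half_spacing = math.sqrt(math.pi) / 2
    return math.erfc(half_spacing / math.sqrt(2 * sigma2))

# The misidentification rate falls steeply with squeezing, which is why a few
# dB of relaxation (14.8 dB down to ~10 dB) is experimentally significant.
p10 = misid_probability(10.0)
p14_8 = misid_probability(14.8)
```

In this toy model the analog information is the size of the deviation itself: a result that lands near a decision boundary is less trustworthy, and weighting syndrome decoding by that likelihood is the idea behind analog quantum error correction.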
Low carbon housing: lessons from Elm Tree Mews
This report sets out the findings from a low carbon housing trial at Elm Tree Mews, York, and discusses the technical and policy issues that arise from it. The Government has set an ambitious target for all new housing to be zero carbon by 2016. With the application of good insulation, improved efficiencies and renewable energy, this is theoretically possible. However, there is growing concern that, in practice, even existing carbon standards are not being achieved and that this performance gap has the potential to undermine zero carbon housing policy. The report seeks to address these concerns through the detailed evaluation of a low carbon development at Elm Tree Mews. The report:

* evaluates the energy/carbon performance of the dwellings prior to occupation and in use;
* analyses the procurement, design and construction processes that give rise to the performance achieved;
* explores the resident experience;
* draws out lessons for the development of zero carbon housing and the implications for government policy; and
* proposes a programme for change, designed to close the performance gap.
A Very Fast and Momentum-Conserving Tree Code
The tree code for the approximate evaluation of gravitational forces is
extended and substantially accelerated by including mutual cell-cell
interactions. These are computed by a Taylor series in Cartesian coordinates
and in a completely symmetric fashion, such that Newton's third law is
satisfied by construction and hence momentum is exactly conserved. The
computational effort is further reduced by exploiting the mutual symmetry of
the interactions. For typical astrophysical problems with N = 10^5 and at the
same level of accuracy, the new code is about four times faster than the tree
code. For large N, the computational costs are found to scale almost linearly
with N, which can also be supported by a theoretical argument, and the
advantage over the tree code increases with ever larger N.

Comment: revised version (accepted by ApJ Letters), 5 pages LaTeX, 3 figures
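The symmetry argument can be made concrete with a minimal direct-sum sketch (our own illustration, not the paper's cell-cell code): visiting each pair once and applying equal-and-opposite contributions makes total momentum conservation hold by construction, up to floating-point roundoff, regardless of how accurate the force approximation is.

```python
def mutual_forces(pos, masses, G=1.0, eps=1e-3):
    """Softened direct-sum gravity with each pair visited once; the
    action/reaction pair uses the identical floating-point value, so
    F_ji = -F_ij holds exactly."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps      # softened distance^2
            f = G * masses[i] * masses[j] / r2 ** 1.5
            for k in range(3):
                forces[i][k] += f * dx[k]                # action ...
                forces[j][k] -= f * dx[k]                # ... and reaction
    return forces

pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
f = mutual_forces(pos, [1.0, 2.0, 3.0])
net = [sum(fi[k] for fi in f) for k in range(3)]  # ~0 in every component
```

The abstract's cell-cell interactions apply the same idea one level up: a single symmetric Taylor expansion per cell pair yields both cells' contributions, halving the work while preserving momentum.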
Partitioned List Decoding of Polar Codes: Analysis and Improvement of Finite Length Performance
Polar codes represent one of the major recent breakthroughs in coding theory
and, because of their attractive features, they have been selected for the
upcoming 5G standard. As such, a lot of attention has been devoted to the
development of decoding algorithms with good error performance and efficient
hardware implementation. One of the leading candidates in this regard is
successive-cancellation list (SCL) decoding. However, its hardware
implementation requires a large amount of memory. Recently, a partitioned SCL
(PSCL) decoder has been proposed to significantly reduce the memory
consumption. In this paper, we examine the paradigm of PSCL decoding from
both theoretical and practical standpoints: (i) by changing the construction
of the code, we are able to improve the performance at no additional
computational, latency or memory cost; (ii) we present an optimal scheme to
allocate cyclic redundancy checks (CRCs); and (iii) we provide an upper bound
on the list size that allows the decoder to achieve MAP performance.

Comment: 2017 IEEE Global Communications Conference (GLOBECOM)
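The transform underlying SC and SCL decoding is compact enough to sketch (a hedged illustration; the frozen-bit positions below are chosen only for the example, not by any real code construction): the encoder applies the Arikan kernel G_2 = [[1, 0], [1, 1]] recursively, so a length-N codeword (N a power of two) is x = u · G_2^{⊗log2 N} over GF(2).

```python
def polar_encode(u):
    """Recursive polar transform over GF(2); len(u) must be a power of two."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    # Upper branch carries u1 XOR u2, lower branch carries u2 (Arikan kernel).
    left = polar_encode([u[i] ^ u[i + half] for i in range(half)])
    right = polar_encode(u[half:])
    return left + right

# N = 4 toy example: positions 0-2 are pinned ("frozen") to 0 and one
# information bit rides in position 3, giving the repetition-like codeword.
x = polar_encode([0, 0, 0, 1])
```

An SC decoder walks this same recursion in reverse, deciding one u-bit at a time; SCL keeps a list of candidate paths, and the PSCL decoder of the abstract limits how much of that list state must be held in memory at once.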