3,517 research outputs found
Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis
Sympiler is a domain-specific code generator that optimizes sparse matrix
computations by decoupling the symbolic analysis phase from the numerical
manipulation stage in sparse codes. The computation patterns in sparse
numerical methods are guided by the input sparsity structure and the sparse
algorithm itself. In many real-world simulations, the sparsity pattern changes
little or not at all. Sympiler takes advantage of these properties to
symbolically analyze sparse codes at compile-time and to apply inspector-guided
transformations that enable applying low-level transformations to sparse codes.
As a result, the Sympiler-generated code outperforms highly-optimized matrix
factorization codes from commonly-used specialized libraries, obtaining average
speedups over Eigen and CHOLMOD of 3.8X and 1.5X respectively.Comment: 12 page
Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs
In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages and the cost model and demonstrate the effectiveness of our approach by comparison with a commercial toolchain
Towards an Adaptive Skeleton Framework for Performance Portability
The proliferation of widely available, but very different, parallel architectures
makes the ability to deliver good parallel performance
on a range of architectures, or performance portability, highly desirable.
Irregularly-parallel problems, where the number and size
of tasks is unpredictable, are particularly challenging and require
dynamic coordination.
The paper outlines a novel approach to delivering portable parallel
performance for irregularly parallel programs. The approach
combines declarative parallelism with JIT technology, dynamic
scheduling, and dynamic transformation.
We present the design of an adaptive skeleton library, with a task
graph implementation, JIT trace costing, and adaptive transformations.
We outline the architecture of the protoype adaptive skeleton
execution framework in Pycket, describing tasks, serialisation,
and the current scheduler.We report a preliminary evaluation of the
prototype framework using 4 micro-benchmarks and a small case
study on two NUMA servers (24 and 96 cores) and a small cluster
(17 hosts, 272 cores). Key results include Pycket delivering good
sequential performance e.g. almost as fast as C for some benchmarks;
good absolute speedups on all architectures (up to 120 on
128 cores for sumEuler); and that the adaptive transformations do
improve performance
Shape-based cost analysis of skeletal parallel programs
Institute for Computing Systems ArchitectureThis work presents an automatic cost-analysis system for an implicitly parallel skeletal
programming language.
Although deducing interesting dynamic characteristics of parallel programs (and in
particular, run time) is well known to be an intractable problem in the general case, it
can be alleviated by placing restrictions upon the programs which can be expressed.
By combining two research threads, the “skeletal” and “shapely” paradigms which
take this route, we produce a completely automated, computation and communication
sensitive cost analysis system. This builds on earlier work in the area by quantifying
communication as well as computation costs, with the former being derived for the
Bulk Synchronous Parallel (BSP) model.
We present details of our shapely skeletal language and its BSP implementation strategy
together with an account of the analysis mechanism by which program behaviour
information (such as shape and cost) is statically deduced. This information can be
used at compile-time to optimise a BSP implementation and to analyse computation
and communication costs. The analysis has been implemented in Haskell. We consider
different algorithms expressed in our language for some example problems and
illustrate each BSP implementation, contrasting the analysis of their efficiency by traditional,
intuitive methods with that achieved by our cost calculator. The accuracy of
cost predictions by our cost calculator against the run time of real parallel programs is
tested experimentally.
Previous shape-based cost analysis required all elements of a vector (our nestable bulk
data structure) to have the same shape. We partially relax this strict requirement on data
structure regularity by introducing new shape expressions in our analysis framework.
We demonstrate that this allows us to achieve the first automated analysis of a complete
derivation, the well known maximum segment sum algorithm of Skillicorn and Cai
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to
conventional deep neural networks at a fraction of the cost in terms of memory
and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully
digital configurable hardware accelerator IP for BNNs, integrated within a
microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid
SRAM / standard cell memory. The XNE is able to fully compute convolutional and
dense layers in autonomy or in cooperation with the core in the MCU to realize
more complex behaviors. We show post-synthesis results in 65nm and 22nm
technology for the XNE IP and post-layout results in 22nm for the full MCU
indicating that this system can drop the energy cost per binary operation to
21.6fJ per operation at 0.4V, and at the same time is flexible and performant
enough to execute state-of-the-art BNN topologies such as ResNet-34 in less
than 2.2mJ per frame at 8.9 fps.Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation
at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design
of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issu
Energy-Efficient Algorithms
We initiate the systematic study of the energy complexity of algorithms (in
addition to time and space complexity) based on Landauer's Principle in
physics, which gives a lower bound on the amount of energy a system must
dissipate if it destroys information. We propose energy-aware variations of
three standard models of computation: circuit RAM, word RAM, and
transdichotomous RAM. On top of these models, we build familiar high-level
primitives such as control logic, memory allocation, and garbage collection
with zero energy complexity and only constant-factor overheads in space and
time complexity, enabling simple expression of energy-efficient algorithms. We
analyze several classic algorithms in our models and develop low-energy
variations: comparison sort, insertion sort, counting sort, breadth-first
search, Bellman-Ford, Floyd-Warshall, matrix all-pairs shortest paths, AVL
trees, binary heaps, and dynamic arrays. We explore the time/space/energy
trade-off and develop several general techniques for analyzing algorithms and
reducing their energy complexity. These results lay a theoretical foundation
for a new field of semi-reversible computing and provide a new framework for
the investigation of algorithms.Comment: 40 pages, 8 pdf figures, full version of work published in ITCS 201
Study protocol for an evaluation of the effectiveness of ‘care bundles’ as a means of improving hospital care and reducing hospital readmission for patients with chronic obstructive pulmonary disease (COPD)
BACKGROUND: Chronic Obstructive Pulmonary Disease is one of the commonest respiratory diseases in the United Kingdom, accounting for 10 % of unplanned hospital admissions each year. Nearly a third of these admitted patients are re-admitted to hospital within 28 days of discharge. Whilst there is a move within the NHS to ensure that people with long-term conditions receive more co-ordinated care, there is little research evidence to support an optimum approach to this in COPD. This study aims to evaluate the effectiveness of introducing standardised packages of care i.e. care bundles, for patients with acute exacerbations of COPD as a means of improving hospital care and reducing re-admissions. METHODS / DESIGN: This mixed-methods evaluation will use a controlled before-and-after design to examine the effect of, and costs associated with, implementing care bundles for patients admitted to hospital with an acute exacerbation of COPD, compared with usual care. It will quantitatively measure a range of patient and organisational outcomes for two groups of hospitals - those who deliver care using COPD care bundles, and those who deliver care without the use of COPD care bundles. These care bundles may be provided for patients with COPD following admission, prior to discharge or at both points in the care pathway. The primary outcome will be re-admission to hospital within 28 days of discharge, although the study will additionally investigate a number of secondary outcomes including length of stay, total bed days, in-hospital mortality, costs of care and patient / carer experience. A series of nested qualitative case studies will explore in detail the context and process of care as well as the impact of COPD bundles on staff, patients and carers. DISCUSSION: The results of the study will provide information about the effectiveness of care bundles as a way of managing in-hospital care for patients with an acute exacerbation of COPD. Given the number of unplanned hospital admissions for this patient group and their rate of subsequent re-admission, it is hoped that this evaluation will make a timely contribution to the evidence on care provision, to the benefit of patients, clinicians, managers and policy-makers. TRIAL REGISTRATION: International Standard Randomised Controlled Trials – ISRCTN13022442 - 11 February 201
Redox Bulk Energy Storage System Study, Volume 2
For abstract, see N77-33608
- …