7 research outputs found
Compressive Sensing Using Iterative Hard Thresholding with Low Precision Data Representation: Theory and Applications
Modern scientific instruments produce vast amounts of data, which can
overwhelm the processing ability of computer systems. Lossy compression of data
is an intriguing solution, but comes with its own drawbacks, such as potential
signal loss, and the need for careful optimization of the compression ratio. In
this work, we focus on a setting where this problem is especially acute:
compressive sensing frameworks for interferometry and medical imaging. We ask
the following question: can the precision of the data representation be lowered
for all inputs, with recovery guarantees and practical performance? Our first
contribution is a theoretical analysis of the normalized Iterative Hard
Thresholding (IHT) algorithm when all input data, meaning both the measurement
matrix and the observation vector are quantized aggressively. We present a
variant of low precision normalized {IHT} that, under mild conditions, can
still provide recovery guarantees. The second contribution is the application
of our quantization framework to radio astronomy and magnetic resonance
imaging. We show that lowering the precision of the data can significantly
accelerate image recovery. We evaluate our approach on telescope data and
samples of brain images using CPU and FPGA implementations achieving up to a 9x
speed-up with negligible loss of recovery quality.Comment: 19 pages, 5 figures, 1 table, in IEEE Transactions on Signal
Processin
Spiral in Scala: Towards the Systematic Construction of Generators for Performance Libraries
Program generators for high performance libraries are an appealing solution to the recurring problem of porting and optimizing code with every new processor generation, but only few such generators exist to date. This is due to not only the difficulty of the design, but also of the actual implementation, which often results in an ad-hoc collection of standalone programs and scripts that are hard to extend, maintain, or reuse. In this paper we ask whether and which programming language concepts and features are needed to enable a more systematic construction of such generators. The systematic approach we advocate extrapolates from existing generators: a) describing the problem and algorithmic knowledge using one, or several, domain-specific languages (DSLs), b) expressing optimizations and choices as rewrite rules on DSL programs, c) designing data structures that can be configured to control the type of code that is generated and the data representation used, and d) using autotuning to select the best-performing alternative. As a case study, we implement a small, but representative subset of Spiral in Scala using the Lightweight Modular Staging (LMS) framework. The first main contribution of this paper is the realization of c) using type classes to abstract over staging decisions, i.e. which pieces of a computation are performed immediately and for which pieces code is generated. Specifically, we abstract over different complex data representations jointly with different code representations including generating loops versus unrolled code with scalar replacement-a crucial and usually tedious performance transformation. The second main contribution is to provide full support for a) and d) within the LMS framework: we extend LMS to support translation between different DSLs and autotuning through search
Fast quantized arithmetic on x86: Trading compute for data movement
We introduce Clover, a new library for efficient computation using low-precision data, providing mathematical routines required by fundamental methods in optimization and sparse recovery. Our library faithfully implements variants of stochastic quantization that guarantee convergence at low precision, and supports data formats from 4-bit quantized to 32-bit IEEE-754 on current Intel processors. In particular, we show that 4-bit can be implemented efficiently using Intel AVX despite the lack of native support for this data format. Experimental results with dot product, matrix-vector multiplication (MVM), gradient descent (GD), and iterative hard thresholding (IHT) demonstrate that the attainable speedups are in many cases close to linear with respect to the reduction of precision due to reduced data movement. Finally, for GD and IHT, we show examples of absolute speedup achieved by 4-bit versus 32-bit, by iterating until a given target error is achieved
Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems
Most performance critical software is developed using very low-level techniques. We argue that this needs to change, and that generative programming is an effective avenue to enable the use of high-level languages and programming techniques in many such circumstances.ISSN:1868-896