
    Prediction-based incremental refinement for binomially-factorized discrete wavelet transforms

    It was proposed recently that quantized representations of the input source (e.g., images, video) can be used to compute the two-dimensional discrete wavelet transform (2D DWT) incrementally. The coarsely quantized input source is used for the initial computation of the forward or inverse DWT, and the result is successively refined with each new refinement of the source description produced by an embedded quantizer. This computation is based on the direct two-dimensional factorization of the DWT using the generalized spatial combinative lifting algorithm. In this correspondence, we investigate the use of prediction in the computation of the results, i.e., exploiting the correlation of neighboring input samples (or transform coefficients) to reduce the dynamic range of the required computations, and thereby the circuit activity required for the arithmetic operations of the forward or inverse transform. We focus on binomial factorizations of DWTs, which include (amongst others) the popular 9/7 filter pair. Based on an FPGA arithmetic co-processor testbed, we present energy-consumption results for the arithmetic operations of incremental refinement and prediction-based incremental refinement in comparison to the conventional (nonrefinable) computation. Our tests with combinations of intra and error frames of video sequences show that the refinement-based approaches can be 70% more energy efficient than the conventional computation when computing to half precision, and remain 15% more efficient for full-precision computation.
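
    The refinement step relies on the linearity of the transform: applying the DWT to a coarse description and then to each quantization increment yields the same result as transforming the fully refined input. Below is a minimal sketch of that principle, using a one-level Haar transform as a stand-in for the binomial 9/7 lifting factorization and a single dyadic refinement in place of a full embedded quantizer; the names and the quantizer are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative sketch: because the DWT is linear,
# T(coarse) + T(increment) == T(refined input), so each new bitplane
# from an embedded quantizer refines the previous transform output.

def haar_1d(x):
    """One decomposition level of a Haar DWT, standing in for the
    binomial 9/7 lifting factorization used in the paper."""
    lo = (x[0::2] + x[1::2]) / 2.0   # lowpass: pairwise averages
    hi = (x[0::2] - x[1::2]) / 2.0   # highpass: pairwise differences
    return np.concatenate([lo, hi])

x = np.array([13, 10, 7, 6, 5, 4, 2, 1], dtype=float)

coarse = np.floor(x / 4) * 4          # coarsely quantized source description
increment = x - coarse                # refinement: small dynamic range

y = haar_1d(coarse)                   # initial, approximate transform
y = y + haar_1d(increment)            # refined in place as bits arrive

assert np.allclose(y, haar_1d(x))     # linearity makes the refinement exact
```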

    Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems

    The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the imprecision (distortion) of computation. Our technique applies adaptive scalar companding and rounding to input matrix blocks, followed by two forms of packing in floating point that allow for concurrent calculation of multiple results. Since the adaptive companding process controls the increase of concurrency (via packing), the increase in processing throughput (and the corresponding increase in distortion) depends on the input data statistics. To demonstrate this, we derive the optimal throughput-distortion control framework for GEMM for the broad class of zero-mean, independent identically distributed input sources. Our approach converts matrix multiplication in programmable processors into a computation channel: as processing throughput increases, the output noise (error) increases due to (i) coarser quantization and (ii) computational errors caused by exceeding the machine-precision limitations. We show that, under certain distortion in the GEMM computation, the proposed framework can significantly surpass 100% of the peak performance of a given processor. The practical benefits of our proposal are shown in a face recognition system and in a multi-layer perceptron system trained for metadata learning from a large music feature database. (Published in IEEE Transactions on Signal Processing, vol. 60, 2012.)
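
    Packing is the mechanism that lets throughput exceed the nominal peak: when operands and results have small dynamic range, two of them can share the 53-bit mantissa of one double, so a single floating-point multiply returns two exact products. The toy sketch below shows only this mechanism, with an assumed slot width and operand bounds; the paper's framework additionally uses adaptive companding to choose how coarsely operands are quantized so they fit their slots, which is where the controlled distortion enters.

```python
import numpy as np

# Toy sketch of floating-point packing (shift and bounds are assumptions):
# two small-range operands share one double's mantissa, so one multiply
# yields two exact products as long as each stays below the slot width.
SHIFT = 2.0 ** 26

def pack(a, b):
    return a + b * SHIFT

def unpack(p):
    hi = np.round(p / SHIFT)          # recover the high slot
    lo = p - hi * SHIFT               # remainder is the low slot
    return lo, hi

a = np.array([3.0, -7.0, 12.0])
b = np.array([5.0, 2.0, -4.0])
w = 9.0                               # shared multiplier

pa, pb = unpack(pack(a, b) * w)       # one multiply per packed pair
assert np.allclose(pa, a * w) and np.allclose(pb, b * w)
```

    Carrying two values per floating-point operation is what raises the apparent throughput past the processor's nominal peak, at the cost of the quantization needed to make operands fit their slots.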

    Software-based Approximate Computation Of Signal Processing Tasks

    This thesis introduces a new dimension in performance scaling of signal processing systems by proposing software frameworks that achieve increased processing throughput when producing approximate results. The first contribution of this work is a new theory for accelerated computation of multimedia processing based on the concept of tight packing (Chapter 2). This theory accelerates small-dynamic-range linear signal processing tasks (such as convolution and transform decomposition) that map integers to integers, without incurring any accuracy loss. The concept of tight packing is then combined with incremental computation that processes inputs in a bitplane-by-bitplane manner (Chapter 3), leading to substantial throughput/distortion scalability within filtering, transform-decomposition and motion-estimation tasks. This framework also provides for region-of-interest computation and is inherently robust to arbitrary termination of processing, imposed, for example, by a task scheduler. Finally, the concept of packed processing is extended to floating-point (lossy) matrix computations, with particular focus on the generic matrix multiplication (GEMM) routine of BLAS-3 (Chapters 4 and 5). This routine is a fundamental building block of several linear algebra and digital signal processing systems, such as face recognition and neural-network training for metadata-based retrieval systems. In order to compete with the best-performing software designs for GEMM, an implementation using single instruction, multiple data (SIMD) instructions is presented and analyzed. The proposed approach demonstrates substantial performance scaling in practice; specifically, it is shown to achieve up to twice the processing throughput of the best designs for GEMM when producing approximate results (on the same hardware). In summary, the proposed approximate computation of signal processing tasks can be selectively disabled, reverting to conventional full-precision, lower-throughput processing when deemed necessary. Importantly, the proposed software designs run on off-the-shelf computer hardware and provide for on-demand reconfiguration, depending on the input data and the precision specification (from full precision to noisy computation). Thus, the proposed approximate computation framework allows for backward compatibility and can be offered as an add-on service, creating significant competitive advantages for application developers. It can be used in mobile or high-performance computing systems when the precision of computation is not of critical importance (error-tolerant systems), or when the input data is intrinsically noisy.
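
    The lossless variant of tight packing can be pictured as placing several small-dynamic-range integers in disjoint bit fields of a single machine word, so that one wide multiply carries out several exact narrow multiplies at once. A toy sketch under assumed bounds (the slot width and operand ranges are illustrative, and Python's arbitrary-precision integers stand in for a 64-bit datapath):

```python
# Toy sketch of lossless tight packing: two integer operands occupy
# disjoint bit fields of one word, so one wide multiply performs two
# exact narrow multiplies with no accuracy loss.
SHIFT = 21                            # slot width; assumes |product| < 2**20

def pack(a, b):
    return a + (b << SHIFT)

def unpack(p):
    b = (p + (1 << (SHIFT - 1))) >> SHIFT   # rounding shift handles negatives
    a = p - (b << SHIFT)
    return a, b

w = 13
a, b = 311, -542
assert unpack(pack(a, b) * w) == (a * w, b * w)
```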

    Autonomous flight and remote site landing guidance research for helicopters

    Automated low-altitude flight and landing in remote areas within a civilian environment are investigated, where initial cost, ongoing maintenance costs, and system productivity are important considerations. An approach has been taken which has: (1) utilized those technologies developed for military applications which are directly transferable to a civilian mission; (2) exploited and developed technology areas where new methods or concepts are required; and (3) undertaken research with the potential to lead to the innovative methods or concepts required to achieve a manual and fully automatic remote-area low-altitude flight and landing capability. The project has resulted in the definition of a system operational concept that includes a sensor subsystem, a sensor fusion/feature extraction capability, and a guidance and control law concept. These subsystem concepts have been developed in sufficient depth to enable further exploration within the NASA simulation environment and to support programs leading to flight test.

    Error tolerant multimedia stream processing: There's plenty of room at the top (of the system stack)

    There is a growing realization that the expected fault rates and energy dissipation stemming from increases in CMOS integration will lead to the abandonment of traditional system reliability in favor of approaches that offer resilience to hardware-induced errors across the application, runtime support, architecture, device and integrated-circuit (IC) layers. Commercial stakeholders of multimedia stream processing (MSP) applications, such as information retrieval, stream mining systems, and high-throughput image and video processing systems, already feel the strain of inadequate system-level scaling and robustness under ever-increasing user demand. While such applications can tolerate certain imprecision in their results, today's MSP systems do not support a systematic way to exploit this aspect for cross-layer system resilience. However, research is currently emerging that attempts to utilize the error-tolerant nature of MSP applications for this purpose. This is achieved by modifications to all layers of the system stack, from algorithms and software to the architecture and device layers, and even the IC digital logic synthesis itself. Unlike conventional processing, which aims for worst-case performance and accuracy guarantees, error-tolerant MSP attempts to provide guarantees for the expected performance and accuracy. In this paper we review recent advances in this field from an MSP and a system (layer-by-layer) perspective, and attempt to foresee some of the components of future cross-layer error-tolerant system design that may influence the multimedia and general computing landscape within the next ten years.

    Custom Integrated Circuits

    Contains reports on nine research projects. Sponsored by Analog Devices, Inc.; International Business Machines, Inc.; the Joint Services Electronics Program (Contract DAAL03-86-K-0002); the U.S. Air Force Office of Scientific Research (Grant AFOSR 86-0164); Rockwell International Corporation; OKI Semiconductor; the U.S. Navy Office of Naval Research (Contract N00014-81-K-0742); the Charles Stark Draper Laboratory; DARPA/U.S. Navy Office of Naval Research (Contracts N00014-80-C-0622 and N00014-87-K-0825); the National Science Foundation (Grant ECS-83-10941); and AT&T Bell Laboratories.

    Taming Multi-core Parallelism with Concurrent Mixin Layers

    The recent shift in computer system design to multi-core technology requires that the developer leverage explicit parallel programming techniques in order to utilize available performance. Nevertheless, developing the requisite parallel applications remains a prohibitively difficult undertaking, particularly for the general programmer. To mitigate many of the challenges in creating concurrent software, this paper introduces a new parallel programming methodology that leverages feature-oriented programming (FOP) to logically decompose a product line architecture (PLA) into concurrent execution units. In addition, our efficient implementation of this methodology, which we call concurrent mixin layers, uses a layered architecture to facilitate the development of parallel applications. To validate our methodology and accompanying implementation, we present a case study of a product line of multimedia applications deployed within a typical multi-core environment. Our performance results demonstrate that a product line can be effectively transformed into parallel applications capable of utilizing multiple cores, thus improving performance. Furthermore, concurrent mixin layers significantly reduces the complexity of parallel programming by eliminating the need for the programmer to introduce explicit low-level concurrency control. Our initial experience gives us reason to believe that concurrent mixin layers is a promising technique for taming parallelism in multi-core environments.
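
    Purely as an illustration of the layering idea (the paper's setting is C++-style template composition, and all names below are hypothetical), the following Python sketch composes a small feature stack in which a dedicated concurrency layer moves its upstream stage onto a worker thread, so the feature code itself never touches locks or queues:

```python
import threading, queue

# Hypothetical analogue of concurrent mixin layers: each layer refines
# frames(); the concurrency layer runs its upstream stage on a thread.

class Decode:                          # base feature: produce frames
    def frames(self):
        for i in range(5):
            yield f"frame-{i}"

class Filter(Decode):                  # refining layer: transform each frame
    def frames(self):
        for f in super().frames():
            yield f.upper()

class Concurrent(Filter):              # concurrency layer, reusable over any stack
    def frames(self):
        q = queue.Queue(maxsize=2)     # bounded queue provides backpressure
        def worker():
            for f in super(Concurrent, self).frames():
                q.put(f)
            q.put(None)                # sentinel: upstream stage is done
        threading.Thread(target=worker, daemon=True).start()
        while (f := q.get()) is not None:
            yield f

for frame in Concurrent().frames():    # composed product: Decode + Filter + Concurrent
    print(frame)
```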

    Failure Mitigation in Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement

    A new roll-forward technique is proposed that recovers from any single fail-stop failure in M integer data streams (M ≥ 3) undergoing linear, sesquilinear or bijective (LSB) operations, such as scaling, additions/subtractions, inner or outer vector products, and permutations. In the proposed approach, the M input integer data streams are linearly superimposed to form M numerically entangled integer data streams that are stored in place of the original inputs. A series of LSB operations can then be performed directly using these entangled data streams. The output results can be extracted from any M − 1 entangled output streams by additions and arithmetic shifts, thereby guaranteeing robustness to a fail-stop failure in any single stream computation. Importantly, unlike other methods, the number of operations required for the entanglement, extraction and recovery of the results is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal on an Intel processor (Haswell architecture with AVX2 support) via convolution operations. Our analysis and experiments reveal that the proposed approach incurs only a 1.8% to 2.8% reduction in processing throughput in comparison to the failure-intolerant approach. This overhead is 9 to 14 times smaller than that of the equivalent checksum-based method. Thus, our proposal can be used in distributed systems, on unreliable processor hardware, or in safety-critical applications, where robustness against fail-stop failures is a necessity. (Published in Proc. 21st IEEE International On-Line Testing Symposium (IOLTS 2015), July 2015, Halkidiki, Greece.)
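
    The construction can be pictured with a toy M = 3 example: each entangled stream superimposes two inputs, so every input survives the loss of any single stream, and because the permitted operations are linear they can be applied directly to the entangled data. The sketch below uses disjoint bit ranges for simplicity, whereas the published scheme overlaps ranges and budgets for dynamic-range growth; the slot width, data, and filter are illustrative assumptions.

```python
# Toy sketch of numerical entanglement for M = 3 integer streams.
D = 32                                 # slot width; assumes |output| < 2**31

def entangle(x):                       # superimpose inputs pairwise, cyclically
    return [[a + (b << D) for a, b in zip(x[i], x[(i + 1) % 3])]
            for i in range(3)]

def convolve(s, h):                    # a linear (LSB) operation on integers
    return [sum(s[i - j] * h[j] for j in range(len(h)) if 0 <= i - j < len(s))
            for i in range(len(s) + len(h) - 1)]

def split(p):                          # extract both slots from one value
    hi = (p + (1 << (D - 1))) >> D     # rounding shift handles negatives
    return p - (hi << D), hi

x = [[3, -1, 4], [1, -5, 9], [2, 6, -5]]
h = [1, -2, 1]
c = [convolve(ci, h) for ci in entangle(x)]   # operate on entangled data only

# Fail-stop failure: stream 1 is lost. c[0] still carries (y0, y1) and
# c[2] carries (y2, y0), so all three output streams are recovered exactly.
y0, y1 = zip(*map(split, c[0]))
y2, _  = zip(*map(split, c[2]))
for yk, xk in zip((y0, y1, y2), x):
    assert list(yk) == convolve(xk, h)
```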