20 research outputs found

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    Review of Rounding Based Approximate Multiplier (ROBA) For Digital Signal Processing

    Get PDF
    The fundamental idea of adjusting put together estimated multiplier depends with respect to adjusting of numbers. This multiplier can be connected for both marked and unsigned numbers. In this paper contemplated an Rounding Based Approximate Multiplier that is fast yet vitality effective. The methodology is to round the operands to the closest example of two. Along these lines the computational concentrated piece of the augmentation is excluded improving rate and vitality utilization at the cost of a little mistake. This methodology is appropriate to both marked and unsigned augmentations. The productivity of the ROBA multiplier is assessed by contrasting its execution and those of some rough and precise multipliers utilizing distinctive plan parameters

    Design and Implementation of Complexity Reduced Digital Signal Processors for Low Power Biomedical Applications

    Get PDF
    Wearable health monitoring systems can provide remote care with supervised, inde-pendent living which are capable of signal sensing, acquisition, local processing and transmission. A generic biopotential signal (such as Electrocardiogram (ECG), and Electroencephalogram (EEG)) processing platform consists of four main functional components. The signals acquired by the electrodes are ampliļ¬ed and preconditioned by the (1) Analog-Front-End (AFE) which are then digitized via the (2) Analog-to-Digital Converter (ADC) for further processing. The local digital signal processing is usually handled by a custom designed (3) Digital Signal Processor (DSP) which is responsible for either anyone or combination of signal processing algorithms such as noise detection, noise/artefact removal, feature extraction, classiļ¬cation and compres-sion. The digitally processed data is then transmitted via the (4) transmitter which is renown as the most power hungry block in the complete platform. All the afore-mentioned components of the wearable systems are required to be designed and ļ¬tted into an integrated system where the area and the power requirements are stringent. Therefore, hardware complexity and power dissipation of each functional component are crucial aspects while designing and implementing a wearable monitoring platform. The work undertaken focuses on reducing the hardware complexity of a biosignal DSP and presents low hardware complexity solutions that can be employed in the aforemen-tioned wearable platforms. A typical state-of-the-art system utilizes Sigma Delta (Ī£āˆ†) ADCs incorporating a Ī£āˆ† modulator and a decimation ļ¬lter whereas the state-of-the-art decimation ļ¬lters employ linear phase Finite-Impulse-Response (FIR) ļ¬lters with high orders that in-crease the hardware complexity [1ā€“5]. In this thesis, the novel use of minimum phase Inļ¬nite-Impulse-Response (IIR) decimators is proposed where the hardware complexity is massively reduced compared to the conventional FIR decimators. In addition, the non-linear phase eļ¬€ects of these ļ¬lters are also investigated since phase non-linearity may distort the time domain representation of the signal being ļ¬ltered which is un-desirable eļ¬€ect for biopotential signals especially when the ļ¬ducial characteristics carry diagnostic importance. In the case of ECG monitoring systems the eļ¬€ect of the IIR ļ¬lter phase non-linearity is minimal which does not aļ¬€ect the diagnostic accuracy of the signals. The work undertaken also proposes two methods for reducing the hardware complexity of the popular biosignal processing tool, Discrete Wavelet Transform (DWT). General purpose multipliers are known to be hardware and power hungry in terms of the number of addition operations or their underlying building blocks like full adders or half adders required. Higher number of adders leads to an increase in the power consumption which is directly proportional to the clock frequency, supply voltage, switching activity and the resources utilized. A typical Field-Programmable-Gate-Arrayā€™s (FPGA) resources are Look-up Tables (LUTs) whereas a custom Digital Signal Processorā€™s (DSP) are gate-level cells of standard cell libraries that are used to build adders [6]. One of the proposed methods is the replacement of the hardware and power hungry general pur-pose multipliers and the coeļ¬ƒcient memories with reconļ¬gurable multiplier blocks that are composed of simple shift-add networks and multiplexers. This method substantially reduces the resource utilization as well as the power consumption of the system. The second proposed method is the design and implementation of the DWT ļ¬lter banks using IIR ļ¬lters which employ less number of arithmetic operations compared to the state-of-the-art FIR wavelets. This reduces the hardware complexity of the analysis ļ¬lter bank of the DWT and can be employed in applications where the reconstruction is not required. However, the synthesis ļ¬lter bank for the IIR wavelet transform has a higher computational complexity compared to the conventional FIR wavelet synthesis ļ¬lter banks since re-indexing of the ļ¬ltered data sequence is required that can only be achieved via the use of extra registers. Therefore, this led to the proposal of a novel design which replaces the complex IIR based synthesis ļ¬lter banks with FIR ļ¬l-ters which are the approximations of the associated IIR ļ¬lters. Finally, a comparative study is presented where the hybrid IIR/FIR and FIR/FIR wavelet ļ¬lter banks are de-ployed in a typical noise reduction scenario using the wavelet thresholding techniques. It is concluded that the proposed hybrid IIR/FIR wavelet ļ¬lter banks provide better denoising performance, reduced computational complexity and power consumption in comparison to their IIR/IIR and FIR/FIR counterparts

    Techniques for Efficient Implementation of FIR and Particle Filtering

    Full text link

    System on fabrics utilising distributed computing

    Get PDF
    The main vision of wearable computing is to make electronic systems an important part of everyday clothing in the future which will serve as intelligent personal assistants. Wearable devices have the potential to be wearable computers and not mere input/output devices for the human body. The present thesis focuses on introducing a new wearable computing paradigm, where the processing elements are closely coupled with the sensors that are distributed using Instruction Systolic Array (ISA) architecture. The thesis describes a novel, multiple sensor, multiple processor system architecture prototype based on the Instruction Systolic Array paradigm for distributed computing on fabrics. The thesis introduces new programming model to implement the distributed computer on fabrics. The implementation of the concept has been validated using parallel algorithms. A real-time shape sensing and reconstruction application has been implemented on this architecture and has demonstrated a physical design for a wearable system based on the ISA concept constructed from off-the-shelf microcontrollers and sensors. Results demonstrate that the real time application executes on the prototype ISA implementation thus confirming the viability of the proposed architecture for fabric-resident computing devices

    Intelligent Sensor Networks

    Get PDF
    In the last decade, wireless or wired sensor networks have attracted much attention. However, most designs target general sensor network issues including protocol stack (routing, MAC, etc.) and security issues. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    Compiler techniques for scalable performance of stream programs on multicore architectures

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 211-222).Given the ubiquity of multicore processors, there is an acute need to enable the development of scalable parallel applications without unduly burdening programmers. Currently, programmers are asked not only to explicitly expose parallelism but also concern themselves with issues of granularity, load-balancing, synchronization, and communication. This thesis demonstrates that when algorithmic parallelism is expressed in the form of a stream program, a compiler can effectively and automatically manage the parallelism. Our compiler assumes responsibility for low-level architectural details, transforming implicit algorithmic parallelism into a mapping that achieves scalable parallel performance for a given multicore target. Stream programming is characterized by regular processing of sequences of data, and it is a natural expression of algorithms in the areas of audio, video, digital signal processing, networking, and encryption. Streaming computation is represented as a graph of independent computation nodes that communicate explicitly over data channels. Our techniques operate on contiguous regions of the stream graph where the input and output rates of the nodes are statically determinable. Within a static region, the compiler first automatically adjusts the granularity and then exploits data, task, and pipeline parallelism in a holistic fashion. We introduce techniques that data-parallelize nodes that operate on overlapping sliding windows of their input, translating serializing state into minimal and parametrized inter-core communication. Finally, for nodes that cannot be data-parallelized due to state, we are the first to automatically apply software-pipelining techniques at a coarse granularity to exploit pipeline parallelism between stateful nodes. Our framework is evaluated in the context of the StreamIt programming language. StreamIt is a high-level stream programming language that has been shown to improve programmer productivity in implementing streaming algorithms. We employ the StreamIt Core benchmark suite of 12 real-world applications to demonstrate the effectiveness of our techniques for varying multicore architectures. For a 16-core distributed memory multicore, we achieve a 14.9x mean speedup. For benchmarks that include sliding-window computation, our sliding-window data-parallelization techniques are required to enable scalable performance for a 16-core SMP multicore (14x mean speedup) and a 64-core distributed shared memory multicore (52x mean speedup).by Michael I. Gordon.Ph.D
    corecore