20 research outputs found
Energy efficient hardware acceleration of multimedia processing tools
The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores.
To prove these conclusions, the work in this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant prior art in the literature.
The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions. Results show an improvement on state of the art algorithms with future potential for even greater savings.
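To make the notion of a CMM concrete, the following Python sketch multiplies a vector by a fixed integer matrix using only shifts, additions and subtractions, recoding each constant into signed power-of-two digits. It is an illustrative assumption about how a multiplierless CMM datapath can be modelled, not the genetic-programming search the thesis proposes; all function names are hypothetical.

```python
def signed_digits(c):
    """Return (sign, shift) pairs such that c == sum(sign * 2**shift).

    Uses a non-adjacent (canonical signed digit style) recoding, which
    keeps the number of add/subtract terms small.
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            # c mod 4 == 1 -> digit +1, c mod 4 == 3 -> digit -1
            d = 2 - (c & 3)
            digits.append((d, shift))
            c -= d
        c >>= 1
        shift += 1
    return digits


def shift_add_multiply(c, x):
    """Compute c * x using only shifts and additions/subtractions."""
    return sum(sign * (x << shift) for sign, shift in signed_digits(c))


def constant_matrix_multiply(matrix, vector):
    """Multiply a constant integer matrix by an integer vector, row by row."""
    return [
        sum(shift_add_multiply(c, x) for c, x in zip(row, vector))
        for row in matrix
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 constant matrix and input vector.
    print(constant_matrix_multiply([[7, -3], [5, 12]], [2, 9]))
```

In hardware, each (sign, shift) pair corresponds to a wired shift feeding an adder or subtractor, so the number of pairs is a rough proxy for adder cost. A search procedure such as the one the thesis describes would look for decompositions that share terms across rows, which this sketch does not attempt.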
Review of Rounding Based Approximate Multiplier (ROBA) For Digital Signal Processing
The rounding-based approximate multiplier is built on the rounding of numbers, and it can be applied to both signed and unsigned operands. This paper reviews the Rounding Based Approximate (ROBA) multiplier, which is fast yet energy efficient. The approach is to round the operands to the nearest exponent of two. In this way the computationally intensive part of the multiplication is omitted, improving speed and energy consumption at the cost of a small error. The efficiency of the ROBA multiplier is evaluated by comparing its performance with that of several approximate and exact multipliers using different design parameters.
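The rounding idea can be sketched in Python as follows. The sketch assumes the commonly cited formulation A x B ≈ Ar x B + Br x A - Ar x Br, where Ar and Br are the operands rounded to the nearest power of two, so that every product reduces to a shift. The tie-breaking rule (ties round up) and the function names are assumptions made for illustration and are not taken from the abstract.

```python
def round_to_power_of_two(x):
    """Round a positive integer to the nearest power of two (ties round up)."""
    if x <= 0:
        raise ValueError("x must be positive")
    lower = 1 << (x.bit_length() - 1)
    upper = lower << 1
    return lower if (x - lower) < (upper - x) else upper


def roba_multiply(a, b):
    """Approximate a * b using shifts and additions only."""
    if a == 0 or b == 0:
        return 0
    # Handle the sign separately so the core works on magnitudes.
    sign = -1 if (a < 0) != (b < 0) else 1
    a, b = abs(a), abs(b)
    shift_a = round_to_power_of_two(a).bit_length() - 1  # log2(Ar)
    shift_b = round_to_power_of_two(b).bit_length() - 1  # log2(Br)
    # Ar*B + Br*A - Ar*Br, with each product realised as a shift.
    approx = (b << shift_a) + (a << shift_b) - (1 << (shift_a + shift_b))
    return sign * approx


if __name__ == "__main__":
    for a, b in [(3, 5), (7, 7), (8, 16), (-3, 5)]:
        print(a, b, roba_multiply(a, b), a * b)
```

Under this formulation the omitted term is (Ar - A) x (Br - B), so the result is exact whenever either operand is already a power of two.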
Design and Implementation of Complexity Reduced Digital Signal Processors for Low Power Biomedical Applications
Wearable health monitoring systems, which are capable of signal sensing, acquisition, local processing and transmission, can provide remote care with supervised, independent living. A generic biopotential signal (such as the Electrocardiogram (ECG) and Electroencephalogram (EEG)) processing platform consists of four main functional components. The signals acquired by the electrodes are amplified and preconditioned by the (1) Analog-Front-End (AFE) and are then digitized via the (2) Analog-to-Digital Converter (ADC) for further processing. The local digital signal processing is usually handled by a custom designed (3) Digital Signal Processor (DSP) which is responsible for any one or a combination of signal processing algorithms such as noise detection, noise/artefact removal, feature extraction, classification and compression. The digitally processed data is then transmitted via the (4) transmitter, which is known as the most power hungry block in the complete platform. All the aforementioned components of the wearable systems are required to be designed and fitted into an integrated system where the area and the power requirements are stringent. Therefore, hardware complexity and power dissipation of each functional component are crucial aspects while designing and implementing a wearable monitoring platform. The work undertaken focuses on reducing the hardware complexity of a biosignal DSP and presents low hardware complexity solutions that can be employed in the aforementioned wearable platforms.
A typical state-of-the-art system utilizes Sigma Delta (ΣΔ) ADCs incorporating a ΣΔ modulator and a decimation filter, whereas the state-of-the-art decimation filters employ linear phase Finite-Impulse-Response (FIR) filters with high orders that increase the hardware complexity [1–5]. In this thesis, the novel use of minimum phase Infinite-Impulse-Response (IIR) decimators is proposed where the hardware complexity is massively reduced compared to the conventional FIR decimators. In addition, the non-linear phase effects of these filters are also investigated, since phase non-linearity may distort the time domain representation of the signal being filtered, which is an undesirable effect for biopotential signals, especially when the fiducial characteristics carry diagnostic importance. In the case of ECG monitoring systems the effect of the IIR filter phase non-linearity is minimal and does not affect the diagnostic accuracy of the signals.
The work undertaken also proposes two methods for reducing the hardware complexity of the popular biosignal processing tool, the Discrete Wavelet Transform (DWT). General purpose multipliers are known to be hardware and power hungry in terms of the number of addition operations or their underlying building blocks, such as full adders or half adders, required. A higher number of adders leads to an increase in the power consumption, which is directly proportional to the clock frequency, supply voltage, switching activity and the resources utilized. A typical Field-Programmable-Gate-Array's (FPGA) resources are Look-up Tables (LUTs), whereas a custom Digital Signal Processor's (DSP) are gate-level cells of standard cell libraries that are used to build adders [6]. One of the proposed methods is the replacement of the hardware and power hungry general purpose multipliers and the coefficient memories with reconfigurable multiplier blocks that are composed of simple shift-add networks and multiplexers. This method substantially reduces the resource utilization as well as the power consumption of the system. The second proposed method is the design and implementation of the DWT filter banks using IIR filters, which employ fewer arithmetic operations compared to the state-of-the-art FIR wavelets. This reduces the hardware complexity of the analysis filter bank of the DWT and can be employed in applications where the reconstruction is not required. However, the synthesis filter bank for the IIR wavelet transform has a higher computational complexity compared to the conventional FIR wavelet synthesis filter banks, since re-indexing of the filtered data sequence is required that can only be achieved via the use of extra registers. Therefore, this led to the proposal of a novel design which replaces the complex IIR based synthesis filter banks with FIR filters which are the approximations of the associated IIR filters.
Finally, a comparative study is presented where the hybrid IIR/FIR and FIR/FIR wavelet filter banks are deployed in a typical noise reduction scenario using the wavelet thresholding techniques. It is concluded that the proposed hybrid IIR/FIR wavelet filter banks provide better denoising performance, reduced computational complexity and power consumption in comparison to their IIR/IIR and FIR/FIR counterparts.
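The noise reduction scenario mentioned above can be illustrated with a minimal wavelet thresholding sketch in Python. It assumes a one-level Haar (FIR) filter bank and soft thresholding of the detail coefficients, chosen only because they are short to write down; the thesis's hybrid IIR/FIR filter banks and its threshold selection are not reproduced here, and the function names are hypothetical.

```python
import math


def haar_analysis(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: rebuilds the sample sequence."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient towards zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, t):
    """Analyse, threshold the detail band, then reconstruct."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx, detail = haar_analysis(signal)
    return haar_synthesis(approx, soft_threshold(detail, t))


if __name__ == "__main__":
    noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2]
    print(denoise(noisy, 0.3))
```

A practical denoiser would normally use several decomposition levels and a data-driven threshold; this sketch only shows the analysis, thresholding and synthesis stages in order.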
System on fabrics utilising distributed computing
The main vision of wearable computing is to make electronic systems an important part of everyday clothing in the future, where they will serve as intelligent personal assistants. Wearable devices have the potential to be wearable computers and not mere input/output devices for the human body. The present thesis focuses on introducing a new wearable computing paradigm, where the processing elements are closely coupled with the sensors and are distributed using an Instruction Systolic Array (ISA) architecture.
The thesis describes a novel, multiple sensor, multiple processor system architecture prototype based on the Instruction Systolic Array paradigm for distributed computing on fabrics. The thesis introduces a new programming model to implement the distributed computer on fabrics. The implementation of the concept has been validated using parallel algorithms.
A real-time shape sensing and reconstruction application has been implemented on this architecture and has demonstrated a physical design for a wearable system based on the ISA concept constructed from off-the-shelf microcontrollers and sensors. Results demonstrate that the real-time application executes on the prototype ISA implementation, thus confirming the viability of the proposed architecture for fabric-resident computing devices.
Intelligent Sensor Networks
In the last decade, wireless or wired sensor networks have attracted much attention. However, most designs target general sensor network issues including protocol stack (routing, MAC, etc.) and security issues. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts.
The Fifth NASA Symposium on VLSI Design
The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design.
Adaptive Coded Modulation Classification and Spectrum Sensing for Cognitive Radio Systems. Adaptive Coded Modulation Techniques for Cognitive Radio Using Kalman Filter and Interacting Multiple Model Methods
The current and future trends of modern wireless communication systems place heavy demands on fast data transmissions in order to satisfy end users' requirements anytime, anywhere. Such demands are obvious in recent applications such as smart phones, long term evolution (LTE), fourth and fifth generation (4G and 5G), and worldwide interoperability for microwave access (WiMAX) platforms, where robust coding and modulations are essential, especially in streaming on-line video material, social media and gaming. This eventually resulted in extreme exhaustion imposed on the frequency spectrum as a rare natural resource due to stagnation in current spectrum management policies. Since its advent in the late 1990s, cognitive radio (CR) has been conceived as an enabling technology aiming at the efficient utilisation of frequency spectrum that can lead to potential direct spectrum access (DSA) management. This is mainly attributed to its internal capabilities inherited from the concept of software defined radio (SDR) to sniff its surroundings, learn and adapt its operational parameters accordingly. CR systems (CRs) may commonly comprise one or all of the following core engines that characterise their architectures; namely, adaptive coded modulation (ACM), automatic modulation classification (AMC) and spectrum sensing (SS).
Motivated by the above challenges, this programme of research is primarily aimed at the design and development of new paradigms to help improve the adaptability of CRs and thereby achieve the desirable signal processing tasks at the physical layer of the above core engines. Approximate modelling of Rayleigh and finite state Markov channels (FSMC) has been approached with a new concept borrowed from econometric studies. Then insightful channel estimation using a Kalman filter (KF) augmented with an interacting multiple model (IMM) has been examined for the purpose of robust adaptability; this combination is applied for the first time in wireless communication systems. The new IMM-KF combination has been employed in the feedback channel between wireless transmitter and receiver to adjust the transmitted power, by using a water-filling (WF) technique, and the constellation pattern and rate in the ACM algorithm. The AMC has also benefited from such IMM-KF integration to boost the performance against conventional parametric estimation methods such as maximum likelihood estimate (MLE) for channel interrogation, with the estimated parameters of both inserted into the ML classification algorithm. Expectation-maximisation (EM) has been applied to examine unknown transmitted modulation sequences and channel parameters in tandem. Finally, the non-parametric multitaper method (MTM) has been thoroughly examined for spectrum estimation (SE) and SS, by relying on the Neyman-Pearson (NP) detection principle for hypothesis testing, to allow licensed primary users (PUs) to coexist with opportunistic unlicensed secondary users (SUs) in the same frequency bands of interest without harmful effects. The performance of the above newly suggested paradigms has been simulated and assessed under various transmission settings and revealed substantial improvements.
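The water-filling step mentioned above can be sketched in Python as follows. The sketch assumes the textbook form of the problem: each sub-channel i has a known gain-to-noise ratio g_i, the allocated power is p_i = max(0, level - 1/g_i), and the powers sum to a fixed budget. In the thesis the channel estimates come from the IMM-KF tracker; here they are simply given as inputs, and the function name and example values are hypothetical.

```python
def water_filling(gains, total_power):
    """Allocate total_power across channels with gain-to-noise ratios `gains`.

    Returns the list of per-channel powers p_i = max(0, level - 1/g_i),
    where `level` is chosen so that the powers sum to total_power.
    """
    if total_power < 0 or any(g <= 0 for g in gains):
        raise ValueError("gains must be positive and total_power non-negative")
    floors = sorted(1.0 / g for g in gains)  # per-channel "floor" heights
    level = 0.0
    # Try using the k best channels, dropping the worst one until the
    # water level actually rises above the floor of every channel in use.
    for k in range(len(floors), 0, -1):
        level = (total_power + sum(floors[:k])) / k
        if level > floors[k - 1]:
            break
    return [max(0.0, level - 1.0 / g) for g in gains]


if __name__ == "__main__":
    # Three hypothetical sub-channels, from strongest to weakest.
    print(water_filling([1.0, 0.5, 0.1], 3.0))
```

Stronger channels receive more power and channels whose floor lies above the water level receive none, which is why the allocation has to be recomputed whenever the channel estimates change.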
Compiler techniques for scalable performance of stream programs on multicore architectures
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 211-222).
Given the ubiquity of multicore processors, there is an acute need to enable the development of scalable parallel applications without unduly burdening programmers. Currently, programmers are asked not only to explicitly expose parallelism but also concern themselves with issues of granularity, load-balancing, synchronization, and communication. This thesis demonstrates that when algorithmic parallelism is expressed in the form of a stream program, a compiler can effectively and automatically manage the parallelism. Our compiler assumes responsibility for low-level architectural details, transforming implicit algorithmic parallelism into a mapping that achieves scalable parallel performance for a given multicore target. Stream programming is characterized by regular processing of sequences of data, and it is a natural expression of algorithms in the areas of audio, video, digital signal processing, networking, and encryption. Streaming computation is represented as a graph of independent computation nodes that communicate explicitly over data channels. Our techniques operate on contiguous regions of the stream graph where the input and output rates of the nodes are statically determinable. Within a static region, the compiler first automatically adjusts the granularity and then exploits data, task, and pipeline parallelism in a holistic fashion. We introduce techniques that data-parallelize nodes that operate on overlapping sliding windows of their input, translating serializing state into minimal and parametrized inter-core communication. Finally, for nodes that cannot be data-parallelized due to state, we are the first to automatically apply software-pipelining techniques at a coarse granularity to exploit pipeline parallelism between stateful nodes.
Our framework is evaluated in the context of the StreamIt programming language. StreamIt is a high-level stream programming language that has been shown to improve programmer productivity in implementing streaming algorithms. We employ the StreamIt Core benchmark suite of 12 real-world applications to demonstrate the effectiveness of our techniques for varying multicore architectures. For a 16-core distributed memory multicore, we achieve a 14.9x mean speedup. For benchmarks that include sliding-window computation, our sliding-window data-parallelization techniques are required to enable scalable performance for a 16-core SMP multicore (14x mean speedup) and a 64-core distributed shared memory multicore (52x mean speedup).
By Michael I. Gordon. Ph.D.
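The idea of data-parallelizing a sliding-window node can be sketched in Python as follows. The sketch assumes a simple FIR-style node and splits its output range across workers, giving each worker the few extra input samples its windows overlap into. It is written in Python rather than StreamIt, uses threads only to show the partitioning (it is not a performance demonstration), and does not reproduce the compiler's actual transformation; all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sliding_window_filter(samples, weights):
    """Sequential node: one output per full window of len(weights) inputs."""
    n = len(weights)
    return [
        sum(w * samples[i + j] for j, w in enumerate(weights))
        for i in range(len(samples) - n + 1)
    ]


def split_with_overlap(samples, window, parts):
    """Cut the input so each piece yields a contiguous block of outputs.

    Neighbouring pieces share window - 1 samples; this duplicated data
    stands in for the inter-core communication the node would need.
    """
    outputs = len(samples) - window + 1
    if outputs <= 0:
        return []
    chunk = -(-outputs // parts)  # ceiling division
    pieces = []
    for start in range(0, outputs, chunk):
        stop = min(start + chunk, outputs)
        pieces.append(samples[start:stop + window - 1])
    return pieces


def parallel_sliding_window_filter(samples, weights, workers=4):
    """Run the same filter on overlapping pieces and concatenate the results."""
    pieces = split_with_overlap(samples, len(weights), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: sliding_window_filter(p, weights), pieces))
    return [y for part in results for y in part]


if __name__ == "__main__":
    data = list(range(20))
    taps = [1, 2, 3]
    print(parallel_sliding_window_filter(data, taps) == sliding_window_filter(data, taps))
```

Because each worker needs only window - 1 samples beyond its own block, the amount of shared data depends on the window length and not on the block size, which is what makes this kind of split attractive for large blocks.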