
    Strategies for neural networks in ballistocardiography with a view towards hardware implementation

    A thesis submitted for the degree of Doctor of Philosophy at the University of Luton.
    The work described in this thesis is based on the results of a clinical trial conducted by the research team at the Medical Informatics Unit of the University of Cambridge, which show that the Ballistocardiogram (BCG) has prognostic value in detecting impaired left ventricular function before it becomes clinically overt as myocardial infarction leading to sudden death. The objective of this study is to develop and demonstrate a framework for realising an on-line BCG signal classification model in a portable device that would have the potential to find pathological signs as early as possible for home health care. Two new on-line automatic BCG classification models for time-domain BCG classification are proposed. Both systems are based on a two-stage process: input feature extraction followed by a neural classifier. One system uses a principal component analysis neural network, and the other a discrete wavelet transform, to reduce the input dimensionality. Results of the classification, dimensionality reduction, and comparison are presented. They indicate that the combined wavelet transform and MLP system performs more reliably than the combined neural networks system in situations where the data available to determine the network parameters is limited. Moreover, the wavelet transform requires no prior knowledge of the statistical distribution of data samples, and the computational complexity and training time are reduced. Overall, a methodology for realising an automatic BCG classification system for a portable instrument is presented. A fully parallel neural network design for a low-cost platform using field programmable gate arrays (Xilinx's XC4000 series) is explored. This addresses the potential speed requirements in the biomedical signal processing field. It also demonstrates a flexible hardware design approach so that an instrument's parameters can be updated as data expands with time. To reduce the hardware design complexity and to increase the system performance, a hybrid learning algorithm using random optimisation and the backpropagation rule is developed to achieve an efficient weight update mechanism in low-weight-precision learning. The simulation results show that the hybrid learning algorithm is effective in solving the network paralysis problem and that convergence is much faster than with the standard backpropagation rule. The hidden and output layer nodes have been mapped onto Xilinx FPGAs with automatic placement and routing tools. The static timing analysis results suggest that the proposed network implementation could achieve a performance of 2.7 billion connections per second.
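
    To make the two-stage pipeline concrete, here is a minimal Python sketch of the general idea: a Haar DWT front end that keeps only the approximation coefficients to reduce input dimensionality, followed by a small MLP classifier. The beat length, level count, layer sizes, and weights are all illustrative assumptions; the thesis's actual feature extraction, training procedure, and FPGA mapping are not reproduced here.

```python
import numpy as np

def haar_dwt_features(beat, levels=3):
    """Dimensionality reduction: keep only the approximation band of a
    multi-level Haar DWT (a stand-in for the thesis's wavelet front end)."""
    a = np.asarray(beat, dtype=float)
    for _ in range(levels):
        if a.size % 2:                        # pad odd-length signals
            a = np.append(a, a[-1])
        a = (a[0::2] + a[1::2]) / np.sqrt(2)  # low-pass half only
    return a

def mlp_classify(features, W1, b1, W2, b2):
    """Minimal MLP forward pass: one sigmoid hidden layer, sigmoid output."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(W1 @ features + b1)
    return sigmoid(W2 @ hidden + b2)          # score for the 'abnormal' class

# Toy usage with random weights; a real system would train on labelled BCG beats.
rng = np.random.default_rng(0)
beat = rng.standard_normal(256)               # one BCG beat (hypothetical length)
features = haar_dwt_features(beat)            # 256 samples -> 32 features
W1, b1 = 0.1 * rng.standard_normal((8, features.size)), np.zeros(8)
W2, b2 = 0.1 * rng.standard_normal(8), 0.0
print(f"abnormality score: {mlp_classify(features, W1, b1, W2, b2):.3f}")
```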

    Loop Transformations for the Optimized Generation of Reconfigurable Hardware

    Current high-level design environments offer little support for implementing data-intensive applications on heterogeneous-memory systems; they focus on parallelism instead. This thesis addresses the memory hierarchy problem through high-level transformations of loop structures. The composition of long transformation sequences by combining shorter subsequences is studied, together with the influence of the order in which transformation steps are applied. Several methods are presented to estimate bounds on Ehrhart quasi-polynomials, which can be used to statically evaluate program properties such as memory usage. Since loop transformations influence not only the data access pattern but also the control complexity, we present a hardware loop controller architecture which supports hardware generation from the polyhedral representation used for loop transformations. The techniques are demonstrated by the semi-automatic generation of an FPGA implementation of an inverse discrete wavelet transform.
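
    As an illustration of the kind of loop transformation at issue, the sketch below shows loop tiling, a standard transformation that improves locality in a memory hierarchy by operating on cache-sized blocks. It is a generic Python example, not code from the thesis, which works on polyhedral representations and generates hardware rather than software loops.

```python
import numpy as np

def matmul_tiled(A, B, tile=32):
    """Tiled (blocked) matrix multiply: a classic loop transformation that
    improves locality by keeping tile x tile blocks resident in fast memory
    while they are reused, instead of streaming whole rows and columns."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for ii in range(0, n, tile):           # tiles of the i loop
        for jj in range(0, m, tile):       # tiles of the j loop
            for kk in range(0, k, tile):   # tiles of the k loop
                # the innermost step touches only small sub-arrays
                C[ii:ii+tile, jj:jj+tile] += (
                    A[ii:ii+tile, kk:kk+tile] @ B[kk:kk+tile, jj:jj+tile]
                )
    return C

A, B = np.random.rand(64, 64), np.random.rand(64, 64)
assert np.allclose(matmul_tiled(A, B), A @ B)   # same result, better locality
```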

    Prediction-based incremental refinement for binomially-factorized discrete wavelet transforms

    It was proposed recently that quantized representations of the input source (e.g., images, video) can be used to compute the two-dimensional discrete wavelet transform (2D DWT) incrementally. The coarsely quantized input source is used for the initial computation of the forward or inverse DWT, and the result is successively refined with each new refinement of the source description via an embedded quantizer. This computation is based on the direct two-dimensional factorization of the DWT using the generalized spatial combinative lifting algorithm. In this correspondence, we investigate the use of prediction in the computation of the results, i.e., exploiting the correlation of neighboring input samples (or transform coefficients) to reduce the dynamic range of the required computations, and thereby reduce the circuit activity required for the arithmetic operations of the forward or inverse transform. We focus on binomial factorizations of DWTs that include (amongst others) the popular 9/7 filter pair. Based on an FPGA arithmetic co-processor testbed, we present energy-consumption results for the arithmetic operations of incremental refinement and prediction-based incremental refinement in comparison to the conventional (nonrefinable) computation. Our tests with combinations of intra and error frames of video sequences show that prediction-based incremental refinement can be 70% more energy efficient than the conventional computation when computing to half precision, and remains 15% more efficient for full-precision computation.
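
    The dynamic-range argument can be illustrated with a simple first-order predictor, sketched below in Python: on correlated data, coding each sample as a residual against its left neighbour shrinks the magnitudes entering the arithmetic, which is the effect the paper exploits. The paper's actual predictors operate inside the binomially-factorized lifting structure; the toy signal and the bit-width proxy here are illustrative assumptions.

```python
import numpy as np

def predict_residuals(samples):
    """First-order prediction: each sample is predicted by its left neighbour,
    so only the residual enters the arithmetic. Correlated inputs yield small
    residuals, i.e., a reduced dynamic range (fewer toggling bits)."""
    s = np.asarray(samples, dtype=np.int64)
    residuals = np.empty_like(s)
    residuals[0] = s[0]
    residuals[1:] = s[1:] - s[:-1]
    return residuals

def bits_needed(x):
    """Bits for the largest magnitude plus sign: a proxy for circuit activity."""
    return int(np.max(np.ceil(np.log2(np.abs(x) + 1)) + 1))

# Smooth (correlated) toy signal standing in for image/video samples.
t = np.arange(512)
signal = np.round(1000 * np.sin(2 * np.pi * t / 128)).astype(np.int64)
res = predict_residuals(signal)
print(bits_needed(signal), "bits direct vs", bits_needed(res), "bits with prediction")
assert np.array_equal(np.cumsum(res), signal)   # prediction is exactly invertible
```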

    A Vlsi architecture for lifting-based wavelet packet transform in fingerprint image compression

    The FBI uses a technique called Wavelet Scalar Quantization (WSQ), a wavelet packet transform (WPT) based method, to compress its fingerprint images. Though many VLSI architectures have been proposed for the wavelet transform in the literature, this is not the case for the WPT. In this thesis, a VLSI architecture capable of computing the WPT is presented for the WSQ application. In the proposed architecture, the Lifting Scheme (LS) is used to generate the wavelets instead of the traditional convolution filter bank (FB) specified in the original standard. A comparative study between the LS and the FB shows that the quality of images transformed by the LS is entirely acceptable (30 dB to 40 dB PSNR at a target bit rate of 0.75 bpp) while requiring fewer operations. In particular, compared with the FB, the LS halves the hardware consumption for our WSQ application. Moreover, this architecture can easily be configured to compute any required WPT.
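
    The lifting idea the architecture builds on is easy to show in software. The Python sketch below implements one level of the integer 5/3 wavelet as a predict step followed by an update step, invertible exactly by running the steps backwards. It is meant only to illustrate lifting versus convolution filter banks; it uses the 5/3 filters rather than the 9/7 pair that WSQ actually specifies, and assumes an even-length input.

```python
import numpy as np

def lifting_53_forward(x):
    """One level of the integer 5/3 wavelet via lifting (predict + update),
    the in-place alternative to a convolution filter bank."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict: detail = odd sample minus the average of neighbouring evens
    right = np.append(even[1:], even[-1])          # simple symmetric edge rule
    d = odd - ((even + right) >> 1)
    # Update: approximation = even sample plus a quarter of nearby details
    left = np.insert(d[:-1], 0, d[0])
    s = even + ((left + d + 2) >> 2)
    return s, d

def lifting_53_inverse(s, d):
    """Invert by undoing the lifting steps in reverse order."""
    left = np.insert(d[:-1], 0, d[0])
    even = s - ((left + d + 2) >> 2)
    right = np.append(even[1:], even[-1])
    odd = d + ((even + right) >> 1)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

x = np.random.randint(0, 256, size=64)
s, d = lifting_53_forward(x)
assert np.array_equal(lifting_53_inverse(s, d), x)   # lossless by construction
```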

    Reconfigurable Computing for Space


    ASC: A stream compiler for computing with FPGAs


    Dynamically Reconfigurable Systolic Array Accelerators: A Case Study with Extended Kalman Filter and Discrete Wavelet Transform Algorithms

    Field programmable gate arrays (FPGAs) are increasingly being adopted as the primary on-board computing system for autonomous deep space vehicles. There is a need to support several complex applications for navigation and image processing in a rapidly responsive on-board FPGA-based computer. This requires exploring and combining several design concepts such as systolic arrays, hardware-software partitioning, and partial dynamic reconfiguration. A microprocessor/co-processor design that can accelerate two single-precision floating-point algorithms, the extended Kalman filter and a discrete wavelet transform, is presented. This research makes three key contributions. (i) A polymorphic systolic array framework (PolySAF) comprising reconfigurable partial region-based sockets to accelerate algorithms amenable to being mapped onto linear systolic arrays. When implemented on a low-end Xilinx Virtex4 SX35 FPGA, the design provides a speedup of at least 4.18x and 6.61x over a state-of-the-art microprocessor used in spacecraft systems for the extended Kalman filter and discrete wavelet transform algorithms, respectively. (ii) Switchboxes to enable communication between static and partially reconfigurable regions, and a simple protocol to enable schedule changes when a socket's contents are dynamically reconfigured to alter the concurrency of the participating systolic arrays. (iii) A hybrid partial dynamic reconfiguration method that combines Xilinx early access partial reconfiguration, on-chip bitstream decompression, and bitstream relocation to enable fast scaling of systolic arrays on the PolySAF. This technique provided a 2.7x improvement in reconfiguration time compared to an off-chip partial reconfiguration technique that used a Flash card on the FPGA board, and a 44% improvement in BRAM usage compared to not using compression.
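
    A linear systolic array of the sort the sockets host can be modelled in a few lines. The Python sketch below simulates a broadcast-input FIR array: every cycle each processing element performs one multiply-accumulate and hands its partial sum to its right neighbour, so one finished output emerges per cycle once the pipeline fills. It is a toy illustration of the systolic principle, not the paper's EKF or DWT mappings.

```python
import numpy as np

def systolic_fir(x, taps):
    """Cycle-level model of a linear systolic FIR array: one PE per tap.
    Each cycle, every PE multiplies its resident coefficient by the
    broadcast input sample, adds the partial sum arriving from the left,
    and passes the result to the right; the rightmost PE emits one
    finished output per cycle."""
    n_pe = len(taps)
    coeff = np.asarray(taps, dtype=float)[::-1]  # last PE holds taps[0]
    sums = np.zeros(n_pe)                        # partial sums in flight
    out = []
    for s in list(x) + [0.0] * (n_pe - 1):       # trailing zeros flush the pipe
        sums = np.insert(sums[:-1], 0, 0.0)      # shift partial sums rightwards
        sums += coeff * s                        # one MAC per PE, in parallel
        out.append(sums[-1])
    return np.array(out)

taps = np.array([1.0, -2.0, 3.0])
x = np.arange(8, dtype=float)
assert np.allclose(systolic_fir(x, taps), np.convolve(x, taps))  # matches FIR
```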

    Design and application of reconfigurable circuits and systems
