52,502 research outputs found

    Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

    Full text link
    State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution. Such networks strain the computational capabilities and energy available to embedded and mobile processing platforms, restricting their use in many important applications. In this paper, we push the boundaries of hardware-effective CNN design by proposing BCNN with Separable Filters (BCNNw/SF), which applies Singular Value Decomposition (SVD) on BCNN kernels to further reduce computational and storage complexity. To enable its implementation, we provide a closed form of the gradient over SVD to calculate the exact gradient with respect to every binarized weight in backward propagation. We verify BCNNw/SF on the MNIST, CIFAR-10, and SVHN datasets, and implement an accelerator for CIFAR-10 on FPGA hardware. Our BCNNw/SF accelerator realizes memory savings of 17% and execution time reduction of 31.3% compared to BCNN with only minor accuracy sacrifices.Comment: 9 pages, 6 figures, accepted for Embedded Vision Workshop (CVPRW

    A FPGA system for QRS complex detection based on Integer Wavelet Transform

    Get PDF
    Due to complexity of their mathematical computation, many QRS detectors are implemented in software and cannot operate in real time. The paper presents a real-time hardware based solution for this task. To filter ECG signal and to extract QRS complex it employs the Integer Wavelet Transform. The system includes several components and is incorporated in a single FPGA chip what makes it suitable for direct embedding in medical instruments or wearable health care devices. It has sufficient accuracy (about 95%), showing remarkable noise immunity and low cost. Additionally, each system component is composed of several identical blocks/cells what makes the design highly generic. The capacity of today existing FPGAs allows even dozens of detectors to be placed in a single chip. After the theoretical introduction of wavelets and the review of their application in QRS detection, it will be shown how some basic wavelets can be optimized for easy hardware implementation. For this purpose the migration to the integer arithmetic and additional simplifications in calculations has to be done. Further, the system architecture will be presented with the demonstrations in both, software simulation and real testing. At the end, the working performances and preliminary results will be outlined and discussed. The same principle can be applied with other signals where the hardware implementation of wavelet transform can be of benefit

    A single chip system for ECG feature extraction

    Get PDF

    Addressing performance requirements in the FDT-based design of distributed systems

    Get PDF
    The development of distributed systems is generally regarded as a complex and costly task, and for this reason formal description techniques such as LOTOS and ESTELLE (both standardized by the ISO) are increasingly used in this process. Our experience is that LOTOS can be exploited at many stages on the design trajectory, from requirements specification to implementation, but that the language elements do not allow direct formalization of performance requirements. To avoid duplication of effort by using two formalisms with distinct approaches, we propose a design method that incorporates performance constraints in an heuristic but effective manner
    • …
    corecore