173 research outputs found

    YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

    Get PDF
    Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm2^2 and with a power dissipation of 895 {\mu}W in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state-of-the-art in terms of energy and area efficiency achieving 61.2 TOp/s/[email protected] V and 1135 GOp/s/[email protected] V, respectively

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Get PDF
    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object based paradigm has many undeniable benefits, numerous technical challenges remain before the applications becomes pervasive, particularly on computational constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery powered mobile computing devices, the additional algorithmic complexity of semantic object based processing compared to conventional video processing is highly undesirable both from a real-time operation and battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourable with the relevant prior art

    Resource efficient on-node spike sorting

    Get PDF
    Current implantable brain-machine interfaces are recording multi-neuron activity by utilising multi-channel, multi-electrode micro-electrodes. With the rapid increase in recording capability has come more stringent constraints on implantable system power consumption and size. This is even more so with the increasing demand for wireless systems to increase the number of channels being monitored whilst overcoming the communication bottleneck (in transmitting raw data) via transcutaneous bio-telemetries. For systems observing unit activity, real-time spike sorting within an implantable device offers a unique solution to this problem. However, achieving such data compression prior to transmission via an on-node spike sorting system has several challenges. The inherent complexity of the spike sorting problem arising from various factors (such as signal variability, local field potentials, background and multi-unit activity) have required computationally intensive algorithms (e.g. PCA, wavelet transform, superparamagnetic clustering). Hence spike sorting systems have traditionally been implemented off-line, usually run on work-stations. Owing to their complexity and not-so-well scalability, these algorithms cannot be simply transformed into a resource efficient hardware. On the contrary, although there have been several attempts in implantable hardware, an implementation to match comparable accuracy to off-line within the required power and area requirements for future BMIs have yet to be proposed. Within this context, this research aims to fill in the gaps in the design towards a resource efficient implantable real-time spike sorter which achieves performance comparable to off-line methods. The research covered in this thesis target: 1) Identifying and quantifying the trade-offs on subsequent signal processing performance and hardware resource utilisation of the parameters associated with analogue-front-end. Following the development of a behavioural model of the analogue-front-end and an optimisation tool, the sensitivity of the spike sorting accuracy to different front-end parameters are quantified. 2) Identifying and quantifying the trade-offs associated with a two-stage hybrid solution to realising real-time on-node spike sorting. Initial part of the work focuses from the perspective of template matching only, while the second part of the work considers these parameters from the point of whole system including detection, sorting, and off-line training (template building). A set of minimum requirements are established which ensure robust, accurate and resource efficient operation. 3) Developing new feature extraction and spike sorting algorithms towards highly scalable systems. Based on waveform dynamics of the observed action potentials, a derivative based feature extraction and a spike sorting algorithm are proposed. These are compared with most commonly used methods of spike sorting under varying noise levels using realistic datasets to confirm their merits. The latter is implemented and demonstrated in real-time through an MCU based platform.Open Acces

    Object detection and recognition with event driven cameras

    Get PDF
    This thesis presents study, analysis and implementation of algorithms to perform object detection and recognition using an event-based cam era. This sensor represents a novel paradigm which opens a wide range of possibilities for future developments of computer vision. In partic ular it allows to produce a fast, compressed, illumination invariant output, which can be exploited for robotic tasks, where fast dynamics and signi\ufb01cant illumination changes are frequent. The experiments are carried out on the neuromorphic version of the iCub humanoid platform. The robot is equipped with a novel dual camera setup mounted directly in the robot\u2019s eyes, used to generate data with a moving camera. The motion causes the presence of background clut ter in the event stream. In such scenario the detection problem has been addressed with an at tention mechanism, speci\ufb01cally designed to respond to the presence of objects, while discarding clutter. The proposed implementation takes advantage of the nature of the data to simplify the original proto object saliency model which inspired this work. Successively, the recognition task was \ufb01rst tackled with a feasibility study to demonstrate that the event stream carries su\ufb03cient informa tion to classify objects and then with the implementation of a spiking neural network. The feasibility study provides the proof-of-concept that events are informative enough in the context of object classi\ufb01 cation, whereas the spiking implementation improves the results by employing an architecture speci\ufb01cally designed to process event data. The spiking network was trained with a three-factor local learning rule which overcomes weight transport, update locking and non-locality problem. The presented results prove that both detection and classi\ufb01cation can be carried-out in the target application using the event data

    SYSTEM-ON-A-CHIP (SOC)-BASED HARDWARE ACCELERATION FOR HUMAN ACTION RECOGNITION WITH CORE COMPONENTS

    Get PDF
    Today, the implementation of machine vision algorithms on embedded platforms or in portable systems is growing rapidly due to the demand for machine vision in daily human life. Among the applications of machine vision, human action and activity recognition has become an active research area, and market demand for providing integrated smart security systems is growing rapidly. Among the available approaches, embedded vision is in the top tier; however, current embedded platforms may not be able to fully exploit the potential performance of machine vision algorithms, especially in terms of low power consumption. Complex algorithms can impose immense computation and communication demands, especially action recognition algorithms, which require various stages of preprocessing, processing and machine learning blocks that need to operate concurrently. The market demands embedded platforms that operate with a power consumption of only a few watts. Attempts have been mad to improve the performance of traditional embedded approaches by adding more powerful processors; this solution may solve the computation problem but increases the power consumption. System-on-a-chip eld-programmable gate arrays (SoC-FPGAs) have emerged as a major architecture approach for improving power eciency while increasing computational performance. In a SoC-FPGA, an embedded processor and an FPGA serving as an accelerator are fabricated in the same die to simultaneously improve power consumption and performance. Still, current SoC-FPGA-based vision implementations either shy away from supporting complex and adaptive vision algorithms or operate at very limited resolutions due to the immense communication and computation demands. The aim of this research is to develop a SoC-based hardware acceleration workflow for the realization of advanced vision algorithms. Hardware acceleration can improve performance for highly complex mathematical calculations or repeated functions. The performance of a SoC system can thus be improved by using hardware acceleration method to accelerate the element that incurs the highest performance overhead. The outcome of this research could be used for the implementation of various vision algorithms, such as face recognition, object detection or object tracking, on embedded platforms. The contributions of SoC-based hardware acceleration for hardware-software codesign platforms include the following: (1) development of frameworks for complex human action recognition in both 2D and 3D; (2) realization of a framework with four main implemented IPs, namely, foreground and background subtraction (foreground probability), human detection, 2D/3D point-of-interest detection and feature extraction, and OS-ELM as a machine learning algorithm for action identication; (3) use of an FPGA-based hardware acceleration method to resolve system bottlenecks and improve system performance; and (4) measurement and analysis of system specications, such as the acceleration factor, power consumption, and resource utilization. Experimental results show that the proposed SoC-based hardware acceleration approach provides better performance in terms of the acceleration factor, resource utilization and power consumption among all recent works. In addition, a comparison of the accuracy of the framework that runs on the proposed embedded platform (SoCFPGA) with the accuracy of other PC-based frameworks shows that the proposed approach outperforms most other approaches

    Applications in Electronics Pervading Industry, Environment and Society

    Get PDF
    This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, which was held in November 2019. All these papers have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. The focus of these papers is on cyber physical systems (CPS), with research proposals for new sensor acquisition and ADC (analog to digital converter) methods, high-speed communication systems, cybersecurity, big data management, and data processing including emerging machine learning techniques. Physical implementation aspects are discussed as well as the trade-off found between functional performance and hardware/system costs

    A Closed-Loop Bidirectional Brain-Machine Interface System For Freely Behaving Animals

    Get PDF
    A brain-machine interface (BMI) creates an artificial pathway between the brain and the external world. The research and applications of BMI have received enormous attention among the scientific community as well as the public in the past decade. However, most research of BMI relies on experiments with tethered or sedated animals, using rack-mount equipment, which significantly restricts the experimental methods and paradigms. Moreover, most research to date has focused on neural signal recording or decoding in an open-loop method. Although the use of a closed-loop, wireless BMI is critical to the success of an extensive range of neuroscience research, it is an approach yet to be widely used, with the electronics design being one of the major bottlenecks. The key goal of this research is to address the design challenges of a closed-loop, bidirectional BMI by providing innovative solutions from the neuron-electronics interface up to the system level. Circuit design innovations have been proposed in the neural recording front-end, the neural feature extraction module, and the neural stimulator. Practical design issues of the bidirectional neural interface, the closed-loop controller and the overall system integration have been carefully studied and discussed.To the best of our knowledge, this work presents the first reported portable system to provide all required hardware for a closed-loop sensorimotor neural interface, the first wireless sensory encoding experiment conducted in freely swimming animals, and the first bidirectional study of the hippocampal field potentials in freely behaving animals from sedation to sleep. This thesis gives a comprehensive survey of bidirectional BMI designs, reviews the key design trade-offs in neural recorders and stimulators, and summarizes neural features and mechanisms for a successful closed-loop operation. The circuit and system design details are presented with bench testing and animal experimental results. The methods, circuit techniques, system topology, and experimental paradigms proposed in this work can be used in a wide range of relevant neurophysiology research and neuroprosthetic development, especially in experiments using freely behaving animals
    corecore