87 research outputs found

    A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

    Get PDF
    Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each utilized optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing-down the research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing to assess the design choices for each building block separately. Reported optimizations range from up to 10'000x memory savings to 33x energy reductions, providing chip designers an overview of design choices for implementing efficient low power neural network accelerators

    Advanced sensors technology survey

    Get PDF
    This project assesses the state-of-the-art in advanced or 'smart' sensors technology for NASA Life Sciences research applications with an emphasis on those sensors with potential applications on the space station freedom (SSF). The objectives are: (1) to conduct literature reviews on relevant advanced sensor technology; (2) to interview various scientists and engineers in industry, academia, and government who are knowledgeable on this topic; (3) to provide viewpoints and opinions regarding the potential applications of this technology on the SSF; and (4) to provide summary charts of relevant technologies and centers where these technologies are being developed

    Enhancing Real-time Embedded Image Processing Robustness on Reconfigurable Devices for Critical Applications

    Get PDF
    Nowadays, image processing is increasingly used in several application fields, such as biomedical, aerospace, or automotive. Within these fields, image processing is used to serve both non-critical and critical tasks. As example, in automotive, cameras are becoming key sensors in increasing car safety, driving assistance and driving comfort. They have been employed for infotainment (non-critical), as well as for some driver assistance tasks (critical), such as Forward Collision Avoidance, Intelligent Speed Control, or Pedestrian Detection. The complexity of these algorithms brings a challenge in real-time image processing systems, requiring high computing capacity, usually not available in processors for embedded systems. Hardware acceleration is therefore crucial, and devices such as Field Programmable Gate Arrays (FPGAs) best fit the growing demand of computational capabilities. These devices can assist embedded processors by significantly speeding-up computationally intensive software algorithms. Moreover, critical applications introduce strict requirements not only from the real-time constraints, but also from the device reliability and algorithm robustness points of view. Technology scaling is highlighting reliability problems related to aging phenomena, and to the increasing sensitivity of digital devices to external radiation events that can cause transient or even permanent faults. These faults can lead to wrong information processed or, in the worst case, to a dangerous system failure. In this context, the reconfigurable nature of FPGA devices can be exploited to increase the system reliability and robustness by leveraging Dynamic Partial Reconfiguration features. The research work presented in this thesis focuses on the development of techniques for implementing efficient and robust real-time embedded image processing hardware accelerators and systems for mission-critical applications. Three main challenges have been faced and will be discussed, along with proposed solutions, throughout the thesis: (i) achieving real-time performances, (ii) enhancing algorithm robustness, and (iii) increasing overall system's dependability. In order to ensure real-time performances, efficient FPGA-based hardware accelerators implementing selected image processing algorithms have been developed. Functionalities offered by the target technology, and algorithm's characteristics have been constantly taken into account while designing such accelerators, in order to efficiently tailor algorithm's operations to available hardware resources. On the other hand, the key idea for increasing image processing algorithms' robustness is to introduce self-adaptivity features at algorithm level, in order to maintain constant, or improve, the quality of results for a wide range of input conditions, that are not always fully predictable at design-time (e.g., noise level variations). This has been accomplished by measuring at run-time some characteristics of the input images, and then tuning the algorithm parameters based on such estimations. Dynamic reconfiguration features of modern reconfigurable FPGA have been extensively exploited in order to integrate run-time adaptivity into the designed hardware accelerators. Tools and methodologies have been also developed in order to increase the overall system dependability during reconfiguration processes, thus providing safe run-time adaptation mechanisms. In addition, taking into account the target technology and the environments in which the developed hardware accelerators and systems may be employed, dependability issues have been analyzed, leading to the development of a platform for quickly assessing the reliability and characterizing the behavior of hardware accelerators implemented on reconfigurable FPGAs when they are affected by such faults

    Low-Power and Programmable Analog Circuitry for Wireless Sensors

    Get PDF
    Embedding networks of secure, wirelessly-connected sensors and actuators will help us to conscientiously manage our local and extended environments. One major challenge for this vision is to create networks of wireless sensor devices that provide maximal knowledge of their environment while using only the energy that is available within that environment. In this work, it is argued that the energy constraints in wireless sensor design are best addressed by incorporating analog signal processors. The low power-consumption of an analog signal processor allows persistent monitoring of multiple sensors while the device\u27s analog-to-digital converter, microcontroller, and transceiver are all in sleep mode. This dissertation describes the development of analog signal processing integrated circuits for wireless sensor networks. Specific technology problems that are addressed include reconfigurable processing architectures for low-power sensing applications, as well as the development of reprogrammable biasing for analog circuits

    Low-Power and Programmable Analog Circuitry for Wireless Sensors

    Get PDF
    Embedding networks of secure, wirelessly-connected sensors and actuators will help us to conscientiously manage our local and extended environments. One major challenge for this vision is to create networks of wireless sensor devices that provide maximal knowledge of their environment while using only the energy that is available within that environment. In this work, it is argued that the energy constraints in wireless sensor design are best addressed by incorporating analog signal processors. The low power-consumption of an analog signal processor allows persistent monitoring of multiple sensors while the device\u27s analog-to-digital converter, microcontroller, and transceiver are all in sleep mode. This dissertation describes the development of analog signal processing integrated circuits for wireless sensor networks. Specific technology problems that are addressed include reconfigurable processing architectures for low-power sensing applications, as well as the development of reprogrammable biasing for analog circuits

    Signal Processing for an Autonomous Underwater Vehicle: an FPGA approach

    Get PDF
    The idea of this thesis comes out from the participation of the University of Central Florida to the Annual International Autonomous Underwater Vehicle Competition of 2007. The objective of this competition is to make the AUV to accomplish to a specific route. A part of this route expects the AUV to detect a ping and following it as a source. The objective of this thesis is to improve the performance of this trajectory tracking. A Field Programmable Logic Array will be used to perform an effective Digital Signal Processing

    Dynamic Nanophotonic Structures Leveraging Chalcogenide Phase-Change Materials

    Get PDF
    Chip-scale nanophotonic devices have the potential to enable next-generation imaging, computing, communication, and engineered quantum systems with very stringent performance requirements on size, power, integrability, stability, and bandwidth. The emergence of meta-optic devices with deep subwavelength features has enabled the formation of ultra-thin flat optical structures to replace bulky conventional counterparts in free-space applications. Nevertheless, progress in meta-optics has been slowed due to the passive nature of existing devices and the urgent need for a reliable, fast, low-power, and robust reconfiguration mechanism. In this research, I devised a new material and device platform to resolve this challenge. Through detailed theoretical design, nanofabrication, and experimental demonstration, I demonstrated the unique features of my proposed platform as an essential building block of truly scalable adaptive flat optics for the active manipulation of optical wavefronts. One of the key attributes of this research is the integration of CMOS-compatible materials for the fabrication of passive devices with phase-change materials that provide the largest known modulation of the index of refraction upon stimulation with an optical or electrical signal. A unique selection of phase-change materials for operation in the near-infrared and visible wavelengths has been made, followed by developing the optimum deposition and fabrication processes for the realization of nanophotonics devices that integrate these functional materials with semiconductor and plasmonic materials. A major breakthrough in this process was the design and realization of integrated electrical stimulation circuitry with far better performance compared to existing solutions. Using this platform, I experimentally demonstrated the first electrically tunable meta-optic structure for fast optical switching with a high contrast ratio and dynamic wavefront scanning with a large steering angle. This is a major achievement as it essentially allows the engineering of a desired optical wavefront with fast reconfigurability at low power consumption. In an independent work, I demonstrated, for the first time, a nonvolatile meta-optic structure for high-resolution, wide-gamut, and high-contrast microdisplays with added polarization controllability and the possibility of implementation on a flexible substrate. Further features of this metaphotonic display include: 1) full addressability at the microscale pixel via fast electrical pulses; 2) super-resolution pixels with controllable brightness and contrast; and 3) a wide range of colors with high saturation and purity. Lastly, for the first time, I realized a hybrid photonic-plasmonic meta-optic platform with active control over the spatial, spectral, and temporal properties of an optical wavefront. This is a major achievement as it essentially allows the engineering of a desired optical wavefront with fast reconfigurability at low power consumption. These demonstrations are now being pursued in different directions for novel systems for imaging, sensing, computing, and quantum applications, just to name a few.Ph.D

    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Get PDF
    Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems

    Speech recognition on DSP: algorithm optimization and performance analysis.

    Get PDF
    Yuan Meng.Thesis (M.Phil.)--Chinese University of Hong Kong, 2004.Includes bibliographical references (leaves 85-91).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- History of ASR development --- p.2Chapter 1.2 --- Fundamentals of automatic speech recognition --- p.3Chapter 1.2.1 --- Classification of ASR systems --- p.3Chapter 1.2.2 --- Automatic speech recognition process --- p.4Chapter 1.3 --- Performance measurements of ASR --- p.7Chapter 1.3.1 --- Recognition accuracy --- p.7Chapter 1.3.2 --- Complexity --- p.7Chapter 1.3.3 --- Robustness --- p.8Chapter 1.4 --- Motivation and goal of this work --- p.8Chapter 1.5 --- Thesis outline --- p.10Chapter 2 --- Signal processing techniques for front-end --- p.12Chapter 2.1 --- Basic feature extraction principles --- p.13Chapter 2.1.1 --- Pre-emphasis --- p.13Chapter 2.1.2 --- Frame blocking and windowing --- p.13Chapter 2.1.3 --- Discrete Fourier Transform (DFT) computation --- p.15Chapter 2.1.4 --- Spectral magnitudes --- p.15Chapter 2.1.5 --- Mel-frequency filterbank --- p.16Chapter 2.1.6 --- Logarithm of filter energies --- p.18Chapter 2.1.7 --- Discrete Cosine Transformation (DCT) --- p.18Chapter 2.1.8 --- Cepstral Weighting --- p.19Chapter 2.1.9 --- Dynamic featuring --- p.19Chapter 2.2 --- Practical issues --- p.20Chapter 2.2.1 --- Review of practical problems and solutions in ASR appli- cations --- p.20Chapter 2.2.2 --- Model of environment --- p.23Chapter 2.2.3 --- End-point detection (EPD) --- p.23Chapter 2.2.4 --- Spectral subtraction (SS) --- p.25Chapter 3 --- HMM-based Acoustic Modeling --- p.26Chapter 3.1 --- HMMs for ASR --- p.26Chapter 3.2 --- Output probabilities --- p.27Chapter 3.3 --- Viterbi search engine --- p.29Chapter 3.4 --- Isolated word recognition (IWR) & Connected word recognition (CWR) --- p.30Chapter 3.4.1 --- Isolated word recognition --- p.30Chapter 3.4.2 --- Connected word recognition (CWR) --- p.31Chapter 4 --- DSP for embedded applications --- p.32Chapter 4.1 --- "Classification of embedded systems (DSP, ASIC, FPGA, etc.)" --- p.32Chapter 4.2 --- Description of hardware platform --- p.34Chapter 4.3 --- I/O operation for real-time processing --- p.36Chapter 4.4 --- Fixed point algorithm on DSP --- p.40Chapter 5 --- ASR algorithm optimization --- p.42Chapter 5.1 --- Methodology --- p.42Chapter 5.2 --- Floating-point to fixed-point conversion --- p.43Chapter 5.3 --- Computational complexity consideration --- p.45Chapter 5.3.1 --- Feature extraction techniques --- p.45Chapter 5.3.2 --- Viterbi search module --- p.50Chapter 5.4 --- Memory requirements consideration --- p.51Chapter 6 --- Experimental results and performance analysis --- p.53Chapter 6.1 --- Cantonese isolated word recognition (IWR) --- p.54Chapter 6.1.1 --- Execution time --- p.54Chapter 6.1.2 --- Memory requirements --- p.57Chapter 6.1.3 --- Recognition performance --- p.57Chapter 6.2 --- Connected word recognition (CWR) --- p.61Chapter 6.2.1 --- Execution time consideration --- p.62Chapter 6.2.2 --- Recognition performance --- p.62Chapter 6.3 --- Summary & discussion --- p.66Chapter 7 --- Implementation of practical techniques --- p.67Chapter 7.1 --- End-point detection (EPD) --- p.67Chapter 7.2 --- Spectral subtraction (SS) --- p.71Chapter 7.3 --- Experimental results --- p.72Chapter 7.3.1 --- Isolated word recognition (IWR) --- p.72Chapter 7.3.2 --- Connected word recognition (CWR) --- p.75Chapter 7.4 --- Results --- p.77Chapter 8 --- Conclusions and future work --- p.78Chapter 8.1 --- Summary and Conclusions --- p.78Chapter 8.2 --- Suggestions for future research --- p.80Appendices --- p.82Chapter A --- "Interpolation of data entries without floating point, divides or conditional branches" --- p.82Chapter B --- Vocabulary for Cantonese isolated word recognition task --- p.84Bibliography --- p.8
    • …
    corecore