    Batch Size Influence on Performance of Graphic and Tensor Processing Units during Training and Inference Phases

    The impact of the maximum possible batch size (chosen for better runtime) on the performance of graphics processing units (GPUs) and tensor processing units (TPUs) during the training and inference phases is investigated. Numerous runs of the selected deep neural network (DNN) were performed on the standard MNIST and Fashion-MNIST datasets. A significant speedup was obtained even for extremely small-scale usage of Google TPUv2 units (8 cores only) in comparison to the quite powerful NVIDIA Tesla K80 GPU card: up to 10x for the training stage (without taking overheads into account) and up to 2x for the prediction stage (with and without taking overheads into account). The precise speedup values depend on the utilization level of the TPUv2 units and increase with the volume of data being processed, but for the datasets used in this work (MNIST and Fashion-MNIST, with 28x28 images) the speedup was observed for batch sizes >512 images in the training phase and >40 000 images in the prediction phase. These results were obtained without detriment to prediction accuracy and loss, which were equal for the GPU and TPU runs up to the 3rd significant digit for the MNIST dataset and up to the 2nd significant digit for the Fashion-MNIST dataset. Comment: 10 pages, 7 figures, 2 table

    Fixed Point Analysis Workflow for efficient Design of Convolutional Neural Networks in Hearing Aids

    Neural networks (NN) are a powerful tool for tackling complex problems in hearing aid research, but their use on hearing aid hardware is currently limited by memory and processing power. To enable training under these constraints, a fixed-point analysis and a memory-friendly power-of-two quantization scheme (replacing multiplications with shift operations) have been implemented, extending TensorFlow, a standard framework for training neural networks, and the Qkeras package [1, 2]. The implemented fixed-point analysis detects quantization issues such as overflows, underflows, precision problems and zero gradients. The analysis is done for each layer in every epoch, separately for weights, biases and activations. With this information the quantization can be optimized, e.g. by modifying the bit width, the number of integer bits, or by switching the quantization scheme to a power-of-two quantization. To demonstrate the applicability of this method, a case study was conducted in which a CNN was trained to predict the Ideal Ratio Mask (IRM) for noise reduction in audio signals. The dataset consists of speech samples from the TIMIT dataset mixed with noise from the Urban Sound 8k and VAD-dataset at 0 dB SNR. The CNN was trained in floating point, in fixed point and with a power-of-two quantization. The CNN architecture consists of six convolutional layers followed by three dense layers. From an initial memory footprint of 1.9 MB for 468k float32 weights, the power-of-two quantized network is reduced to 236 kB, while the Short Term Objective Intelligibility (STOI) improvement drops only from 0.074 to 0.067. Despite the quantization only a minimal drop in performance was observed, while saving up to 87.5 % of memory, thus being suited for employment in a hearing ai
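The idea behind the power-of-two quantization in the abstract above can be illustrated with a minimal sketch. It is not the authors' TensorFlow/Qkeras implementation: the function names, the exponent range and the 4-bit-per-weight figure are assumptions chosen for illustration (4 bits is merely consistent with the reported 87.5 % saving relative to 32-bit floats).

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Approximate w by sign * 2**exp. Returns (sign, exp); zero maps to sign 0.

    The exponent range [-7, 0] is an assumed example (8 exponent values).
    """
    if w == 0.0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))           # nearest exponent in the log domain
    exp = max(min_exp, min(max_exp, exp))    # clip to the representable range
    return sign, exp

def shift_multiply(x_int, sign, exp):
    """Multiply an integer activation by sign * 2**exp using shifts only."""
    if sign == 0:
        return 0
    # Note: in Python, >> on negative integers rounds toward minus infinity.
    shifted = x_int << exp if exp >= 0 else x_int >> -exp
    return sign * shifted

def footprint_bytes(n_weights, bits_per_weight):
    """Storage needed for n_weights at the given bit width, in bytes."""
    return n_weights * bits_per_weight // 8

# 0.23 is approximated by +2**-2 = 0.25, so multiplying 64 by it is a shift by 2.
sign, exp = quantize_pow2(0.23)
product = shift_multiply(64, sign, exp)

# Rough footprint check with the abstract's rounded weight count (468k).
float32_bytes = footprint_bytes(468_000, 32)
pow2_bytes = footprint_bytes(468_000, 4)     # assumed 4 bits per weight
saving = 1 - pow2_bytes / float32_bytes
```

Under these assumptions the float32 footprint comes out close to the reported 1.9 MB and the quantized one close to the reported 236 kB; the small difference is expected because 468k is a rounded count.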

    Convolutional Neural Networks for Speech Controlled Prosthetic Hands

    Speech recognition is one of the key topics in artificial intelligence, as speech is one of the most common forms of communication in humans. Researchers have developed many speech-controlled prosthetic hands in the past decades, utilizing conventional speech recognition systems that combine a neural network with a hidden Markov model. Recent advancements in general-purpose graphics processing units (GPGPUs) enable intelligent devices to run deep neural networks in real-time. Thus, state-of-the-art speech recognition systems have rapidly shifted from the paradigm of composite subsystems optimization to the paradigm of end-to-end optimization. However, a low-power embedded GPGPU cannot run these speech recognition systems in real-time. In this paper, we show the development of deep convolutional neural networks (CNN) for speech control of prosthetic hands that run in real-time on an NVIDIA Jetson TX2 developer kit. First, the device captures speech and converts it into 2D features (such as a spectrogram). The CNN receives the 2D features and classifies the hand gestures. Finally, the hand gesture classes are sent to the prosthetic hand motion control system. The whole system is written in Python with Keras, a deep learning library that has a TensorFlow backend. Our experiments on the CNN demonstrate 91% accuracy and a 2 ms running time for producing hand gestures (text output) from speech commands, which can be used to control the prosthetic hands in real-time. Comment: 2019 First International Conference on Transdisciplinary AI (TransAI), Laguna Hills, California, USA, 2019, pp. 35-4

    Periodic orbits in chaotic systems simulated at low precision

    Non-periodic solutions are an essential property of chaotic dynamical systems. Simulations with deterministic finite-precision numbers, however, always yield orbits that are eventually periodic. With 64-bit double-precision floating-point numbers such periodic orbits are typically negligible due to very long periods. The emerging trend to accelerate simulations with low-precision numbers, such as 16-bit half-precision floats, raises questions on the fidelity of such simulations of chaotic systems. Here, we revisit the 1-variable logistic map and the generalised Bernoulli map with various number formats and precisions: floats, posits and logarithmic fixed-point. Simulations are improved with higher precision but stochastic rounding prevents periodic orbits even at low precision. For larger systems the performance gain from low-precision simulations is often reinvested in higher resolution or complexity, increasing the number of variables. In the Lorenz 1996 system, the period lengths of orbits increase exponentially with the number of variables. Moreover, invariant measures are better approximated with an increased number of variables than with increased precision. Extrapolating to large simulations of natural systems, such as million-variable climate models, periodic orbit lengths are far beyond reach of present-day computers. Such orbits are therefore not expected to be problematic compared to high-precision simulations but the deviation of both from the continuum solution remains unclear
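The claim that finite-precision orbits are always eventually periodic can be checked directly for the logistic map at half precision. The sketch below is an illustration only, not the authors' code: it emulates 16-bit floats by rounding after every operation with the standard-library `struct` format `e`, and the parameter r = 4, the starting value and all function names are assumptions.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def logistic_step(x, r=4.0):
    """One step of x -> r * x * (1 - x), rounding to half precision each time."""
    one_minus_x = to_half(1.0 - x)
    return to_half(r * to_half(x * one_minus_x))

def find_cycle(x0, r=4.0):
    """Iterate until a state repeats.

    Returns (transient_length, period, state_on_cycle). Termination is
    guaranteed because half precision has at most 2**16 distinct values.
    """
    x = to_half(x0)
    first_seen = {}          # state -> iteration index at which it first occurred
    n = 0
    while x not in first_seen:
        first_seen[x] = n
        x = logistic_step(x, r)
        n += 1
    transient = first_seen[x]
    return transient, n - transient, x

transient, period, state = find_cycle(0.3)
print(f"transient: {transient} steps, period: {period} steps")
```

Because the state space is finite and the update is deterministic, the loop must revisit a state; the same bookkeeping would be impractical at 64-bit precision, which is why the abstract describes such periods as negligible there.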

    Area-Efficient FPGA Implementation of Minimalistic Convolutional Neural Network Using Residue Number System

    Convolutional Neural Networks (CNNs) are a promising tool for solving image recognition tasks in computer vision systems. However, most known implementations of CNNs require a significant amount of memory for storing weights during training and operation. To reduce the resource costs of CNN implementation, we propose an architecture that is split into hardware and software parts for performance optimization. We also propose to use Residue Number System (RNS) arithmetic in the hardware part, which implements the convolutional layer of the CNN. Software simulation using Matlab 2017b shows that a CNN with a minimum number of layers can be quickly and successfully trained. Hardware simulation using the FPGA Kintex7 xc7k70tfbg484-2 demonstrates that using RNS in the convolutional layer of a CNN reduces hardware costs by 32% compared with the traditional approach based on the binary number system
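As a rough illustration of the Residue Number System arithmetic mentioned in the abstract above, the following sketch computes a dot product (the core operation of a convolution) channel by channel on small residues and reconstructs the result with the Chinese Remainder Theorem. The moduli set, the function names and the restriction to non-negative integers are assumptions for illustration; the paper's hardware design is not reproduced here.

```python
from math import prod

# Assumed example moduli: pairwise coprime, dynamic range 7*11*13*15 = 15015.
MODULI = (7, 11, 13, 15)

def to_rns(x):
    """Encode a non-negative integer as its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    """Multiply-accumulate acc + a*b, done independently in every residue channel."""
    return tuple((r + x * y) % m for r, x, y, m in zip(acc, a, b, MODULI))

def from_rns(residues):
    """Decode residues back to an integer via the Chinese Remainder Theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)   # pow(..., -1, m): modular inverse
    return total % big_m

def rns_dot(xs, ws):
    """Dot product computed entirely in the residue domain."""
    acc = to_rns(0)
    for x, w in zip(xs, ws):
        acc = rns_mac(acc, to_rns(x), to_rns(w))
    return from_rns(acc)

result = rns_dot([3, 5, 2], [4, 1, 6])   # 3*4 + 5*1 + 2*6
```

Each channel only ever handles numbers smaller than its modulus and needs no carries from the other channels, which is the property that makes narrow, parallel hardware units possible; the result is correct as long as the true value stays below the product of the moduli.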