    Video Based Automatic Speech Recognition Using Neural Networks

    Neural network approaches have become popular in the field of automatic speech recognition (ASR). Most ASR methods use audio data to classify words. Lip-reading ASR techniques use only video data, which makes them useful in noisy environments where the audio may be compromised. This work develops a comprehensive approach to video-based ASR, including the vetting of datasets and the development of a preprocessing chain. The approach is based on neural networks, namely 3D convolutional neural networks (3D-CNNs) and long short-term memory (LSTM) networks, which are designed to take in temporal data such as video. Various combinations of neural network architectures and preprocessing techniques are explored. The best-performing architecture, a CNN with a bidirectional LSTM, compares favorably against recent works on video-based ASR.
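The abstract notes that 3D-CNNs are designed to take in temporal data such as video. A minimal sketch of why, assuming a grayscale clip stored as nested lists indexed [frame][row][col]: a 3D kernel slides over time as well as space, so even a two-frame kernel responds to change between frames (for example, lip motion). This naive stride-1, no-padding version is for illustration only and is not the architecture or preprocessing chain used in the work; the name `conv3d_valid` is hypothetical.

```python
from typing import List

# A clip or kernel indexed as [time][row][col].
Volume = List[List[List[float]]]


def conv3d_valid(clip: Volume, kernel: Volume) -> Volume:
    """Naive 'valid' 3D convolution (cross-correlation, as in most deep
    learning frameworks) with stride 1 and no padding."""
    t_in, h_in, w_in = len(clip), len(clip[0]), len(clip[0][0])
    t_k, h_k, w_k = len(kernel), len(kernel[0]), len(kernel[0][0])
    # Each output axis shrinks by (kernel size - 1), including the time axis.
    t_out, h_out, w_out = t_in - t_k + 1, h_in - h_k + 1, w_in - w_k + 1
    out = [[[0.0] * w_out for _ in range(h_out)] for _ in range(t_out)]
    for t in range(t_out):
        for y in range(h_out):
            for x in range(w_out):
                acc = 0.0
                # The kernel window spans several frames, not just one image.
                for dt in range(t_k):
                    for dy in range(h_k):
                        for dx in range(w_k):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                out[t][y][x] = acc
    return out
```

For example, the frame-difference kernel `[[[-1.0]], [[1.0]]]` applied to a clip whose brightness rises by 1 per frame yields 1 everywhere. In a full model, the time axis of such feature maps is what a (bidirectional) LSTM would then consume.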

    Efficient FPGA-Based Inference Architectures for Deep Learning Networks

    Deep learning has evolved to become the state-of-the-art technique for numerous classification and regression applications. Deep learning models, such as Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs), deploy dozens of hidden layers with hundreds of neurons to learn a meaningful representation of the input data. The power of DNNs and CNNs comes from the fact that they are trained through feature learning rather than task-specific algorithms. However, this comes at the expense of high computational cost for both training and inference. This necessitates high-performance, energy-efficient accelerators, especially for inference, where real-time processing matters. FPGAs offer an appealing platform for accelerating the inference of DNNs and CNNs due to their performance, configurability and energy efficiency. In this thesis, we address three main problems. First, we consider the problem of realizing a precise but efficient implementation of traditional fully connected DNNs on FPGAs. Second, although Binary Neural Networks (BNNs) use a compact 1-bit data representation compared to the fixed-point and floating-point representations of traditional DNNs and CNNs, they may still need too many computational and memory resources; we therefore study the implementation of BNNs on FPGAs. Finally, we focus on introducing FPGAs as accelerators to a wider range of software developers, especially those who do not possess FPGA programming knowledge. To address the first problem, and since an efficient implementation of non-linear activation functions is essential to deploying deep learning models on FPGAs, we introduce a non-linear activation function implementation based on the Discrete Cosine Transform Interpolation Filter (DCTIF). The proposed interpolation architecture combines arithmetic operations on stored samples of the hyperbolic tangent (tanh) function and on the input data. It achieves almost 3× better precision than previous works while using a similar amount of computational resources and a small amount of memory. Various combinations of DCTIF parameters can be chosen to trade off accuracy against the overall circuit complexity of the tanh function. To address the first and third problems, we introduce a multiplication-free overlay architecture based on a Single hidden layer Neural Network (SNN), with performance comparable to fully connected DNNs. This FPGA inference overlay can be used for applications that are normally solved with fully connected DNNs, and it avoids the time needed to synthesize, place, route and regenerate a new bitstream when the application changes. The SNN overlay inputs and activations are quantized to power-of-two values, which allows shift units to be used instead of multipliers. Because the overlay has a single hidden layer, we fill the FPGA chip with the maximum possible number of neurons that can work in parallel in that layer. We evaluate the proposed architecture on typical benchmark datasets and demonstrate higher throughput than the state of the art at the same accuracy. In addition, the SNN overlay makes the power and versatility of FPGAs available to a wider DNN user community and improves DNN design efficiency.
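The DCTIF-based tanh described above stores samples of the function and interpolates between them. As a rough software sketch of the general idea (not the thesis's hardware architecture or its parameter choices), the snippet takes a forward DCT-II over the few stored samples surrounding the input and then evaluates the inverse transform at the input's fractional position. The table range [-4, 4], the step of 0.125 and the 4-tap window are assumptions chosen for illustration, and `DctifTanh` is a hypothetical name.

```python
import math
from typing import List


def dct2(x: List[float]) -> List[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts)) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idct_at(coeffs: List[float], pos: float) -> float:
    """Inverse of dct2, evaluated at a possibly fractional sample position."""
    n_pts = len(coeffs)
    acc = coeffs[0] / n_pts
    for k in range(1, n_pts):
        acc += (2.0 / n_pts) * coeffs[k] * math.cos(math.pi * (2 * pos + 1) * k / (2 * n_pts))
    return acc


class DctifTanh:
    """Hypothetical tanh unit: a sample table plus DCT-based interpolation."""

    def __init__(self, x_min: float = -4.0, x_max: float = 4.0,
                 step: float = 0.125, taps: int = 4) -> None:
        self.x_min, self.step, self.taps = x_min, step, taps
        count = int(round((x_max - x_min) / step)) + 1
        self.table = [math.tanh(x_min + i * step) for i in range(count)]

    def __call__(self, x: float) -> float:
        last = len(self.table) - 1
        u = (x - self.x_min) / self.step       # position in units of table entries
        if u <= 0:
            return self.table[0]               # saturate below the table range
        if u >= last:
            return self.table[last]            # saturate above the table range
        i = int(math.floor(u))
        alpha = u - i                          # fractional offset from sample i
        half = self.taps // 2
        # Window of `taps` samples around i (edge samples replicated), so that
        # sample i sits at window index half - 1.
        window = [self.table[min(max(j, 0), last)]
                  for j in range(i - half + 1, i + half + 1)]
        return idct_at(dct2(window), half - 1 + alpha)
```

At stored sample points the inverse transform reproduces the table exactly; between them, the step size and the number of taps trade table size and arithmetic for accuracy, which mirrors the parameter trade-off the abstract mentions.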
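The overlay replaces multipliers with shifters because its inputs and activations are powers of two. A small sketch of that arithmetic, assuming weights stored as signed fixed-point integers (8 fractional bits in the example) and inputs in [-1, 1] rounded to the nearest power of two in the log domain; the quantization rule, bit widths and function names are illustrative assumptions, not the thesis's exact scheme.

```python
import math
from typing import List, Tuple


def quantize_pow2(x: float, max_shift: int = 7) -> Tuple[int, int]:
    """Map x in [-1, 1] to (sign, shift) with x ~= sign * 2**(-shift).

    sign == 0 encodes zero (magnitudes too small for the shifter's range).
    """
    if abs(x) < 2.0 ** -(max_shift + 1):
        return (0, 0)
    shift = round(-math.log2(abs(x)))      # nearest power of two in the log domain
    shift = min(max(shift, 0), max_shift)  # clamp to the shifter's range
    return (1 if x > 0 else -1, shift)


def neuron_shift_add(weights_fx: List[int], inputs: List[float]) -> int:
    """Multiplication-free dot product: w * 2**(-shift) becomes w >> shift.

    The result is in the same fixed-point format as the weights.
    """
    acc = 0
    for w, x in zip(weights_fx, inputs):
        sign, shift = quantize_pow2(x)
        if sign != 0:
            acc += sign * (w >> shift)     # a shift replaces the multiplier
    return acc
```

For example, weights 1.0, -0.5 and 0.25 (256, -128 and 64 with 8 fractional bits) and inputs 0.5, 1.0 and -0.25 give -16, i.e. -16/256 = -0.0625, the same as the floating-point dot product. Python's `>>` on negative integers is an arithmetic (flooring) shift, like a hardware arithmetic shifter, so some products round down by one least significant bit.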

    Malware Classification Using LSTMs

    Signature-based and anomaly-based detection have long been quintessential techniques in malware detection. However, these techniques have become increasingly ineffective as malware grows more complex, so researchers have turned to deep learning to build better-performing models. In this project, we create four different long short-term memory (LSTM) models and train each one to classify malware by family. Our data consists of opcodes extracted from malware executables. We employ techniques from natural language processing (NLP), such as word embeddings and bidirectional LSTMs (biLSTMs), as well as convolutional neural networks (CNNs). The model combining word embeddings, biLSTM layers and CNN layers performed best at classifying malware.
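Before word embeddings and biLSTM layers can be applied, each opcode sequence has to become a fixed-length sequence of integer IDs. A minimal preprocessing sketch, assuming opcodes are already extracted as mnemonic strings; the special tokens, frequency-ordered vocabulary and truncation/padding length are illustrative choices, not details from the project.

```python
from collections import Counter
from typing import Dict, List, Sequence

PAD, UNK = 0, 1  # reserved IDs for padding and unseen opcodes (assumed convention)


def build_vocab(sequences: Sequence[Sequence[str]], min_count: int = 1) -> Dict[str, int]:
    """Assign IDs to opcodes, most frequent first (ties broken alphabetically)."""
    counts = Counter(op for seq in sequences for op in seq)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for op, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[op] = len(vocab)
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Truncate to max_len, map opcodes to IDs, and right-pad with PAD."""
    ids = [vocab.get(op, UNK) for op in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))
```

The resulting ID sequences are what an embedding layer would look up; the embedding, biLSTM and CNN layers themselves are not shown.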

    Digital image forensics via meta-learning and few-shot learning

    Digital images make up a substantial portion of the information conveyed by social media, the Internet, and television in daily life. In recent years, digital images have become not only a carrier of public information but also a crucial piece of evidence. The widespread availability of low-cost, user-friendly, and powerful image editing software and mobile phone applications makes it easy to alter images without professional expertise. Consequently, safeguarding the originality and integrity of digital images has become a challenge, and forgers commonly manipulate images to spread misleading information. Digital image forensics, which is crucial to information security, investigates the irregular patterns that image alteration can leave behind. Over the past several years, machine learning techniques have been used effectively to identify image forgeries, and Convolutional Neural Networks (CNNs) are a common approach: a standard CNN model can distinguish between original and manipulated images. In this dissertation, two CNN models are introduced to recognize seam carving and Gaussian filtering. However, a conventional CNN model must be trained from scratch for each new, similar forgery detection task, and many types of tampered image data are difficult to acquire or simulate. Meta-learning is an alternative learning paradigm in which a model gains experience across numerous related tasks and uses it to improve its future learning performance. Few-shot learning acquires knowledge from few data and can classify images with as few as one or two examples per class. Inspired by meta-learning and few-shot learning, this dissertation proposes a prototypical networks model capable of solving a collection of related image forgery detection problems. Unlike traditional CNN models, the proposed model does not need to be trained from scratch for a new task, and it drastically reduces the number of training images required.
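The core of a prototypical network is simple: each class prototype is the mean of the embedded support examples for that class, and a query is assigned to the class whose prototype is nearest in squared Euclidean distance, with a softmax over negative distances giving class probabilities. The sketch below assumes embeddings have already been produced by a CNN (not shown) and uses toy two-dimensional vectors; the class labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Class prototype = element-wise mean of that class's support embeddings."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in support.items()
    }


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, protos: Dict[str, Vector]) -> Tuple[str, Dict[str, float]]:
    """Softmax over negative squared distances to each prototype."""
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    z = sum(exps.values())
    probs = {label: e / z for label, e in exps.items()}
    return max(probs, key=probs.get), probs
```

Because prototypes are computed from the support set at test time, a new forgery type needs only a few labeled examples rather than retraining, which is the property the abstract highlights.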

    RTN: Reparameterized Ternary Network

    To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-up and memory savings by quantizing both activations and weights. We first identify three overlooked issues in extremely low-bit networks: the squashed range of quantized values, gradient vanishing during backpropagation, and the unexploited hardware acceleration of ternary networks. By reparameterizing quantized activation and weight vectors as a fixed ternary vector with a full-precision scale and offset, we decouple the range and magnitude from the direction, mitigating all three issues. The learnable scale and offset automatically adjust the range and sparsity of the quantized values without gradient vanishing. A novel encoding and computation pattern is designed to support efficient computation in our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN achieves a much better trade-off between bit width and accuracy, with up to 26.76% relative accuracy improvement over state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGAs), where it brings 46.46× and 89.17× savings in power and area, respectively, compared with full-precision convolution. Comment: To appear at AAAI-2
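Writing each quantized vector as scale * t + offset, with t a ternary vector, keeps the expensive part of a dot product in pure ternary arithmetic: expanding the sum of (a*t_i + b)(c*u_i + d) gives a*c*(t·u) + a*d*sum(t) + b*c*sum(u) + n*b*d. The sketch below computes t·u with a common two-bitmask encoding (one mask for +1 entries, one for -1) and population counts. This encoding and the fixed ternarization threshold are assumptions for illustration; they are not necessarily the paper's novel encoding, and the gradient-based learning of the scale and offset is not shown.

```python
from typing import List, Tuple


def ternarize(x: List[float], threshold: float) -> List[int]:
    """Map each value to +1, 0 or -1 using a fixed threshold (assumed rule)."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in x]


def encode_masks(t: List[int]) -> Tuple[int, int]:
    """Encode a ternary vector as (mask of +1 positions, mask of -1 positions)."""
    pos = neg = 0
    for i, v in enumerate(t):
        if v == 1:
            pos |= 1 << i
        elif v == -1:
            neg |= 1 << i
    return pos, neg


def popcount(v: int) -> int:
    return bin(v).count("1")


def ternary_dot(a_masks: Tuple[int, int], b_masks: Tuple[int, int]) -> int:
    """Ternary dot product using only AND and popcount: same signs add, opposite signs subtract."""
    ap, an = a_masks
    bp, bn = b_masks
    return popcount(ap & bp) + popcount(an & bn) - popcount(ap & bn) - popcount(an & bp)


def reparam_dot(t: List[int], u: List[int],
                scale_t: float, offset_t: float,
                scale_u: float, offset_u: float) -> float:
    """Dot product of (scale_t*t + offset_t) and (scale_u*u + offset_u)."""
    n = len(t)
    core = ternary_dot(encode_masks(t), encode_masks(u))
    return (scale_t * scale_u * core
            + scale_t * offset_u * sum(t)
            + offset_t * scale_u * sum(u)
            + n * offset_t * offset_u)
```

Only the ternary core grows with the vector length in a costly way; the correction terms need just the sums of the ternary vectors and a few full-precision multiplies per output.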