64 research outputs found

    PERFORMANCE COMPARISON OF GRADIENT-BASED CONVOLUTIONAL NEURAL NETWORK OPTIMIZERS FOR FACIAL EXPRESSION RECOGNITION

    A convolutional neural network (CNN) is a machine learning model that has achieved excellent results in recognizing human facial expressions. Many optimizers are now available for training a CNN model. This study therefore implements and compares 14 gradient-based CNN optimizers for classifying facial expressions in two datasets: the Advanced Computing Class 2022 (ACC22) and Extended Cohn-Kanade (CK+) datasets. The 14 optimizers are classical gradient descent, traditional momentum, Nesterov momentum, AdaGrad, AdaDelta, RMSProp, Adam, RAdam, AdaMax, AMSGrad, Nadam, AdamW, OAdam, and AdaBelief. The study also reviews the mathematical formulas of each optimizer. Using the best default parameters of each optimizer, the CNN model is trained on the training data to minimize the cross-entropy for up to 100 epochs. The accuracy of the trained CNN model is then measured on the training and testing data. The results show that the Adam, Nadam, and AdamW optimizers perform best in both training and testing, in terms of minimizing cross-entropy and the accuracy of the trained model. The three resulting models produce a cross-entropy of around 0.1 at the 100th epoch, with an accuracy of more than 90% on both training and testing data. Furthermore, the Adam optimizer gives the best accuracy on the testing data for the ACC22 and CK+ datasets, at 100% and 98.64%, respectively. The Adam optimizer is therefore the most appropriate optimizer for training the CNN model for facial expression recognition.
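
As a rough illustration of the kind of update rule the abstract above compares, here is a minimal sketch of the standard Adam update applied to a toy one-dimensional objective. The hyperparameter defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly used ones and are an assumption here, since the abstract does not list the values used in the study; the function name `adam_minimize` and the quadratic objective are hypothetical.

```python
import math


def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar function, given its gradient, with the Adam update rule."""
    x = x0
    m = 0.0  # running estimate of the first moment (mean) of the gradient
    v = 0.0  # running estimate of the second moment (uncentered variance)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); not a CNN loss.
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0, lr=0.1, steps=2000)
print(x_min)
```

In a real comparison the scalar `x` would be replaced by the CNN weights and `grad` by the gradient of the cross-entropy loss on a mini-batch.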

    A Taxonomy of Deep Convolutional Neural Nets for Computer Vision

    Traditional architectures for solving computer vision problems, and the degree of success they enjoyed, have relied heavily on hand-crafted features. Of late, however, deep learning techniques have offered a compelling alternative: automatically learning problem-specific features. With this new paradigm, every problem in computer vision is now being re-examined from a deep learning perspective. It has therefore become important to understand what kind of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm (i.e., deep networks) exist, a survey specific to computer vision is missing. We specifically consider one form of deep network widely used in computer vision: convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep learning techniques for computer vision. Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm)

    Deep Learning Optimizers Comparison in Facial Expression Recognition

    Artificial Intelligence is everywhere we go, whether it is programming an interactive cleaning robot or detecting bank fraud. Its rise is inevitable. In the last few decades, many new architectures and approaches have been proposed, so it has become hard to know which approach or architecture is best for a given area. One such area is the detection of emotion in the human face, most commonly known as Facial Expression Recognition (FER). In this work we began with an intensive collection of data on the theories that explain the existence of emotions, how they are distinguished from one another, and how they are recognized in a human face. We then developed deep learning models with different architectures in order to compare their performance on Facial Expression Recognition. After developing the models, we took one of them and tested it with different deep learning optimizer algorithms to examine the differences among them and so identify the best optimizing algorithm for this particular case.

    Sistema de reconhecimento de expressões faciais para deteção de stress (Facial expression recognition system for stress detection)

    Stress is the body's natural reaction to external and internal stimuli. Although it is natural, prolonged exposure to stressors can contribute to serious health problems. These reactions are reflected not only physiologically but also psychologically, translating into emotions and facial expressions. Once this relationship between experiencing stressful situations and showing certain emotions in response was understood, it was decided to develop a system capable of classifying facial expressions and thereby to create a stress detector. The proposed solution consists of two main blocks: a convolutional neural network capable of classifying facial expressions, and an application that uses this model to classify real-time images of the user's face and so verify whether or not the user shows signs of stress. The application captures real-time images from the webcam, extracts the user's face, classifies the facial expression shown, and uses these classifications to assess whether the user shows signs of stress in a given time interval. As soon as the application determines the presence of signs of stress, it notifies the user. The classification model was created using transfer learning together with fine-tuning. In this way, the pre-trained networks VGG16, VGG19, and Inception-ResNet V2 were used to solve the problem at hand. Two classifier architectures were also tried in the transfer learning process. After several experiments, it was determined that VGG16, together with a classifier made up of a convolutional layer, was the best-performing candidate at classifying stressful emotions. It achieved an MCC of 0.8969 on the test images of the KDEF dataset, 0.5551 on the Net Images dataset, and 0.4250 on CK+.
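
For readers unfamiliar with the MCC (Matthews correlation coefficient) figures quoted above, the following is a minimal sketch of the binary form of the metric. Treating the task as binary (stress versus no stress) is an assumption made for illustration, as the abstract does not say whether a binary or multiclass MCC was reported; the function name `mcc_binary` and the label lists are hypothetical.

```python
import math


def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels: 1 = shows signs of stress, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
score = mcc_binary(y_true, y_pred)
print(score)
```

The score ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect agreement), which is why it is often preferred to plain accuracy on imbalanced data.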

    Review : Deep learning in electron microscopy

    Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Next, we discuss the hardware and software needed to get started with deep learning and to interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.

    A study of deep learning and its applications to face recognition techniques

    This work is the result of Fernando Suzacq's master's thesis. The thesis centered on research into 3D face recognition without depth reconstruction or the use of generic 3D models. This research led to a paper that was subsequently published in IEEE Transactions on Pattern Analysis and Machine Intelligence. Using active illumination improves 2D face recognition and makes it more robust to low-light conditions and identity spoofing attacks. The central idea of the work is to project a high-frequency light pattern onto the test face. From the captured image, it is possible to recover real 3D information, which follows from the deformations of this pattern, together with a 2D image of the test face. This process avoids having to deal with the difficult task of 3D reconstruction. The work presents the theory underlying this process, explains its construction, and provides the results of various experiments that support its validity and usefulness. Developing this research required studying the existing theory and reviewing the state of the art for this particular problem. Part of the result of that work is also presented in this document, as a theoretical framework for the publication.

    Optimization of convolutional neural networks for image classification using genetic algorithms and bayesian optimization

    Notwithstanding the recent successes of deep convolutional neural networks for classification tasks, they are sensitive to the selection of their hyperparameters, which impose an exponentially large search space on modern convolutional models. Traditional hyperparameter selection methods include manual, grid, or random search, but these require expert knowledge or are computationally burdensome. In contrast, Bayesian optimization and evolutionary-inspired techniques have surfaced as viable alternatives for the hyperparameter problem. An alternative hybrid approach that combines the advantages of these techniques is therefore proposed. Specifically, the search space is partitioned into a discrete architectural subspace and a continuous and categorical hyperparameter subspace, which are traversed respectively by a stochastic genetic search followed by a genetic-Bayesian search. Simulations on a prominent image classification task reveal that the proposed method results in an overall classification accuracy improvement of 0.87% over unoptimized baselines, and a greater than 97% reduction in computational costs compared to a commonly employed brute force approach. Electrical and Mining Engineering. M. Tech. (Electrical Engineering)
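
To give a feel for the genetic stage mentioned above, here is a minimal sketch of a plain genetic search over a small discrete hyperparameter space. It is a generic textbook-style loop (truncation selection, uniform crossover, random mutation), not the hybrid genetic-Bayesian method the abstract proposes. The search space `SPACE`, the function names, and the `toy_fitness` stand-in for validation accuracy are all hypothetical.

```python
import random

# Hypothetical discrete search space; names and values are illustrative only.
SPACE = {
    "num_conv_layers": [1, 2, 3, 4],
    "filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
}


def toy_fitness(cfg):
    """Stand-in for validation accuracy; a real run would train a CNN here."""
    return -(
        abs(cfg["num_conv_layers"] - 3)
        + abs(cfg["filters"] - 64) / 16
        + abs(cfg["kernel_size"] - 5) / 2
    )


def random_config(rng):
    return {name: rng.choice(values) for name, values in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal probability.
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SPACE}


def mutate(cfg, rng, rate=0.2):
    # Resample each gene from its allowed values with probability `rate`.
    return {
        name: (rng.choice(SPACE[name]) if rng.random() < rate else value)
        for name, value in cfg.items()
    }


def genetic_search(fitness, pop_size=8, generations=10, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children  # parents survive, so the best is never lost
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_cfg, best_history = genetic_search(toy_fitness)
print(best_cfg, best_history[-1])
```

Because the parents are carried over unchanged, the best fitness never decreases from one generation to the next. In practice each fitness evaluation means training a network, so results would normally be cached.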

    Design and Implementation of a Domain Specific Language for Deep Learning

    Deep Learning (DL) has recently found great success in diverse areas such as machine vision, speech recognition, big data analysis, and multimedia understanding. However, the existing state-of-the-art DL frameworks, e.g. Caffe2, Theano, TensorFlow, MxNet, Torch7, and CNTK, are programming libraries with fixed user interfaces, internal representations, and execution environments. Modifying the code of DL layers or data structures is very challenging without an in-depth understanding of the underlying implementation. The optimization of code and execution in these tools is often limited and relies on specific DL computation graph manipulation and scheduling that lack systematic and universal strategies. Furthermore, most of these tools demand many dependencies besides the tool itself and must be built for specific platforms for DL training or inference. This dissertation presents DeepDSL, a domain specific language (DSL) embedded in Scala, that compiles DL networks encoded with DeepDSL to efficient, compact, and portable Java source programs for DL training and inference. DeepDSL represents DL networks as abstract tensor functions, performs symbolic gradient derivations to generate the Intermediate Representation (IR), optimizes the IR expressions, and compiles the optimized IR expressions to cross-platform Java code that is easily modifiable and debuggable. The code also runs directly on GPU without additional dependencies, except for a small set of JNI (Java Native Interface) wrappers for invoking the underlying GPU libraries. Moreover, DeepDSL provides static analysis for memory consumption and error detection. DeepDSL (our previous results are reported in [zhao2017]; design and implementation details are summarized in [Zhao2018]) has been evaluated with many current state-of-the-art DL networks (e.g. Alexnet, GoogleNet, VGG, Overfeat, and Deep Residual Network). While the DSL code is highly compact, with fewer than 100 lines for each network, the Java source code generated by the DeepDSL compiler is highly efficient. Our experiments show that the output Java source has very competitive runtime performance and memory efficiency compared to the existing DL frameworks.