
    A Study on Human Face Expressions using Convolutional Neural Networks and Generative Adversarial Networks

    Human beings express themselves through words, signs, gestures, and facial expressions. Previous research with pre-trained convolutional models froze the entire network and ran the models without any image processing. In this research, we attempt to improve the accuracy of several deep CNN architectures, such as ResNet and SENet, using image processing techniques including Keras's ImageDataGenerator, histogram equalization, and unsharp masking. We used FER 2013, a dataset containing multiple classes of facial expression images. Beyond preprocessing, we also modified the models themselves to improve their accuracy. In the course of this work we were introduced to another concept in deep learning, Generative Adversarial Networks (GANs): generative deep learning models built from two CNNs, a Generator and a Discriminator. The Generator maps random noise to synthetic images and passes them to the Discriminator, which learns to distinguish generated images from real input images and accepts or rejects them accordingly. Over the years, distinguished GAN architectures such as CycleGAN and StyleGAN have made it possible not only to reproduce an input image but also to modify it and generate different images; CycleGAN, for example, can change the season of a scene from summer to winter, or the emotion on a face from happy to sad. Although these sophisticated models perform well, the architecture contains two deep neural networks, which creates problems with hyperparameter tuning and overfitting.
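    A minimal sketch, not the authors' code, of the preprocessing the abstract describes: histogram equalization and an unsharp mask applied per image and wired into Keras's ImageDataGenerator via its preprocessing_function hook. The array names (train_images, train_labels) and the augmentation settings are assumptions.

```python
# Sketch of per-image enhancement for FER2013-style 48x48 grayscale inputs.
import cv2
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def enhance(img):
    """Equalize contrast, then sharpen with an unsharp mask."""
    gray = img.squeeze().astype(np.uint8)            # (48, 48) grayscale
    eq = cv2.equalizeHist(gray)                      # histogram equalization
    blur = cv2.GaussianBlur(eq, (0, 0), sigmaX=2)    # low-pass copy
    sharp = cv2.addWeighted(eq, 1.5, blur, -0.5, 0)  # unsharp mask: boost edges
    return sharp[..., np.newaxis].astype(np.float32) / 255.0

datagen = ImageDataGenerator(
    preprocessing_function=enhance,  # runs after augmentation, per image
    rotation_range=10,
    horizontal_flip=True,
)
# Hypothetical usage with a (N, 48, 48, 1) uint8 array and one-hot labels:
# model.fit(datagen.flow(train_images, train_labels, batch_size=64), epochs=50)
```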

    Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets

    In this work, we propose a novel approach for generating videos of the six basic facial expressions given a neutral face image. We exploit face geometry by modeling facial landmark motion as curves encoded as points on a hypersphere. By proposing a conditional version of a manifold-valued Wasserstein generative adversarial network (GAN) for motion generation on the hypersphere, we learn the distribution of facial expression dynamics of different classes, from which we synthesize new facial expression motions. The resulting motions can be transformed to sequences of landmarks and then to image sequences by editing the texture information with another conditional GAN. To the best of our knowledge, this is the first work to explore manifold-valued representations with GANs for dynamic facial expression generation. We evaluate our approach quantitatively and qualitatively on two public datasets: Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the effectiveness of the approach in generating realistic videos with continuous motion, realistic appearance, and identity preservation. We also show the efficiency of our framework for dynamic facial expression generation, dynamic facial expression transfer, and data augmentation for training improved emotion recognition models.
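    A minimal sketch, under our own assumptions, of the geometric idea in the abstract: a landmark motion (T frames of 68 2-D landmarks) is encoded as a square-root-velocity curve and scaled to unit norm, so each expression becomes a single point on a high-dimensional unit hypersphere. The SRV encoding is one standard way to place curves on a hypersphere; the paper's exact parameterization may differ.

```python
import numpy as np

def landmarks_to_sphere(seq):
    """seq: (T, 68, 2) landmark positions over time -> unit-norm SRV representation."""
    curve = seq.reshape(len(seq), -1)              # (T, 136) flattened trajectory
    vel = np.gradient(curve, axis=0)               # frame-to-frame velocity
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    srv = vel / np.sqrt(np.maximum(speed, 1e-8))   # square-root velocity function
    srv /= np.linalg.norm(srv)                     # scale onto the unit hypersphere
    return srv

# The geodesic distance between two encoded expressions q1, q2 is the arc
# length arccos(<q1, q2>) -- the spherical geometry a manifold-valued GAN
# must respect when generating new motions.
```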

    auDeep: Unsupervised Learning of Representations from Audio with Deep Recurrent Neural Networks

    auDeep is a Python toolkit for deep unsupervised representation learning from acoustic data. It is based on a recurrent sequence-to-sequence autoencoder approach, which can learn representations of time series data by taking their temporal dynamics into account. We provide an extensive command line interface in addition to a Python API for users and developers, both of which are comprehensively documented and publicly available at https://github.com/auDeep/auDeep. Experimental results indicate that auDeep features are competitive with the state of the art in audio classification.
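    A minimal sketch, not auDeep's actual implementation, of the idea behind it: a recurrent sequence-to-sequence autoencoder whose bottleneck state serves as a fixed-length representation of a variable spectrogram sequence. The frame count, feature size, and layer widths below are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

T, F = 200, 128                       # frames x mel bands (hypothetical shape)
inp = layers.Input(shape=(T, F))
h = layers.GRU(256)(inp)              # encoder: final state = learned representation
rep = layers.RepeatVector(T)(h)       # feed the representation to every decoder step
out = layers.GRU(256, return_sequences=True)(rep)
out = layers.TimeDistributed(layers.Dense(F))(out)  # reconstruct each frame

autoencoder = Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
# Unsupervised training: the target is the input itself.
# autoencoder.fit(spectrograms, spectrograms, epochs=20)
encoder = Model(inp, h)               # after training, encoder(x) yields features
```

    After training, the encoder output can be fed to any downstream classifier, which is how such features end up competing with supervised audio representations.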

    Emotion recognition: recognition of emotions through voice

    Master's dissertation in Informatics Engineering. As the years go by, interaction between humans and machines gains ever more importance, for personal as well as commercial use. At a time when technology is reaching so many parts of our lives, it is important to keep striving for healthy progress and to help not only improve but also maintain the benefits everyone gets from it. This relationship can be approached from many angles; here the focus is on the mind. Emotions are still a mystery: the concept itself raises serious questions because of its complex nature, and to date scientists still struggle to understand it, so it is crucial to pave the right path for technology to aid in this topic. There is some consensus on a few indicators that provide important insights into mental state, such as the words used, facial expressions, and voice. This work focuses on voice. Building on the field of Automatic Speech Emotion Recognition, it proposes a full, broad-scope pipeline: sound capture and signal processing software; learning and classification with algorithms from the semi-supervised learning paradigm; and visualization techniques for interpreting results. For the classification of samples, a semi-supervised approach with neural networks is an important setting for alleviating the dependency on human labelling of emotions, a task that has proven to be challenging and, in many cases, highly subjective, not to mention expensive. The intention is to rely mostly on empirical results rather than theoretical concepts, given the complexity of human emotion and its inherent uncertainty, while never disregarding prior knowledge on the matter.
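    A minimal sketch, under our own assumptions, of the semi-supervised setting the abstract describes: a network trained on a small labeled set of voice features pseudo-labels the unlabeled pool, keeping only confident predictions, to reduce dependence on costly human emotion annotation. The function and variable names are hypothetical, not from the dissertation.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def pseudo_label_round(X_lab, y_lab, X_unlab, threshold=0.9):
    """One round of self-training: fit, then adopt confident pseudo-labels."""
    clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
    clf.fit(X_lab, y_lab)                       # supervised step on the labeled set
    proba = clf.predict_proba(X_unlab)
    keep = proba.max(axis=1) >= threshold       # only high-confidence predictions
    X_new = np.vstack([X_lab, X_unlab[keep]])
    y_new = np.concatenate([y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
    return clf, X_new, y_new, X_unlab[~keep]

# Hypothetical usage: X_lab/y_lab are hand-labeled acoustic feature vectors,
# X_unlab is the unlabeled pool; iterate until few samples remain unlabeled.
# clf, X_lab, y_lab, X_unlab = pseudo_label_round(X_lab, y_lab, X_unlab)
```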