577 research outputs found
A Study on Human Face Expressions using Convolutional Neural Networks and Generative Adversarial Networks
Human beings express themselves through words, signs, gestures, and facial expressions. Previous research using pre-trained convolutional models froze the entire network and ran the models without applying any image processing techniques. In this research, we attempt to improve the accuracy of several deep CNN architectures, such as ResNet and SENet, using a variety of image processing techniques, including Image Data Generator, Histogram Equalization, and UnSharpMask. We used FER 2013, a dataset containing images across multiple expression classes. Beyond preprocessing, we also modified the models themselves in an attempt to improve their accuracy.
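The two preprocessing steps named above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names are invented here, and a 3x3 box blur stands in for the Gaussian blur normally used in unsharp masking.

```python
import numpy as np

def histogram_equalization(img):
    """Spread the intensities of a uint8 grayscale image over the full 0-255 range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[np.nonzero(cdf)][0]
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255)
    return lut.astype(np.uint8)[img]

def unsharp_mask(img, amount=1.0):
    """Sharpen by adding back the difference between the image and a blurred copy."""
    # 3x3 box blur as a simple stand-in for a Gaussian blur
    padded = np.pad(img.astype(np.float64), 1, mode='edge')
    blurred = sum(padded[i:i + img.shape[0], j:j + img.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    sharpened = img + amount * (img - blurred)
    return np.clip(sharpened, 0, 255).astype(np.uint8)

# FER 2013 images are 48x48 grayscale; simulate a low-contrast patch here
rng = np.random.default_rng(0)
face = rng.integers(60, 120, size=(48, 48), dtype=np.uint8)
eq = histogram_equalization(face)
sharp = unsharp_mask(eq)
print(eq.min(), eq.max())  # → 0 255: equalization stretches to the full range
```

In a real pipeline these transforms would be applied before the images are fed to the frozen or fine-tuned CNN.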
While working on this research, we were introduced to another concept in deep learning: Generative Adversarial Networks, or GANs. These are generative deep learning models built from deep CNNs, comprising two networks, a Generator and a Discriminator. The Generator maps random noise to synthetic images and passes them to the Discriminator, which classifies each image as real or generated, accepting or rejecting it accordingly. Over the years, various distinguished GAN architectures have emerged, notably CycleGAN and StyleGAN, which make it possible not only to reproduce the original input image but also to alter it and generate different images. For example, CycleGAN allows us to change the season of a scene from summer to winter, or the expression on a person's face from happy to sad. Although these sophisticated models perform well, training an architecture built from two deep neural networks creates real difficulties with hyperparameter tuning and overfitting.
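The two-player setup can be made concrete with a deliberately tiny GAN on 1-D data, small enough to differentiate by hand. Everything here is an illustrative assumption, not the study's models: the "images" are scalars drawn from N(4, 1), the generator is an affine map of noise, and the discriminator is logistic regression.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def real_batch(n):
    # toy 1-D "real data": samples from N(4, 1)
    return rng.normal(4.0, 1.0, n)

# Generator G(z) = a*z + b; Discriminator D(x) = sigmoid(w*x + c), P("real").
a, b = 1.0, 0.0
w, c = 0.0, 0.0
lr, n = 0.05, 64

for step in range(2000):
    # --- discriminator update: push D(real) -> 1, D(fake) -> 0 ---
    x_real = real_batch(n)
    z = rng.normal(0.0, 1.0, n)
    x_fake = a * z + b
    p_real, p_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    # gradients of the binary cross-entropy w.r.t. w and c
    gw = (-(1 - p_real) * x_real + p_fake * x_fake).mean()
    gc = (-(1 - p_real) + p_fake).mean()
    w, c = w - lr * gw, c - lr * gc

    # --- generator update: push D(fake) -> 1 (non-saturating loss) ---
    z = rng.normal(0.0, 1.0, n)
    x_fake = a * z + b
    p_fake = sigmoid(w * x_fake + c)
    # chain rule of -log D(G(z)) through the sigmoid and the affine map
    ga = (-(1 - p_fake) * w * z).mean()
    gb = (-(1 - p_fake) * w).mean()
    a, b = a - lr * ga, b - lr * gb

# the generator's offset b should drift toward the real mean of 4
print(round(b, 2))
```

Even in this toy setting the alternating updates show why tuning is delicate: each player's loss surface shifts whenever the other player moves.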
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmarks motion as curves encoded as
points on a hypersphere. By proposing a conditional version of a manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer, and data augmentation for training improved emotion
recognition models.
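To make the manifold-valued representation concrete, here is a minimal sketch of one standard way to encode a landmark-motion curve as a point on a unit hypersphere, via a square-root-velocity-style map with scale normalization. The function names and toy curves are assumptions for illustration; the paper's exact encoding and metric may differ in detail.

```python
import numpy as np

def srvf(curve, eps=1e-8):
    """Square-root-velocity-style representation of a sampled curve.
    curve: (T, d) array of stacked landmark coordinates over T frames."""
    vel = np.gradient(curve, axis=0)                 # finite-difference velocity
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    q = vel / np.sqrt(speed + eps)                   # q(t) = v(t) / sqrt(|v(t)|)
    return q / np.linalg.norm(q)                     # unit norm: point on the hypersphere

def geodesic_distance(q1, q2):
    """Arc length between two points on the unit hypersphere."""
    inner = np.clip(np.sum(q1 * q2), -1.0, 1.0)
    return np.arccos(inner)

# toy "expression" motion: 10 frames of 2 landmarks in 2-D, flattened to d = 4
T = 10
t = np.linspace(0, 1, T)
curve_a = np.stack([t, t**2, 1 - t, np.sin(t)], axis=1)
curve_b = curve_a + 0.1 * np.cos(t)[:, None]         # a slightly perturbed motion
qa, qb = srvf(curve_a), srvf(curve_b)
print(round(float(np.linalg.norm(qa)), 6))           # → 1.0: lies on the unit sphere
```

Once motions live on the sphere, distances between expression dynamics become geodesic arc lengths, which is what lets a manifold-valued GAN generate new points of a given expression class.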
auDeep: Unsupervised Learning of Representations from Audio with Deep Recurrent Neural Networks
auDeep is a Python toolkit for deep unsupervised representation learning from
acoustic data. It is based on a recurrent sequence-to-sequence autoencoder
approach which can learn representations of time series data by taking into
account their temporal dynamics. We provide an extensive command line interface
in addition to a Python API for users and developers, both of which are
comprehensively documented and publicly available at
https://github.com/auDeep/auDeep. Experimental results indicate that auDeep
features are competitive with state-of-the-art audio classification.
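The core idea, condensing a variable-length spectrogram into a fixed-length feature vector by running a recurrent network over time, can be sketched as below. This is only the encoder half with a plain tanh RNN forward pass; auDeep itself trains a full recurrent sequence-to-sequence autoencoder (typically with gated cells), and the weights and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_encode(frames, Wx, Wh, b):
    """Run a simple tanh RNN over a (T, n_mels) spectrogram and return the
    final hidden state as a fixed-length representation of the sequence."""
    h = np.zeros(Wh.shape[0])
    for x in frames:                  # one spectrogram frame per time step
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h

n_mels, hidden = 40, 16
Wx = rng.normal(0, 0.1, (hidden, n_mels))
Wh = rng.normal(0, 0.1, (hidden, hidden))
b = np.zeros(hidden)

# two clips of different length map to same-size feature vectors
clip_short = rng.normal(size=(50, n_mels))
clip_long = rng.normal(size=(120, n_mels))
feat_a = rnn_encode(clip_short, Wx, Wh, b)
feat_b = rnn_encode(clip_long, Wx, Wh, b)
print(feat_a.shape, feat_b.shape)     # → (16,) (16,): length-independent features
```

In the full autoencoder, a decoder RNN is trained to reconstruct the input spectrogram from this hidden state, which forces the representation to capture the clip's temporal dynamics.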
Emotion recognition: recognition of emotions through voice
Master's dissertation (mestrado integrado) in Informatics Engineering. As the years go by, the interaction between humans and machines gains ever more importance, whether for personal or commercial use. At a time when technology is reaching so many parts of our lives, it is important to keep striving for healthy progress and to help not only improve but also maintain the benefits that everyone draws from it. This relationship can be tackled from many angles, but here the focus is on the mind.
Emotions are still a mystery. The concept itself raises serious questions because of its complex nature. To this day, scientists still struggle to understand it, so it is crucial to pave the right path for technology that can aid in this area. There is some consensus on a few indicators that provide important insights into mental state, such as the words used, facial expressions, and the voice.
This work centres on the voice. Building on the field of Automatic Speech Emotion Recognition, it proposes a full, wide-scoped pipeline: sound capture and signal-processing software; learning and classification with algorithms from the semi-supervised learning paradigm; and visualization techniques for interpreting the results. For the classification of samples, a semi-supervised approach with neural networks is an important setting for easing the dependence on human labelling of emotions, a task that has proven challenging, in many cases highly subjective, and expensive besides. The intention is to rely mostly on empirical results rather than theoretical concepts, given the complexity of human emotion and its inherent uncertainty, while never disregarding prior knowledge on the matter.
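The semi-supervised idea described above, letting a model trained on a few human-labelled samples annotate the rest, can be sketched with a simple self-training loop. The nearest-centroid classifier, the synthetic 2-D "voice features", and the confidence margin are illustrative assumptions, not the dissertation's actual models.

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic 2-D "voice features" for two emotion classes
x0 = rng.normal([-2, 0], 1.0, (100, 2))   # class 0
x1 = rng.normal([+2, 0], 1.0, (100, 2))   # class 1
x_all = np.vstack([x0, x1])
y_all = np.array([0] * 100 + [1] * 100)

# only 5 samples per class carry human labels; the rest are unlabeled
labeled = np.r_[0:5, 100:105]
unlabeled = np.setdiff1d(np.arange(200), labeled)

def centroids(x, y):
    return np.stack([x[y == k].mean(axis=0) for k in (0, 1)])

def predict(x, cents):
    d = np.linalg.norm(x[:, None, :] - cents[None, :, :], axis=2)
    return d.argmin(axis=1), d

# round 1: fit on the labelled data only
cents = centroids(x_all[labeled], y_all[labeled])

# round 2: pseudo-label the unlabeled pool where the model is confident
pred, dist = predict(x_all[unlabeled], cents)
margin = np.abs(dist[:, 0] - dist[:, 1])
confident = margin > 1.0                   # keep only well-separated predictions
x_aug = np.vstack([x_all[labeled], x_all[unlabeled][confident]])
y_aug = np.concatenate([y_all[labeled], pred[confident]])
cents = centroids(x_aug, y_aug)

final_pred, _ = predict(x_all, cents)
acc = (final_pred == y_all).mean()
print(round(acc, 3))
```

The same loop applies with a neural network in place of the centroid classifier: train on the labelled set, keep only high-confidence predictions on the unlabeled pool, and retrain on the enlarged set, thereby easing the labelling burden the abstract describes.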
- …