1,136 research outputs found
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and, with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks.
Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks
Emotion recognition has become an important field of research in the
human-computer interaction domain. The latest advancements in the field show
that combining visual with audio information leads to better results than
using a single source of information alone. From a visual
point of view, a human emotion can be recognized by analyzing the facial
expression of the person. More precisely, the human emotion can be described
through a combination of several Facial Action Units. In this paper, we propose
a system that is able to recognize emotions with a high accuracy rate and in
real time, based on deep Convolutional Neural Networks. In order to increase
the accuracy of the recognition system, we also analyze the speech data and
fuse the information coming from both sources, i.e., visual and audio.
Experimental results show the effectiveness of the proposed scheme for emotion
recognition and the importance of combining visual with audio data.
Comprehensive Study of Automatic Speech Emotion Recognition Systems
Speech emotion recognition (SER) is the technology that recognizes psychological characteristics and feelings from speech signals. SER is challenging because of the considerable variation across languages and across arousal and valence levels. Various technical developments in artificial intelligence and signal processing methods have made it possible to interpret emotions. SER plays a vital role in remote communication. This paper offers a recent survey of SER using machine learning (ML) and deep learning (DL)-based techniques. It focuses on the various feature representation and classification techniques used for SER. Further, it describes the databases and evaluation metrics used for speech emotion recognition.
Emotion recognition: recognition of emotions through voice
Integrated master's dissertation in Informatics Engineering.
As the years go by, the interaction between humans and machines seems to gain more and more importance
for many different reasons, whether for personal or commercial use. At a time
when technology is reaching many parts of our lives, it is important to keep striving for healthy progress
and to help not only improve but also maintain the benefits that everyone gets from it. This relationship
can be approached from many angles, but here the focus is on the mind.
Emotions are still a mystery. The concept itself raises serious questions because of its complex nature.
To this day, scientists still struggle to understand it, so it is crucial to pave the right path for the growth of
technology in aid of this topic. There is some consensus on a few indicators that provide important
insights into mental state, such as the words used, facial expressions, and voice.
This work focuses on voice. Building on the field of Automatic Speech Emotion
Recognition, it proposes a full pipeline with a wide scope, ranging from sound capture and
signal processing software, to learning and classification with algorithms from the semi-supervised
learning paradigm, to visualization techniques for interpreting results. For the classification of the
samples, a semi-supervised approach with neural networks is an important setting for trying
to reduce the dependency on human labelling of emotions, a task that has proven to be challenging and,
in many cases, highly subjective, not to mention expensive. The work is intended to rely mostly on empirical results
rather than theoretical concepts, due to the complexity of the concept of human emotions and its inherent
uncertainty, but never to disregard prior knowledge on the matter.
- …