Search CORE

9,779 research outputs found

An original framework for understanding human actions and body language by using deep neural networks

Author: MASSARONI CRISTIANO
Publication venue
Publication date: 28/02/2020
Field of study

The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition, and video surveillance. In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods

Archivio della ricerca- Università di Roma La Sapienza

Musemo: Express Musical Emotion Based on Neural Network

Author: Kim Wooyeon
Publication venue: Graduate School of UNIST
Publication date: 01/02/2020
Field of study

Department of Urban and Environmental Engineering (Convergence of Science and Arts)Music elicits emotional responses, which enable people to empathize with the emotional states induced by music, experience changes in their current feelings, receive comfort, and relieve stress (Juslin & Laukka, 2004). Music emotion recognition (MER) is a field of research that extracts emotions from music through various systems and methods. Interest in this field is increasing as researchers try to use it for psychiatric purposes. In order to extract emotions from music, MER requires music and emotion labels for each music. Many MER studies use emotion labels created by non-music-specific psychologists such as Russell???s circumplex model of affects (Russell, 1980) and Ekman???s six basic emotions (Ekman, 1999). However, Zentner, Grandjean, and Scherer suggest that emotions commonly used in music are subdivided into specific areas, rather than spread across the entire spectrum of emotions (Zentner, Grandjean, & Scherer, 2008). Thus, existing MER studies have difficulties with the emotion labels that are not widely agreed through musicians and listeners. This study proposes a musical emotion recognition model ???Musemo??? that follows the Geneva emotion music scale proposed by music psychologists based on a convolution neural network. We evaluate the accuracy of the model by varying the length of music samples used as input of Musemo and achieved RMSE (root mean squared error) performance of up to 14.91%. Also, we examine the correlation among emotion labels by reducing the Musemo???s emotion output vector to two dimensions through principal component analysis. Consequently, we can get results that are similar to the study that Vuoskoski and Eerola analyzed for the Geneva emotion music scale (Vuoskoski & Eerola, 2011). We hope that this study could be expanded to inform treatments to comfort those in need of psychological empathy in modern society.clos

ScholarWorks@UNIST

Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods

Author: Abadi Mart'in
Borji Ali
Chen Kan
Chen X
Devlin Jacob
Ioffe Sergey
Kingma Diederik P
Koch Christof
LeCun Yann
Neisser Ubric
Tehranchi Farnaz
Treue Stefan
Wu Xiaoli
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/05/2020
Field of study

Modeling visual search not only offers an opportunity to predict the usability of an interface before actually testing it on real users, but also advances scientific understanding about human behavior. In this work, we first conduct a set of analyses on a large-scale dataset of visual search tasks on realistic webpages. We then present a deep neural network that learns to predict the scannability of webpage content, i.e., how easy it is for a user to find a specific target. Our model leverages both heuristic-based features such as target size and unstructured features such as raw image pixels. This approach allows us to model complex interactions that might be involved in a realistic visual search task, which can not be easily achieved by traditional analytical models. We analyze the model behavior to offer our insights into how the salience map learned by the model aligns with human intuition and how the learned semantic representation of each target type relates to its visual search performance.Comment: the 2020 CHI Conference on Human Factors in Computing System

arXiv.org e-Print Archive

Crossref

Gender-From-Iris or Gender-From-Mascara?

Author: Becker Benedict
Bowyer Kevin
Kuehlkamp Andrey
Publication venue
Publication date: 04/02/2017
Field of study

Predicting a person's gender based on the iris texture has been explored by several researchers. This paper considers several dimensions of experimental work on this problem, including person-disjoint train and test, and the effect of cosmetics on eyelash occlusion and imperfect segmentation. We also consider the use of multi-layer perceptron and convolutional neural networks as classifiers, comparing the use of data-driven and hand-crafted features. Our results suggest that the gender-from-iris problem is more difficult than has so far been appreciated. Estimating accuracy using a mean of N person-disjoint train and test partitions, and considering the effect of makeup - a combination of experimental conditions not present in any previous work - we find a much weaker ability to predict gender-from-iris texture than has been suggested in previous work

arXiv.org e-Print Archive

Crossref