Feature-driven Emergence of Model Graphs for Object Recognition and Categorization
An important requirement for the expression of cognitive structures
is the ability to form mental objects by rapidly binding together
constituent parts. In this sense, one may conceive the brain's data
structure to have the form of graphs whose nodes are labeled with
elementary features. These provide a versatile data format with the
additional ability to render the structure of any mental object.
Because of the multitude of possible object variations the graphs
are required to be dynamic. Upon presentation of an image, a
so-called model graph should rapidly emerge by binding together
memorized subgraphs derived from earlier learning examples driven by the
image features. In this model, the richness and flexibility of the
mind is made possible by a combinatorial game of immense
complexity. Consequently, the emergence of model graphs is a
laborious task which, in computer vision, has most often been
disregarded in favor of employing model graphs tailored to specific
object categories like, for instance, faces in frontal pose.
Recognition or categorization of arbitrary objects, however, demands
dynamic graphs.
In this work we propose a form of graph dynamics, which proceeds in
two steps. In the first step component classifiers, which decide
whether a feature is present in an image, are learned from training
images. For processing arbitrary objects, features are small
localized grid graphs, so-called parquet graphs, whose nodes are
attributed with Gabor amplitudes. By combining these classifiers
into a linear discriminant that conforms to Linsker's infomax
principle, a weighted majority voting scheme is implemented.
It allows for preselection of salient learning examples, so-called
model candidates, and likewise for preselection of the categories to
which the object in the presented image presumably belongs. Each model
candidate is verified in a second step using a variant of elastic
graph matching, a standard correspondence-based technique for face
and object recognition. To further differentiate between model
candidates with similar features, a model is selected only if its
features occur in a similar spatial arrangement. Model
graphs are constructed dynamically by assembling model features into
larger graphs according to their spatial arrangement. From the
viewpoint of pattern recognition, the presented technique is a
combination of a discriminative (feature-based) and a generative
(correspondence-based) classifier while the majority voting scheme
implemented in the feature-based part is an extension of existing
multiple feature subset methods.
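To make the feature-based step concrete, the sketch below is a rough approximation of the described ingredients rather than the authors' implementation: it samples Gabor amplitude "jets" on a small localized grid (a parquet graph), compares two such graphs by jet similarity, and scores model candidates by weighted voting. All filter parameters, grid sizes, and helper names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size=9, sigma=2.5, theta=0.0, freq=0.25):
    """Complex Gabor kernel; response amplitudes serve as node attributes."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * rot)

def gabor_amplitudes(image, kernels):
    """Amplitude of each kernel's response at every pixel."""
    return np.stack([np.abs(fftconvolve(image, k, mode="same")) for k in kernels])

def parquet_graph(amps, top_left, grid=3, spacing=4):
    """Sample a small localized grid; each node carries a 'jet' of amplitudes."""
    r0, c0 = top_left
    return np.stack([amps[:, r0 + i * spacing, c0 + j * spacing]
                     for i in range(grid) for j in range(grid)])

def graph_similarity(a, b):
    """Mean normalized dot product between corresponding jets of two graphs."""
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-9
    return float((num / den).mean())

def weighted_vote(votes, weights):
    """Combine binary component-classifier outputs into per-model scores."""
    return {m: float(np.dot(weights[m], v)) for m, v in votes.items()}
```

In this reading, the highest-scoring models from `weighted_vote` would be the preselected candidates that the second, correspondence-based step then verifies by elastic graph matching.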
We report the results of experiments on standard databases for
object recognition and categorization. The method achieved high
recognition rates on identity, object category, pose, and
illumination type. Unlike many other models, the presented
technique can also cope with varying background, multiple objects,
and partial occlusion.
Automatic emotional state detection using facial expression dynamic in videos
In this paper, an automatic emotion detection system is built for a computer or machine to detect the emotional state from facial expressions in human-computer communication. First, dynamic motion features are extracted from facial expression videos; advanced machine learning methods for classification and regression are then used to predict the emotional states.
The system is evaluated on two publicly available datasets, i.e. GEMEP_FERA and AVEC2013, and satisfactory performance is achieved in comparison with the provided baseline results. With this emotional state detection capability, a machine can read the facial expressions of its users automatically. This technique can be integrated into applications such as smart robots, interactive games, and smart surveillance systems.
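As a rough illustration of such a pipeline (not the paper's actual descriptor or models), the sketch below pools a simple frame-difference motion feature over a spatial grid and hands the resulting vectors to off-the-shelf scikit-learn classifiers and regressors; the grid size and the feature itself are placeholder assumptions.

```python
import numpy as np
from sklearn.svm import SVC, SVR

def motion_features(frames, grid=(4, 4)):
    """Per-cell mean absolute temporal difference over a (T, H, W) video."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))  # (T-1, H, W)
    H, W = frames.shape[1:]
    gh, gw = H // grid[0], W // grid[1]
    cells = [diffs[:, i * gh:(i + 1) * gh, j * gw:(j + 1) * gw].mean()
             for i in range(grid[0]) for j in range(grid[1])]
    return np.array(cells)

# Hypothetical usage, with X holding one feature vector per video:
# clf = SVC(kernel="rbf").fit(X_train, emotion_labels)   # discrete classes
# reg = SVR(kernel="rbf").fit(X_train, valence_scores)   # continuous dimension
```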
Island Loss for Learning Discriminative Features in Facial Expression Recognition
Over the past few years, Convolutional Neural Networks (CNNs) have shown
promise on facial expression recognition. However, the performance degrades
dramatically under real-world settings due to variations introduced by subtle
facial appearance changes, head pose variations, illumination changes, and
occlusions.
In this paper, a novel island loss (IL) is proposed to enhance the
discriminative power of the deeply learned features. Specifically, the IL is designed to
reduce the intra-class variations while enlarging the inter-class differences
simultaneously. Experimental results on four benchmark expression databases
have demonstrated that the CNN with the proposed island loss (IL-CNN)
outperforms the baseline CNN models with either traditional softmax loss or the
center loss and achieves comparable or better performance compared with the
state-of-the-art methods for facial expression recognition.
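A minimal PyTorch sketch of such a loss, following the abstract's description, pulls features toward learnable class centers while penalizing pairwise cosine similarity between the centers of different classes; the `lambda1` weighting and the +1 shift are assumptions modeled on the published island loss, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IslandLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, lambda1=10.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lambda1 = lambda1  # assumed trade-off weight

    def forward(self, features, labels):
        # Intra-class term: squared distance of each feature to its class center.
        center_loss = ((features - self.centers[labels]) ** 2).sum(dim=1).mean()
        # Inter-class term: shifted pairwise cosine similarity between distinct
        # centers; minimizing it pushes the class "islands" apart.
        c = F.normalize(self.centers, dim=1)
        cos = c @ c.t()
        k = self.centers.shape[0]
        off_diag = cos[~torch.eye(k, dtype=torch.bool)]
        island = (off_diag + 1.0).sum() / (k * (k - 1))
        return center_loss + self.lambda1 * island

# During training this term would be added to the usual softmax
# cross-entropy, e.g. loss = ce_loss + lam * island_loss(feats, labels).
```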
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, the last module provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs).
The performance of LSTM-RNNs is explored in depth, owing to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements.
All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current literature methods.
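To make the second module's idea concrete, here is a minimal sketch of a two-branch stacked LSTM over 2D skeleton sequences; the branch inputs (raw joint positions vs. frame-to-frame motion), layer counts, and fusion by concatenation are illustrative assumptions rather than the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class TwoBranchSkeletonLSTM(nn.Module):
    def __init__(self, num_joints=18, hidden=128, num_classes=10):
        super().__init__()
        in_dim = num_joints * 2  # (x, y) coordinates per joint
        self.pose_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.motion_branch = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, skeletons):  # skeletons: (B, T, num_joints * 2)
        # Motion branch sees frame-to-frame joint displacements.
        motion = skeletons[:, 1:] - skeletons[:, :-1]
        _, (h_pose, _) = self.pose_branch(skeletons)
        _, (h_motion, _) = self.motion_branch(motion)
        # Fuse the final hidden state of each branch's top layer.
        fused = torch.cat([h_pose[-1], h_motion[-1]], dim=1)
        return self.classifier(fused)
```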