108 research outputs found
Facial Expression Recognition from World Wild Web
Recognizing facial expression in a wild setting has remained a challenging
task in computer vision. The World Wide Web is a good source of facial images
which most of them are captured in uncontrolled conditions. In fact, the
Internet is a Word Wild Web of facial images with expressions. This paper
presents the results of a new study on collecting, annotating, and analyzing
wild facial expressions from the web. Three search engines were queried using
1250 emotion related keywords in six different languages and the retrieved
images were mapped by two annotators to six basic expressions and neutral. Deep
neural networks and noise modeling were used in three different training
scenarios to find how accurately facial expressions can be recognized when
trained on noisy images collected from the web using query terms (e.g. happy
face, laughing man, etc)? The results of our experiments show that deep neural
networks can recognize wild facial expressions with an accuracy of 82.12%
Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work
Deep networks thrive when trained on large scale data collections. This has
given ImageNet a central role in the development of deep architectures for
visual object classification. However, ImageNet was created during a specific
period in time, and as such it is prone to aging, as well as dataset bias
issues. Moving beyond fixed training datasets will lead to more robust visual
systems, especially when deployed on robots in new environments which must
train on the objects they encounter there. To make this possible, it is
important to break free from the need for manual annotators. Recent work has
begun to investigate how to use the massive amount of images available on the
Web in place of manual image annotations. We contribute to this research thread
with two findings: (1) a study correlating a given level of noisily labels to
the expected drop in accuracy, for two deep architectures, on two different
types of noise, that clearly identifies GoogLeNet as a suitable architecture
for learning from Web data; (2) a recipe for the creation of Web datasets with
minimal noise and maximum visual variability, based on a visual and natural
language processing concept expansion strategy. By combining these two results,
we obtain a method for learning powerful deep object models automatically from
the Web. We confirm the effectiveness of our approach through object
categorization experiments using our Web-derived version of ImageNet on a
popular robot vision benchmark database, and on a lifelong object discovery
task on a mobile robot.Comment: 8 pages, 7 figures, 3 table
- …