Deep Learning Face Attributes in the Wild
Predicting face attributes in the wild is challenging due to complex face
variations. We propose a novel deep learning framework for attribute prediction
in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly
with attribute tags, but pre-trained differently. LNet is pre-trained on
massive general object categories for face localization, while ANet is
pre-trained on massive face identities for attribute prediction. This framework
not only outperforms the state of the art by a large margin, but also reveals
valuable facts on learning face representation.
(1) It shows how the performance of face localization (LNet) and attribute
prediction (ANet) can be improved by different pre-training strategies.
(2) It reveals that although the filters of LNet are fine-tuned only with
image-level attribute tags, their response maps over entire images strongly
indicate face locations. This fact enables training LNet for face
localization with only image-level annotations, but without face bounding boxes
or landmarks, which are required by all attribute recognition works.
(3) It also demonstrates that the high-level hidden neurons of ANet
automatically discover semantic concepts after pre-training with massive face
identities, and such concepts are significantly enriched after fine-tuning with
attribute tags. Each attribute can be well explained with a sparse linear
combination of these concepts.
Comment: To appear in International Conference on Computer Vision (ICCV) 201
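The sparse linear combination mentioned in point (3) can be written compactly. The notation below is an assumption made for illustration and is not taken from the paper: $h_k$ denotes the response of the $k$-th high-level ANet neuron (a "concept"), $w_{a,k}$ its weight for attribute $a$, and $K$ the number of such neurons.

```latex
\hat{y}_a = \sum_{k=1}^{K} w_{a,k}\, h_k,
\qquad
\lVert \mathbf{w}_a \rVert_0 \ll K
```

Sparsity here means that only a few of the weights $w_{a,k}$ are nonzero, so each attribute score depends on a small subset of the concepts.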
Review of Face Detection Systems Based Artificial Neural Networks Algorithms
Face detection is one of the most relevant applications of image processing
and biometric systems. Artificial neural networks (ANN) have been used in the
field of image processing and pattern recognition. There is a lack of literature
surveys that give an overview of the studies and research related to the
use of ANN in face detection. Therefore, this research includes a general
review of face detection studies and systems that are based on different ANN
approaches and algorithms. The strengths and limitations of these
studies and systems are also included.
Comment: 16 pages, 12 figures, 1 table, IJMA Journa
Underwater Fish Detection with Weak Multi-Domain Supervision
Given a sufficiently large training dataset, it is relatively easy to train a
modern convolutional neural network (CNN) as the required image classifier.
However, for the task of fish classification and/or fish detection, if a CNN
is trained to detect or classify particular fish species in particular
background habitats, the same CNN exhibits much lower accuracy when applied to
new/unseen fish species and/or fish habitats. Therefore, in practice, the CNN
needs to be continuously fine-tuned to improve its classification accuracy to
handle new project-specific fish species or habitats. In this work we present a
labelling-efficient method of training a CNN-based fish-detector (the Xception
CNN was used as the base) on a relatively small number (4,000) of project-domain
underwater fish/no-fish images from 20 different habitats. Additionally, 17,000
known-negative (that is, containing no fish) general-domain (VOC2012) above-water
images were used. Two publicly available fish-domain datasets supplied
an additional 27,000 above-water and underwater positive/fish images. By using
this multi-domain collection of images, the trained Xception-based binary
(fish/not-fish) classifier achieved 0.17% false-positives and 0.61%
false-negatives on the project's 20,000 negative and 16,000 positive holdout
test images, respectively. The area under the ROC curve (AUC) was 99.94%.
Comment: Published in the 2019 International Joint Conference on Neural
Networks (IJCNN-2019), Budapest, Hungary, July 14-19, 2019,
https://www.ijcnn.org/ , https://ieeexplore.ieee.org/document/885190
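The error rates quoted in the abstract above can be converted into approximate image counts. The following is a minimal sketch, not the authors' evaluation code: the function names `error_counts` and `summary` are hypothetical, the counts are rounded to whole images, and the derived accuracy, precision and recall are back-of-the-envelope figures that the abstract itself does not report.

```python
def error_counts(n_negative, n_positive, fp_rate, fn_rate):
    """Turn per-class error rates into an approximate confusion matrix.

    fp_rate is the fraction of negative images wrongly flagged as fish;
    fn_rate is the fraction of positive images wrongly flagged as no-fish.
    """
    fp = round(n_negative * fp_rate)
    fn = round(n_positive * fn_rate)
    return {
        "tp": n_positive - fn,
        "fp": fp,
        "tn": n_negative - fp,
        "fn": fn,
    }


def summary(counts):
    """Derive accuracy, precision and recall from a confusion matrix."""
    total = sum(counts.values())
    accuracy = (counts["tp"] + counts["tn"]) / total
    precision = counts["tp"] / (counts["tp"] + counts["fp"])
    recall = counts["tp"] / (counts["tp"] + counts["fn"])
    return accuracy, precision, recall


# Figures from the abstract: 20,000 negative and 16,000 positive holdout
# images, 0.17% false positives and 0.61% false negatives.
counts = error_counts(20_000, 16_000, 0.0017, 0.0061)
accuracy, precision, recall = summary(counts)
print(counts)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

Under these assumptions, the reported rates correspond to roughly 34 misclassified negative images and roughly 98 misclassified positive images.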
RGB-D datasets using Microsoft Kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.