2,634 research outputs found
The iNaturalist Species Classification and Detection Dataset
Existing image classification datasets used in computer vision tend to have a
uniform distribution of images across object categories. In contrast, the
natural world is heavily imbalanced, as some species are more abundant and
easier to photograph than others. To encourage further progress in challenging
real world conditions we present the iNaturalist species classification and
detection dataset, consisting of 859,000 images from over 5,000 different
species of plants and animals. It features visually similar species, captured
in a wide variety of situations, from all over the world. Images were collected
with different camera types, have varying image quality, feature a large class
imbalance, and have been verified by multiple citizen scientists. We discuss
the collection of the dataset and present extensive baseline experiments using
state-of-the-art computer vision classification and detection models. Results
show that current non-ensemble based methods achieve only 67% top one
classification accuracy, illustrating the difficulty of the dataset.
Specifically, we observe poor results for classes with small numbers of
training examples suggesting more attention is needed in low-shot learning.Comment: CVPR 201
Unsupervised Object Discovery and Localization in the Wild: Part-based Matching with Bottom-up Region Proposals
This paper addresses unsupervised discovery and localization of dominant
objects from a noisy image collection with multiple object classes. The setting
of this problem is fully unsupervised, without even image-level annotations or
any assumption of a single dominant class. This is far more general than
typical colocalization, cosegmentation, or weakly-supervised localization
tasks. We tackle the discovery and localization problem using a part-based
region matching approach: We use off-the-shelf region proposals to form a set
of candidate bounding boxes for objects and object parts. These regions are
efficiently matched across images using a probabilistic Hough transform that
evaluates the confidence for each candidate correspondence considering both
appearance and spatial consistency. Dominant objects are discovered and
localized by comparing the scores of candidate regions and selecting those that
stand out over other regions containing them. Extensive experimental
evaluations on standard benchmarks demonstrate that the proposed approach
significantly outperforms the current state of the art in colocalization, and
achieves robust object discovery in challenging mixed-class datasets.Comment: CVPR 201
Face analysis and deepfake detection
This thesis concerns deep-learning-based face-related research topics. We explore how to improve the performance of several face systems when confronting challenging variations. In Chapter 1, we provide an introduction and background information on the theme, and we list the main research questions of this dissertation. In Chapter 2, we provide a synthetic face data generator with fully controlled variations and proposed a detailed experimental comparison of main characteristics that influence face detection performance. The result shows that our synthetic dataset could complement face detectors to become more robust against specific features in the real world. Our analysis also reveals that a variety of data augmentation is necessary to address differences in performance. In Chapter 3, we propose an age estimation method for handling large pose variations for unconstrained face images. A Wasserstein-based GAN model is used to complete the full uv texture presentation. The proposed AgeGAN method simultaneously learns to capture the facial uv texture map and age characteristics.In Chapter 4, we propose a maximum mean discrepancy (MMD) based cross-domain face forgery detection. The center and triplet losses are also incorporated to ensure that the learned features are shared by multiple domains and provide better generalization abilities to unseen deep fake samples. In Chapter 5, we introduce an end-to-end framework to predict ages from face videos. Clustering based transfer learning is used to provide proper prediction for imbalanced datasets
- …