
    The iNaturalist Species Classification and Detection Dataset

    Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real-world conditions, we present the iNaturalist species classification and detection dataset, consisting of 859,000 images from over 5,000 different species of plants and animals. It features visually similar species, captured in a wide variety of situations, from all over the world. Images were collected with different camera types, have varying image quality, feature a large class imbalance, and have been verified by multiple citizen scientists. We discuss the collection of the dataset and present extensive baseline experiments using state-of-the-art computer vision classification and detection models. Results show that current non-ensemble-based methods achieve only 67% top-1 classification accuracy, illustrating the difficulty of the dataset. Specifically, we observe poor results for classes with small numbers of training examples, suggesting that more attention is needed in low-shot learning.
    Comment: CVPR 201
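The abstract reports top-1 accuracy and notes weak results on classes with few training examples. As a minimal illustrative sketch (pure Python; the function names `top1_accuracy` and `per_class_top1` are hypothetical and not taken from the paper's code), overall top-1 accuracy can be computed alongside per-class accuracy, which exposes failures on rare classes that an overall average can hide:

```python
from collections import defaultdict


def argmax(scores):
    """Index of the highest score (first index wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def top1_accuracy(predictions, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty inputs")
    correct = sum(1 for scores, y in zip(predictions, labels) if argmax(scores) == y)
    return correct / len(labels)


def per_class_top1(predictions, labels):
    """Top-1 accuracy computed separately for each true class."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for scores, y in zip(predictions, labels):
        totals[y] += 1
        hits[y] += int(argmax(scores) == y)
    return {c: hits[c] / totals[c] for c in totals}


# Toy data: three samples of a common class 0, one sample of a rare class 1.
preds = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
labels = [0, 0, 0, 1]
print(top1_accuracy(preds, labels))   # 0.75
print(per_class_top1(preds, labels))  # {0: 1.0, 1: 0.0}
```

Here the model always predicts the common class, so overall accuracy looks reasonable while the rare class scores zero.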

    Unsupervised Object Discovery and Localization in the Wild: Part-based Matching with Bottom-up Region Proposals

    This paper addresses unsupervised discovery and localization of dominant objects from a noisy image collection with multiple object classes. The setting of this problem is fully unsupervised, without even image-level annotations or any assumption of a single dominant class. This is far more general than typical colocalization, cosegmentation, or weakly supervised localization tasks. We tackle the discovery and localization problem using a part-based region matching approach: we use off-the-shelf region proposals to form a set of candidate bounding boxes for objects and object parts. These regions are efficiently matched across images using a probabilistic Hough transform that evaluates the confidence of each candidate correspondence by considering both appearance and spatial consistency. Dominant objects are discovered and localized by comparing the scores of candidate regions and selecting those that stand out over other regions containing them. Extensive experimental evaluations on standard benchmarks demonstrate that the proposed approach significantly outperforms the current state of the art in colocalization and achieves robust object discovery in challenging mixed-class datasets.
    Comment: CVPR 201
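The matching step can be illustrated with a simplified sketch of probabilistic Hough matching. It assumes each candidate match carries a non-negative appearance similarity and that spatial consistency is measured by how strongly the other matches agree on the offset between box centers. The Gaussian voting kernel, the `sigma` parameter, and the function name `phm_confidence` are illustrative assumptions, not the paper's exact formulation:

```python
import math


def center(box):
    """Center (x, y) of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def phm_confidence(boxes_a, boxes_b, matches, sigma=10.0):
    """Score candidate region matches by appearance times spatial consensus.

    matches: list of (i, j, appearance), where i indexes boxes_a, j indexes
    boxes_b, and appearance is a non-negative similarity score.
    Returns one confidence value per match (assumed simplified model).
    """
    # Offset of each match in "Hough space": displacement between box centers.
    offsets = []
    for i, j, _ in matches:
        ca, cb = center(boxes_a[i]), center(boxes_b[j])
        offsets.append((cb[0] - ca[0], cb[1] - ca[1]))

    confidences = []
    for (_, _, a_m), o_m in zip(matches, offsets):
        # Appearance-weighted Gaussian votes from all matches (itself included)
        # evaluated at this match's offset.
        vote = 0.0
        for (_, _, a_k), o_k in zip(matches, offsets):
            d2 = (o_m[0] - o_k[0]) ** 2 + (o_m[1] - o_k[1]) ** 2
            vote += a_k * math.exp(-d2 / (2 * sigma ** 2))
        confidences.append(a_m * vote)
    return confidences


boxes_a = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30)]
boxes_b = [(5, 5, 15, 15), (25, 5, 35, 15), (100, 100, 110, 110)]
matches = [(0, 0, 1.0), (1, 1, 1.0), (2, 2, 1.0)]
print(phm_confidence(boxes_a, boxes_b, matches))
```

The first two matches share the same center offset and reinforce each other (confidence close to 2.0), while the third has the same appearance score but an inconsistent offset and scores close to 1.0.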

    Face analysis and deepfake detection

    This thesis concerns deep-learning-based face-related research topics. We explore how to improve the performance of several face systems when confronting challenging variations. In Chapter 1, we provide an introduction and background information on the theme, and we list the main research questions of this dissertation. In Chapter 2, we present a synthetic face data generator with fully controlled variations and a detailed experimental comparison of the main characteristics that influence face detection performance. The results show that our synthetic dataset could help face detectors become more robust against specific features in the real world. Our analysis also reveals that a variety of data augmentations is necessary to address differences in performance. In Chapter 3, we propose an age estimation method that handles large pose variations in unconstrained face images. A Wasserstein-based GAN model is used to complete the full UV texture representation. The proposed AgeGAN method simultaneously learns to capture the facial UV texture map and age characteristics. In Chapter 4, we propose a maximum mean discrepancy (MMD) based cross-domain face forgery detection method. Center and triplet losses are also incorporated to ensure that the learned features are shared by multiple domains and provide better generalization to unseen deepfake samples. In Chapter 5, we introduce an end-to-end framework to predict ages from face videos. Clustering-based transfer learning is used to provide proper predictions for imbalanced datasets.
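MMD measures the distance between two feature distributions through a kernel, MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], so driving it down between domains encourages features that the domains share. A minimal sketch of the standard biased estimate with a Gaussian kernel follows; the kernel choice, `sigma`, and the function names `gaussian_kernel` and `mmd2` are assumptions for illustration and not the thesis's implementation:

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    def mean_k(p, q):
        # Average kernel value over all pairs drawn from p and q.
        return sum(gaussian_kernel(a, b, sigma) for a in p for b in q) / (len(p) * len(q))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


# Toy 2-D "features" from a source domain and two candidate target domains.
source = [(0.0, 0.0), (1.0, 0.0)]
target_near = [(0.1, 0.0), (1.1, 0.0)]
target_far = [(5.0, 5.0), (6.0, 5.0)]
print(mmd2(source, target_near))  # small
print(mmd2(source, target_far))   # much larger
```

In a training loop, a term like this could be added to the classification loss for features drawn from two source domains, penalizing domain-specific feature statistics.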