682 research outputs found

    The Devil is in the Tails: Fine-grained Classification in the Wild

    Get PDF
    The world is long-tailed. What does this mean for computer vision and visual recognition? The main two implications are (1) the number of categories we need to consider in applications can be very large, and (2) the number of training examples for most categories can be very small. Current visual recognition algorithms have achieved excellent classification accuracy. However, they require many training examples to reach peak performance, which suggests that long-tailed distributions will not be dealt with well. We analyze this question in the context of eBird, a large fine-grained classification dataset, and a state-of-the-art deep network classification algorithm. We find that (a) peak classification performance on well-represented categories is excellent, (b) given enough data, classification performance suffers only minimally from an increase in the number of classes, (c) classification performance decays precipitously as the number of training examples decreases, (d) surprisingly, transfer learning is virtually absent in current methods. Our findings suggest that our community should come to grips with the question of long tails

    Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition

    Full text link
    The development of foundation vision models has pushed the general visual recognition to a high level, but cannot well address the fine-grained recognition in specialized domain such as invasive species classification. Identifying and managing invasive species has strong social and ecological value. Currently, most invasive species datasets are limited in scale and cover a narrow range of species, which restricts the development of deep-learning based invasion biometrics systems. To fill the gap of this area, we introduced Species196, a large-scale semi-supervised dataset of 196-category invasive species. It collects over 19K images with expert-level accurate annotations Species196-L, and 1.2M unlabeled images of invasive species Species196-U. The dataset provides four experimental settings for benchmarking the existing models and algorithms, namely, supervised learning, semi-supervised learning, self-supervised pretraining and zero-shot inference ability of large multi-modal models. To facilitate future research on these four learning paradigms, we conduct an empirical study of the representative methods on the introduced dataset. The dataset is publicly available at https://species-dataset.github.io/.Comment: Accepted by NeurIPS 2023 Track Datasets and Benchmark

    Deep filter banks for texture recognition, description, and segmentation

    Get PDF
    Visual textures have played a key role in image understanding because they convey important semantics of images, and because texture representations that pool local image descriptors in an orderless manner have had a tremendous impact in diverse applications. In this paper we make several contributions to texture understanding. First, instead of focusing on texture instance and material category recognition, we propose a human-interpretable vocabulary of texture attributes to describe common texture patterns, complemented by a new describable texture dataset for benchmarking. Second, we look at the problem of recognizing materials and texture attributes in realistic imaging conditions, including when textures appear in clutter, developing corresponding benchmarks on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic texture representations, including bag-of-visual-words and the Fisher vectors, in the context of deep learning and show that these have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. We obtain in this manner state-of-the-art performance in numerous datasets well beyond textures, an efficient method to apply deep features to image regions, as well as benefit in transferring features from one domain to another.Comment: 29 pages; 13 figures; 8 table

    Classification of bird species from video using appearance and motion features

    Get PDF
    The monitoring of bird populations can provide important information on the state of sensitive ecosystems; however, the manual collection of reliable population data is labour-intensive, time-consuming, and potentially error prone. Automated monitoring using computer vision is therefore an attractive proposition, which could facilitate the collection of detailed data on a much larger scale than is currently possible. A number of existing algorithms are able to classify bird species from individual high quality detailed images often using manual inputs (such as a priori parts labelling). However, deployment in the field necessitates fully automated in-flight classification, which remains an open challenge due to poor image quality, high and rapid variation in pose, and similar appearance of some species. We address this as a fine-grained classification problem, and have collected a video dataset of thirteen bird classes (ten species and another with three colour variants) for training and evaluation. We present our proposed algorithm, which selects effective features from a large pool of appearance and motion features. We compare our method to others which use appearance features only, including image classification using state-of-the-art Deep Convolutional Neural Networks (CNNs). Using our algorithm we achieved a 90% correct classification rate, and we also show that using effectively selected motion and appearance features together can produce results which outperform state-of-the-art single image classifiers. We also show that the most significant motion features improve correct classification rates by 7% compared to using appearance features alone

    Thinking like a naturalist: enhancing computer vision of citizen science images by harnessing contextual data

    Get PDF
    1. The accurate identification of species in images submitted by citizen scientists is currently a bottleneck for many data uses. Machine learning tools offer the potential to provide rapid, objective and scalable species identification for the benefit of many aspects of ecological science. Currently, most approaches only make use of image pixel data for classification. However, an experienced naturalist would also use a wide variety of contextual information such as the location and date of recording. 2. Here, we examine the automated identification of ladybird (Coccinellidae) records from the British Isles submitted to the UK Ladybird Survey, a volunteerā€led mass participation recording scheme. Each image is associated with metadata; a date, location and recorder ID, which can be crossā€referenced with other data sources to determine local weather at the time of recording, habitat types and the experience of the observer. We built multiā€input neural network models that synthesize metadata and images to identify records to species level. 3. We show that machine learning models can effectively harness contextual information to improve the interpretation of images. Against an imageā€only baseline of 48.2%, we observe a 9.1 percentageā€point improvement in topā€1 accuracy with a multiā€input model compared to only a 3.6% increase when using an ensemble of image and metadata models. This suggests that contextual data are being used to interpret an image, beyond just providing a prior expectation. We show that our neural network models appear to be utilizing similar pieces of evidence as human naturalists to make identifications. 4. Metadata is a key tool for human naturalists. We show it can also be harnessed by computer vision systems. Contextualization offers considerable extra information, particularly for challenging species, even within small and relatively homogeneous areas such as the British Isles. Although complex relationships between disparate sources of information can be profitably interpreted by simple neural network architectures, there is likely considerable room for further progress. Contextualizing images has the potential to lead to a step change in the accuracy of automated identification tools, with considerable benefits for largeā€scale verification of submitted records

    Multi-Label Bird Species Classification Using Sequential Aggregation Strategy from Audio Recordings

    Get PDF
    Birds are excellent bioindicators, playing a vital role in maintaining the delicate balance of ecosystems. Identifying species from bird vocalization is arduous but has high research gain. The paper focuses on the detection of multiple bird vocalizations from recordings. The proposed work uses a deep convolutional neural network (DCNN) and a recurrent neural network (RNN) architecture to learn the bird's vocalization from mel-spectrogram and mel-frequency cepstral coefficient (MFCC), respectively. We adopted a sequential aggregation strategy to make a decision on an audio file. We normalized the aggregated sigmoid probabilities and considered the nodes with the highest scores to be the target species. We evaluated the proposed methods on the Xeno-canto bird sound database, which comprises ten species. We compared the performance of our approach to that of transfer learning and Vanilla-DNN methods. Notably, the proposed DCNN and VGG-16 models achieved average F1 metrics of 0.75 and 0.65, respectively, outperforming the acoustic cue-based Vanilla-DNN approach

    Fine-tuning or top-tuning? Transfer learning with pretrained features and fast kernel methods

    Full text link
    The impressive performances of deep learning architectures is associated to massive increase of models complexity. Millions of parameters need be tuned, with training and inference time scaling accordingly. But is massive fine-tuning necessary? In this paper, focusing on image classification, we consider a simple transfer learning approach exploiting pretrained convolutional features as input for a fast kernel method. We refer to this approach as top-tuning, since only the kernel classifier is trained. By performing more than 2500 training processes we show that this top-tuning approach provides comparable accuracy w.r.t. fine-tuning, with a training time that is between one and two orders of magnitude smaller. These results suggest that top-tuning provides a useful alternative to fine-tuning in small/medium datasets, especially when training efficiency is crucial
    • ā€¦
    corecore