The Devil is in the Tails: Fine-grained Classification in the Wild
The world is long-tailed. What does this mean for computer vision and visual
recognition? The two main implications are (1) the number of categories we need
to consider in applications can be very large, and (2) the number of training
examples for most categories can be very small. Current visual recognition
algorithms have achieved excellent classification accuracy. However, they
require many training examples to reach peak performance, suggesting that they
will handle long-tailed distributions poorly. We analyze this question
in the context of eBird, a large fine-grained classification dataset, and a
state-of-the-art deep network classification algorithm. We find that (a) peak
classification performance on well-represented categories is excellent, (b)
given enough data, classification performance suffers only minimally from an
increase in the number of classes, (c) classification performance decays
precipitously as the number of training examples decreases, (d) surprisingly,
transfer learning is virtually absent in current methods. Our findings suggest
that our community should come to grips with the question of long tails.
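The head-versus-tail contrast in findings (a) and (c) above can be pictured with a small sketch of a long-tailed class distribution; the power-law form and every constant below are illustrative assumptions, not figures from the paper.

```python
# Sketch: a long-tailed class distribution of the kind analyzed above.
# The 1/rank**alpha decay and all constants are illustrative assumptions.
def long_tail_counts(num_classes, head_count, alpha=1.0):
    """Training examples per class, decaying as 1 / rank**alpha."""
    return [max(1, int(head_count / (i + 1) ** alpha)) for i in range(num_classes)]

counts = long_tail_counts(num_classes=1000, head_count=5000)
head = counts[:10]   # well-represented categories: ample training data
tail = counts[-10:]  # rare categories: only a handful of examples each
```

Under these assumptions the head classes get thousands of examples while the tail classes get single digits, which is exactly the regime where finding (c) predicts a precipitous drop in accuracy.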
Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition
The development of foundation vision models has pushed general visual
recognition to a high level, but these models cannot adequately address
fine-grained recognition in specialized domains such as invasive species
classification.
Identifying and managing invasive species has strong social and ecological
value. Currently, most invasive species datasets are limited in scale and cover
a narrow range of species, which restricts the development of deep-learning
based invasion biometrics systems. To fill this gap, we introduce
Species196, a large-scale semi-supervised dataset of 196 categories of invasive
species. It collects over 19K images with accurate expert-level annotations
(Species196-L) and 1.2M unlabeled images of invasive species (Species196-U). The
dataset provides four experimental settings for benchmarking the existing
models and algorithms, namely, supervised learning, semi-supervised learning,
self-supervised pretraining and zero-shot inference ability of large
multi-modal models. To facilitate future research on these four learning
paradigms, we conduct an empirical study of the representative methods on the
introduced dataset. The dataset is publicly available at
https://species-dataset.github.io/. Comment: Accepted by the NeurIPS 2023 Datasets and Benchmarks Track.
Deep filter banks for texture recognition, description, and segmentation
Visual textures have played a key role in image understanding because they
convey important semantics of images, and because texture representations that
pool local image descriptors in an orderless manner have had a tremendous
impact in diverse applications. In this paper we make several contributions to
texture understanding. First, instead of focusing on texture instance and
material category recognition, we propose a human-interpretable vocabulary of
texture attributes to describe common texture patterns, complemented by a new
describable texture dataset for benchmarking. Second, we look at the problem of
recognizing materials and texture attributes in realistic imaging conditions,
including when textures appear in clutter, developing corresponding benchmarks
on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic
texture representations, including bag-of-visual-words and Fisher vectors,
in the context of deep learning and show that these have excellent efficiency
and generalization properties if the convolutional layers of a deep model are
used as filter banks. We obtain in this manner state-of-the-art performance in
numerous datasets well beyond textures, an efficient method to apply deep
features to image regions, as well as benefits in transferring features from one
domain to another. Comment: 29 pages; 13 figures; 8 tables.
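The paper's key idea of using convolutional layers as filter banks with orderless pooling can be sketched minimally; real deep filter banks use Fisher-vector pooling over CNN features, whereas the plain averaging below is a simplified stand-in, and the toy activation values are invented for illustration.

```python
# Sketch: treat a convolutional layer's activations as a bank of local
# filter responses and pool them orderlessly into a single descriptor.
# Plain averaging stands in for the paper's Fisher-vector pooling.
def orderless_pool(conv_features):
    """conv_features: C channel maps, each H rows of W activations.

    Averaging over all spatial positions discards layout, so any
    permutation of locations yields the same descriptor -- the
    orderless property that suits texture recognition."""
    descriptor = []
    for channel in conv_features:
        values = [v for row in channel for v in row]
        descriptor.append(sum(values) / len(values))
    return descriptor

feats = [
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],            # channel 0
    [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]],  # channel 1
]
desc = orderless_pool(feats)  # one value per channel
```

Because the pooling ignores spatial order, rearranging the locations inside each channel map leaves the descriptor unchanged, which is what makes such representations robust for textures appearing in clutter.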
Classification of bird species from video using appearance and motion features
The monitoring of bird populations can provide important information on the state of sensitive ecosystems; however, the manual collection of reliable population data is labour-intensive, time-consuming, and potentially error prone. Automated monitoring using computer vision is therefore an attractive proposition, which could facilitate the collection of detailed data on a much larger scale than is currently possible.
A number of existing algorithms are able to classify bird species from individual high-quality detailed images, often using manual inputs (such as a priori parts labelling). However, deployment in the field necessitates fully automated in-flight classification, which remains an open challenge due to poor image quality, high and rapid variation in pose, and similar appearance of some species. We address this as a fine-grained classification problem, and have collected a video dataset of thirteen bird classes (ten species and another with three colour variants) for training and evaluation. We present our proposed algorithm, which selects effective features from a large pool of appearance and motion features. We compare our method to others which use appearance features only, including image classification using state-of-the-art Deep Convolutional Neural Networks (CNNs). Using our algorithm we achieved a 90% correct classification rate, and we also show that using effectively selected motion and appearance features together can produce results which outperform state-of-the-art single image classifiers. We also show that the most significant motion features improve correct classification rates by 7% compared to using appearance features alone.
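The step of selecting effective features from a large pool of appearance and motion features can be pictured as greedy forward selection; the feature names, the gain table, and the scoring function below are all hypothetical stand-ins for cross-validated classification accuracy, not the paper's actual selection criterion.

```python
# Sketch: greedy forward selection from a pooled set of appearance and
# motion features. score_fn stands in for cross-validated accuracy;
# the feature names and gains below are hypothetical.
def forward_select(feature_names, score_fn, max_features=3):
    selected = []
    while len(selected) < max_features:
        best, best_score = None, score_fn(selected)
        for name in feature_names:
            if name in selected:
                continue
            score = score_fn(selected + [name])
            if score > best_score:
                best, best_score = name, score
        if best is None:  # no remaining feature improves the score
            break
        selected.append(best)
    return selected

# Toy score: additive per-feature gains with diminishing returns.
GAINS = {"colour_hist": 0.30, "wingbeat_freq": 0.25,
         "hog": 0.15, "optical_flow": 0.10}
def toy_score(features):
    return sum(GAINS[f] for f in features) * 0.9 ** max(0, len(features) - 1)

chosen = forward_select(list(GAINS), toy_score)
```

Each round keeps only the candidate that most improves the score on top of what is already selected, which is how a motion feature such as a hypothetical wingbeat frequency can earn its place alongside appearance features.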
Thinking like a naturalist: enhancing computer vision of citizen science images by harnessing contextual data
1. The accurate identification of species in images submitted by citizen scientists is currently a bottleneck for many data uses. Machine learning tools offer the potential to provide rapid, objective and scalable species identification for the benefit of many aspects of ecological science. Currently, most approaches only make use of image pixel data for classification. However, an experienced naturalist would also use a wide variety of contextual information such as the location and date of recording.
2. Here, we examine the automated identification of ladybird (Coccinellidae) records from the British Isles submitted to the UK Ladybird Survey, a volunteer-led mass participation recording scheme. Each image is associated with metadata: a date, location and recorder ID, which can be cross-referenced with other data sources to determine local weather at the time of recording, habitat types and the experience of the observer. We built multi-input neural network models that synthesize metadata and images to identify records to species level.
3. We show that machine learning models can effectively harness contextual information to improve the interpretation of images. Against an image-only baseline of 48.2%, we observe a 9.1 percentage-point improvement in top-1 accuracy with a multi-input model compared to only a 3.6% increase when using an ensemble of image and metadata models. This suggests that contextual data are being used to interpret an image, beyond just providing a prior expectation. We show that our neural network models appear to be utilizing similar pieces of evidence as human naturalists to make identifications.
4. Metadata is a key tool for human naturalists. We show it can also be harnessed by computer vision systems. Contextualization offers considerable extra information, particularly for challenging species, even within small and relatively homogeneous areas such as the British Isles. Although complex relationships between disparate sources of information can be profitably interpreted by simple neural network architectures, there is likely considerable room for further progress. Contextualizing images has the potential to lead to a step change in the accuracy of automated identification tools, with considerable benefits for large-scale verification of submitted records.
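The ensemble baseline that the multi-input model is compared against in point 3 amounts to using metadata as a prior: multiply the image model's class probabilities by a metadata-derived prior and renormalize. The sketch below illustrates only that baseline (the multi-input network instead learns joint image-metadata interactions end to end), and all probabilities are invented for illustration.

```python
# Sketch of the metadata-as-prior baseline: combine an image classifier's
# class probabilities with a metadata-derived prior (e.g. from date and
# location) by elementwise product and renormalization. All numbers are
# illustrative, not from the study.
def combine_with_prior(image_probs, metadata_prior):
    joint = [p * q for p, q in zip(image_probs, metadata_prior)]
    total = sum(joint)
    return [j / total for j in joint]

image_probs = [0.5, 0.3, 0.2]      # hypothetical image-only scores
metadata_prior = [0.1, 0.6, 0.3]   # hypothetical seasonal/location prevalence
posterior = combine_with_prior(image_probs, metadata_prior)
```

In this toy case the prior flips the decision from the first species to the second, which is precisely the "prior expectation" effect; the paper's finding is that a multi-input model gains substantially more than this, implying the metadata also changes how the image itself is interpreted.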
Multi-Label Bird Species Classification Using Sequential Aggregation Strategy from Audio Recordings
Birds are excellent bioindicators, playing a vital role in maintaining the delicate balance of ecosystems. Identifying species from bird vocalizations is arduous but highly valuable for research. The paper focuses on detecting multiple bird vocalizations in recordings. The proposed work uses a deep convolutional neural network (DCNN) and a recurrent neural network (RNN) architecture to learn bird vocalizations from the mel-spectrogram and mel-frequency cepstral coefficients (MFCCs), respectively. We adopted a sequential aggregation strategy to make a decision on an audio file: we normalized the aggregated sigmoid probabilities and considered the nodes with the highest scores to be the target species. We evaluated the proposed methods on the Xeno-canto bird sound database, which comprises ten species, and compared the performance of our approach to that of transfer learning and Vanilla-DNN methods. Notably, the proposed DCNN and VGG-16 models achieved average F1 scores of 0.75 and 0.65, respectively, outperforming the acoustic cue-based Vanilla-DNN approach.
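The sequential aggregation step described above can be sketched as follows; the per-segment sigmoid outputs and the decision threshold are illustrative assumptions, since the abstract specifies only that aggregated probabilities are normalized and the highest-scoring nodes taken as target species.

```python
# Sketch of sequential aggregation for multi-label bird detection:
# per-segment sigmoid outputs are summed across the recording,
# normalized, and the top-scoring nodes taken as the target species.
# The threshold value is an assumption for illustration.
def aggregate_segments(segment_probs, threshold=0.2):
    num_species = len(segment_probs[0])
    agg = [sum(seg[i] for seg in segment_probs) for i in range(num_species)]
    total = sum(agg)
    norm = [a / total for a in agg]
    present = [i for i, p in enumerate(norm) if p >= threshold]
    return present, norm

segments = [
    [0.9, 0.1, 0.7],   # sigmoid outputs for 3 species, segment 1
    [0.8, 0.2, 0.6],   # segment 2
    [0.7, 0.1, 0.8],   # segment 3
]
species, scores = aggregate_segments(segments)
```

Aggregating before thresholding lets consistent but moderate evidence across segments outvote a single noisy spike, which is the rationale for deciding at the level of the whole audio file.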
Fine-tuning or top-tuning? Transfer learning with pretrained features and fast kernel methods
The impressive performance of deep learning architectures is associated with a
massive increase in model complexity. Millions of parameters need to be tuned,
with training and inference time scaling accordingly. But is massive
fine-tuning necessary? In this paper, focusing on image classification, we
consider a simple transfer learning approach exploiting pretrained
convolutional features as input for a fast kernel method. We refer to this
approach as top-tuning, since only the kernel classifier is trained. By
performing more than 2500 training processes we show that this top-tuning
approach provides accuracy comparable to fine-tuning, with a training time
that is between one and two orders of magnitude smaller. These results suggest
that top-tuning provides a useful alternative to fine-tuning on small/medium
datasets, especially when training efficiency is crucial.
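The top-tuning recipe (frozen pretrained features plus a fast kernel method) can be sketched with a toy kernel classifier. The embeddings below are hypothetical stand-ins for pretrained convolutional features, and a kernel perceptron with an RBF kernel stands in for the paper's fast kernel method; only this classifier is trained, which is the point of top-tuning.

```python
import math

# Sketch of top-tuning: pretrained convolutional features stay frozen,
# and only a kernel classifier on top is trained. A toy kernel
# perceptron stands in for the paper's fast kernel method; the feature
# vectors below are hypothetical frozen embeddings, not real CNN output.
def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(features, labels, epochs=10):
    """labels in {-1, +1}; returns dual coefficients alpha."""
    alpha = [0.0] * len(features)
    for _ in range(epochs):
        for i, x in enumerate(features):
            score = sum(alpha[j] * labels[j] * rbf(features[j], x)
                        for j in range(len(features)))
            if labels[i] * score <= 0:   # misclassified: update this example
                alpha[i] += 1.0
    return alpha

def predict(features, labels, alpha, x):
    score = sum(alpha[j] * labels[j] * rbf(features[j], x)
                for j in range(len(features)))
    return 1 if score > 0 else -1

# Hypothetical frozen embeddings for two classes (illustrative values).
feats = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
labels = [-1, -1, 1, 1]
alpha = train_kernel_perceptron(feats, labels)
```

Because the feature extractor is never updated, the only work at training time is fitting the kernel classifier, which is why top-tuning can be one to two orders of magnitude faster than full fine-tuning while reusing the same pretrained representation.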