
682 research outputs found

The Devil is in the Tails: Fine-grained Classification in the Wild

Author: Perona Pietro
Van Horn Grant
Publication date: 05/09/2017

The world is long-tailed. What does this mean for computer vision and visual recognition? The two main implications are (1) the number of categories we need to consider in applications can be very large, and (2) the number of training examples for most categories can be very small. Current visual recognition algorithms have achieved excellent classification accuracy. However, they require many training examples to reach peak performance, which suggests that long-tailed distributions will not be dealt with well. We analyze this question in the context of eBird, a large fine-grained classification dataset, and a state-of-the-art deep network classification algorithm. We find that (a) peak classification performance on well-represented categories is excellent, (b) given enough data, classification performance suffers only minimally from an increase in the number of classes, (c) classification performance decays precipitously as the number of training examples decreases, and (d) surprisingly, transfer learning is virtually absent in current methods. Our findings suggest that our community should come to grips with the question of long tails.

arXiv.org e-Print Archive

Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition

Author: Han Kai
He Wei
Nie Ying
Wang Chengcheng
Wang Yunhe
Publication date: 26/09/2023

The development of foundation vision models has pushed general visual recognition to a high level, but these models cannot adequately address fine-grained recognition in specialized domains such as invasive species classification. Identifying and managing invasive species has strong social and ecological value. Currently, most invasive species datasets are limited in scale and cover a narrow range of species, which restricts the development of deep-learning-based invasion biometrics systems. To fill this gap, we introduce Species196, a large-scale semi-supervised dataset of 196 categories of invasive species. It collects over 19K images with expert-level accurate annotations (Species196-L) and 1.2M unlabeled images of invasive species (Species196-U). The dataset provides four experimental settings for benchmarking existing models and algorithms, namely supervised learning, semi-supervised learning, self-supervised pretraining, and the zero-shot inference ability of large multi-modal models. To facilitate future research on these four learning paradigms, we conduct an empirical study of representative methods on the introduced dataset. The dataset is publicly available at https://species-dataset.github.io/.

Comment: Accepted by NeurIPS 2023 Track Datasets and Benchmark

arXiv.org e-Print Archive

Deep filter banks for texture recognition, description, and segmentation

Author: Cimpoi Mircea
Kokkinos Iasonas
Maji Subhransu
Vedaldi Andrea
Publication date: 18/11/2015

Visual textures have played a key role in image understanding because they convey important semantics of images, and because texture representations that pool local image descriptors in an orderless manner have had a tremendous impact in diverse applications. In this paper we make several contributions to texture understanding. First, instead of focusing on texture instance and material category recognition, we propose a human-interpretable vocabulary of texture attributes to describe common texture patterns, complemented by a new describable texture dataset for benchmarking. Second, we look at the problem of recognizing materials and texture attributes in realistic imaging conditions, including when textures appear in clutter, developing corresponding benchmarks on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic texture representations, including bag-of-visual-words and Fisher vectors, in the context of deep learning and show that these have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. In this manner we obtain state-of-the-art performance on numerous datasets well beyond textures, an efficient method to apply deep features to image regions, and a benefit in transferring features from one domain to another.

Comment: 29 pages; 13 figures; 8 tables

arXiv.org e-Print Archive

HAL-CentraleSupelec

Springer - Publisher Connector

INRIA a CCSD electronic archive server

UCL Discovery

PubMed Central

Oxford University Research Archive

HAL-Rennes 1

Classification of bird species from video using appearance and motion features

Author: Appiah Kofi
Atanbori John
Dickinson Patrick
Duan Wenting
Shaw Edward
Publication venue: 'Elsevier BV'
Publication date: 18/07/2018

The monitoring of bird populations can provide important information on the state of sensitive ecosystems; however, the manual collection of reliable population data is labour-intensive, time-consuming, and potentially error-prone. Automated monitoring using computer vision is therefore an attractive proposition, which could facilitate the collection of detailed data on a much larger scale than is currently possible. A number of existing algorithms are able to classify bird species from individual high-quality, detailed images, often using manual inputs (such as a priori parts labelling). However, deployment in the field necessitates fully automated in-flight classification, which remains an open challenge due to poor image quality, high and rapid variation in pose, and the similar appearance of some species. We address this as a fine-grained classification problem, and have collected a video dataset of thirteen bird classes (ten species and another with three colour variants) for training and evaluation. We present our proposed algorithm, which selects effective features from a large pool of appearance and motion features. We compare our method to others which use appearance features only, including image classification using state-of-the-art Deep Convolutional Neural Networks (CNNs). Using our algorithm we achieved a 90% correct classification rate, and we also show that using effectively selected motion and appearance features together can produce results which outperform state-of-the-art single image classifiers. We also show that the most significant motion features improve correct classification rates by 7% compared to using appearance features alone.

University of Lincoln Institutional Repository

Repository@Hull - Worktribe

Crossref

Sheffield Hallam University Research Archive

Thinking like a naturalist: enhancing computer vision of citizen science images by harnessing contextual data

Author: August Tom A.
Roy Helen E.
Terry J. Christopher D.
Publication venue: 'Wiley'
Publication date: 01/02/2020

1. The accurate identification of species in images submitted by citizen scientists is currently a bottleneck for many data uses. Machine learning tools offer the potential to provide rapid, objective and scalable species identification for the benefit of many aspects of ecological science. Currently, most approaches only make use of image pixel data for classification. However, an experienced naturalist would also use a wide variety of contextual information such as the location and date of recording.
2. Here, we examine the automated identification of ladybird (Coccinellidae) records from the British Isles submitted to the UK Ladybird Survey, a volunteer‐led mass participation recording scheme. Each image is associated with metadata: a date, location and recorder ID, which can be cross‐referenced with other data sources to determine local weather at the time of recording, habitat types and the experience of the observer. We built multi‐input neural network models that synthesize metadata and images to identify records to species level.
3. We show that machine learning models can effectively harness contextual information to improve the interpretation of images. Against an image‐only baseline of 48.2%, we observe a 9.1 percentage‐point improvement in top‐1 accuracy with a multi‐input model compared to only a 3.6% increase when using an ensemble of image and metadata models. This suggests that contextual data are being used to interpret an image, beyond just providing a prior expectation. We show that our neural network models appear to be utilizing similar pieces of evidence as human naturalists to make identifications.
4. Metadata is a key tool for human naturalists. We show it can also be harnessed by computer vision systems. Contextualization offers considerable extra information, particularly for challenging species, even within small and relatively homogeneous areas such as the British Isles. Although complex relationships between disparate sources of information can be profitably interpreted by simple neural network architectures, there is likely considerable room for further progress. Contextualizing images has the potential to lead to a step change in the accuracy of automated identification tools, with considerable benefits for large‐scale verification of submitted records.

Crossref

NERC Open Research Archive

Multi-Label Bird Species Classification Using Sequential Aggregation Strategy from Audio Recordings

Author: Abdul Kareem Noumida
Rajan Rajeev
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 31/01/2024

Birds are excellent bioindicators, playing a vital role in maintaining the delicate balance of ecosystems. Identifying species from bird vocalizations is arduous but offers high research value. The paper focuses on the detection of multiple bird vocalizations from recordings. The proposed work uses a deep convolutional neural network (DCNN) and a recurrent neural network (RNN) architecture to learn bird vocalizations from the mel-spectrogram and mel-frequency cepstral coefficients (MFCC), respectively. We adopted a sequential aggregation strategy to make a decision on an audio file. We normalized the aggregated sigmoid probabilities and considered the nodes with the highest scores to be the target species. We evaluated the proposed methods on the Xeno-canto bird sound database, which comprises ten species. We compared the performance of our approach to that of transfer learning and Vanilla-DNN methods. Notably, the proposed DCNN and VGG-16 models achieved average F1 metrics of 0.75 and 0.65, respectively, outperforming the acoustic cue-based Vanilla-DNN approach.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)
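
The aggregation step described in the abstract above (combine per-segment sigmoid outputs, normalize, keep the highest-scoring nodes) can be illustrated with a minimal sketch. The abstract does not say how segments are combined or normalized, so the summation, the divide-by-total normalization, the top-k cut-off, and all function names here are assumptions made for illustration only.

```python
def aggregate_segments(segment_probs):
    """Sum per-segment sigmoid outputs species by species (assumed rule)."""
    n_species = len(segment_probs[0])
    return [sum(seg[i] for seg in segment_probs) for i in range(n_species)]


def normalize(scores):
    """Scale aggregated scores so they sum to 1 (assumed normalization)."""
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores[:]


def top_species(segment_probs, k=2):
    """Return indices of the k highest-scoring species and all scores."""
    scores = normalize(aggregate_segments(segment_probs))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


# Hypothetical sigmoid outputs: three consecutive segments, three species.
segments = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.7],
    [0.7, 0.1, 0.9],
]
predicted, scores = top_species(segments, k=2)
```

With these toy values, species 0 and 2 accumulate the most evidence across segments, so they would be reported as the labels for the file.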

Fine-tuning or top-tuning? Transfer learning with pretrained features and fast kernel methods

Author: Alfano Paolo Didier
Odone Francesca
Pastore Vito Paolo
Rosasco Lorenzo
Publication date: 16/09/2022

The impressive performance of deep learning architectures is associated with a massive increase in model complexity. Millions of parameters need to be tuned, with training and inference time scaling accordingly. But is massive fine-tuning necessary? In this paper, focusing on image classification, we consider a simple transfer learning approach exploiting pretrained convolutional features as input for a fast kernel method. We refer to this approach as top-tuning, since only the kernel classifier is trained. By performing more than 2500 training processes we show that this top-tuning approach provides comparable accuracy w.r.t. fine-tuning, with a training time that is between one and two orders of magnitude smaller. These results suggest that top-tuning provides a useful alternative to fine-tuning in small/medium datasets, especially when training efficiency is crucial.

arXiv.org e-Print Archive
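
The top-tuning idea in the abstract above (keep the pretrained features fixed and train only a kernel classifier on top of them) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: plain kernel ridge regression with a Gaussian kernel stands in for the fast kernel solver, the two-dimensional feature vectors are hypothetical stand-ins for pretrained convolutional features, and every name and hyperparameter value is invented for the example.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_top(features, labels, lam=0.1, sigma=1.0):
    """Train only the top classifier: solve (K + lam * I) alpha = y."""
    n = len(features)
    K = [
        [gaussian_kernel(features[i], features[j], sigma)
         + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve(K, [float(y) for y in labels])


def predict(features, alpha, x, sigma=1.0):
    """Classify x by the sign of the kernel expansion over training points."""
    score = sum(a * gaussian_kernel(f, x, sigma)
                for a, f in zip(alpha, features))
    return 1 if score >= 0 else -1


# Hypothetical frozen features for four training images, labels in {-1, +1}.
train_feats = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
train_labels = [-1, -1, 1, 1]
alpha = fit_top(train_feats, train_labels)
```

Because the feature extractor is never updated, training reduces to one linear solve over the kernel matrix. That solve scales cubically with the number of samples in this naive form, which is why a fast approximate kernel solver matters once datasets grow.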