LifeCLEF Bird Identification Task 2017
The LifeCLEF challenge BirdCLEF offers a large-scale proving ground for system-oriented evaluation of bird species identification based on audio recordings of their sounds. One of its strengths is that it uses data collected through Xeno-canto, the worldwide community of bird sound recordists. This ensures that BirdCLEF is close to the conditions of real-world applications, in particular with regard to the number of species in the training set (1,500). The main novelty of the 2017 edition of BirdCLEF was the inclusion of soundscape recordings containing time-coded bird species annotations, in addition to the usual Xeno-canto recordings that focus on a single foreground species. This paper presents an overview of the systems developed by the five participating research groups, the methodology used to evaluate their performance, and an analysis and discussion of the results obtained.
Domain-specific neural networks improve automated bird sound recognition already with small amount of local data
1. An automatic bird sound recognition system is a useful tool for collecting data on different bird species for ecological analysis. Together with autonomous recording units (ARUs), such a system makes it possible to collect bird observations on a scale that no human observer could ever match. Progress has been made in automatic bird sound recognition over recent decades, but recognizing bird species in untargeted soundscape recordings remains a challenge. 2. In this article, we demonstrate a workflow for building a global identification model and adjusting it to perform well on data from autonomous recorders in a specific region. We show how data augmentation and a combination of global and local data can be used to train a convolutional neural network to classify the vocalizations of 101 bird species. We construct a model and train it with a global data set to obtain a base model. The base model is then fine-tuned with local data from Southern Finland to adapt it to the sound environment of a specific location, and it is tested with two data sets: one from the same Southern Finnish region and another from a different region in the German Alps. 3. Our results suggest that fine-tuning with local data significantly improves network performance. Classification accuracy improved for test recordings from the same area as the local training data (Southern Finland) but not for recordings from a different region (the German Alps). Data augmentation enables training with a limited amount of training data, and a significant improvement over the base model can be achieved even with few local data samples. Our model outperforms the current state-of-the-art tool for automatic bird sound classification. Using local data to adjust the recognition model for the target domain leads to improvement over general, non-tailored solutions. The process introduced in this article can be applied to build a fine-tuned bird sound classification model for a specific environment. Peer reviewed.
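The data augmentation described above can be illustrated with a minimal numpy sketch. The function name, parameters, and the two specific augmentations (circular time shift and additive Gaussian noise at a fixed level below the signal's RMS) are generic illustrations of the kind of waveform augmentation used when local training data is scarce, not the paper's actual pipeline.

```python
import numpy as np

def augment_waveform(x, rng, noise_db=-30.0, max_shift=0.25):
    """Return a randomly time-shifted, noise-mixed copy of waveform x.

    Two augmentations commonly used for scarce bird-audio training data:
    a circular time shift of up to max_shift of the clip length, and
    Gaussian noise added at noise_db relative to the signal's RMS power.
    """
    shift = int(rng.uniform(-max_shift, max_shift) * len(x))
    y = np.roll(x, shift)  # circular time shift
    rms = np.sqrt(np.mean(x ** 2)) + 1e-12
    noise_rms = rms * 10 ** (noise_db / 20.0)
    return y + rng.normal(0.0, noise_rms, size=len(x))

rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))  # 1 s synthetic tone
batch = [augment_waveform(clip, rng) for _ in range(8)]    # 8 augmented copies
```

Each call produces a distinct training example from a single recording, which is how a few local samples can be stretched into enough data to fine-tune a base model.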
Mosquito Detection with Neural Networks: The Buzz of Deep Learning
Many real-world time-series analysis problems are characterised by scarce
data. Solutions typically rely on hand-crafted features extracted from the time
or frequency domain allied with classification or regression engines which
condition on this (often low-dimensional) feature vector. The huge advances
enjoyed by many application domains in recent years have been fuelled by the
use of deep learning architectures trained on large data sets. This paper
presents an application of deep learning for acoustic event detection in a
challenging, data-scarce, real-world problem. Our candidate challenge is to
accurately detect the presence of a mosquito from its acoustic signature. We
develop convolutional neural networks (CNNs) operating on wavelet
transformations of audio recordings. Furthermore, we interrogate the network's
predictive power by visualising statistics of network-excitatory samples. These
visualisations offer a deep insight into the relative informativeness of
components in the detection problem. We include comparisons with conventional
classifiers, conditioned on both hand-tuned and generic features, to stress the
strength of automatic deep feature learning. Detection is achieved with
performance metrics significantly surpassing those of existing algorithmic
methods, as well as marginally exceeding those attained by individual human
experts.
Comment: For data and software related to this paper, see http://humbug.ac.uk/kiskin2017/. Submitted as a conference paper to ECML 201
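The wavelet representation the abstract describes can be sketched in plain numpy: a complex Morlet wavelet convolved with the signal at several scales yields a 2-D scalogram that a CNN can consume like an image. The wavelet parameters and scale grid below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def morlet(scale, w=6.0):
    """Complex Morlet wavelet sampled at integer points, dilated by `scale`."""
    n = int(10 * scale)
    t = np.arange(-n // 2, n // 2) / scale
    return np.exp(1j * w * t) * np.exp(-0.5 * t ** 2) / np.sqrt(scale)

def scalogram(x, scales):
    """|CWT| magnitudes: one row per scale, one column per sample."""
    return np.stack([np.abs(np.convolve(x, morlet(s), mode="same"))
                     for s in scales])

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 600 * t)              # stand-in for a mosquito tone
S = scalogram(x, scales=np.geomspace(2, 64, 32))
print(S.shape)                                # (32, 8000): a 2-D input for a CNN
```

The resulting (scales x time) array plays the role that an image plays in an ordinary CNN pipeline; the network's convolutional filters then learn which time-frequency components are informative for detection.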
Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning
Deep learning Convolutional Neural Network (CNN) models are powerful
classification models but require a large amount of training data. In niche
domains such as bird acoustics, it is expensive and difficult to obtain a large
number of training samples. One method of classifying data with a limited
number of training samples is to employ transfer learning. In this research, we
evaluated the effectiveness of birdcall classification using transfer learning
from a larger base dataset (2814 samples in 46 classes) to a smaller target
dataset (351 samples in 10 classes) using the ResNet-50 CNN. We obtained 79%
average validation accuracy on the target dataset in 5-fold cross-validation.
The methodology of transfer learning from an ImageNet-trained CNN to a
project-specific and a much smaller set of classes and images was extended to
the domain of spectrogram images, where the base dataset effectively played the
role of ImageNet.
Comment: Accepted for IEEE Digital Image Computing: Techniques and Applications, 2019 (DICTA 2019), 2-4 December 2019 in Perth, Australia, http://dicta2019.dictaconference.org/index.htm
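The core idea of the transfer learning above, freezing a pretrained feature extractor and training only a small classification head on the target classes, can be sketched end to end in numpy. The random "backbone", the toy data, and all sizes here are stand-ins (not ResNet-50 or the paper's datasets); the point is that only the head's weights are updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: spectrograms flattened to vectors, a frozen backbone (a fixed
# random ReLU projection), and a trainable head for 10 target classes.
n_samples, n_pixels, n_feat, n_classes = 200, 512, 64, 10
X = rng.normal(size=(n_samples, n_pixels))

W_backbone = rng.normal(size=(n_pixels, n_feat))         # frozen: never updated
features = np.maximum(X @ W_backbone / np.sqrt(n_pixels), 0.0)

# Toy labels that are actually predictable from the frozen features.
y = np.argmax(features @ rng.normal(size=(n_feat, n_classes)), axis=1)
onehot = np.eye(n_classes)[y]

# Train only the head: softmax regression by plain gradient descent.
W_head = np.zeros((n_feat, n_classes))
for _ in range(200):
    logits = features @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W_head -= 0.1 * features.T @ (p - onehot) / n_samples  # head only

acc = float((np.argmax(features @ W_head, axis=1) == y).mean())
```

With the backbone fixed, the number of trainable parameters is tiny (64 x 10 here), which is why this recipe works with only a few hundred target samples.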
Underwater Fish Detection using Deep Learning for Water Power Applications
Clean energy from oceans and rivers is becoming a reality with the
development of new technologies like tidal and instream turbines that generate
electricity from naturally flowing water. These new technologies are being
monitored for effects on fish and other wildlife using underwater video.
Methods for automated analysis of underwater video are needed to lower the
costs of analysis and improve accuracy. A deep learning model, YOLO, was
trained to recognize fish in underwater video using three very different
datasets recorded at real-world water power sites. Training and testing with
examples from all three datasets resulted in a mean average precision (mAP)
score of 0.5392. To test how well a model could generalize to new datasets, the
model was trained using examples from only two of the datasets and then tested
on examples from all three datasets. The resulting model could not recognize
fish in the dataset that was not part of the training set. The mAP scores on
the other two datasets that were included in the training set were higher than
the scores achieved by the model trained on all three datasets. These results
indicate that different methods are needed to produce a trained model that can generalize to new data sets, such as those encountered in real-world applications.
Comment: Accepted at CSCI 201
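The mean average precision (mAP) metric reported above can be illustrated with a small, self-contained sketch of the usual detection-evaluation logic: score-ranked detections are greedily matched to unclaimed ground-truth boxes by intersection-over-union (IoU). This is a simplified AP (precision summed at each true-positive recall step), not the exact evaluation code used in the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def average_precision(detections, truths, iou_thresh=0.5):
    """AP for one class: detections are (score, box), matched greedily in
    descending score order to unclaimed truth boxes at the IoU threshold."""
    claimed, tp, fp, precisions = set(), 0, 0, []
    for score, box in sorted(detections, key=lambda d: -d[0]):
        match = next((i for i, t in enumerate(truths)
                      if i not in claimed and iou(box, t) >= iou_thresh), None)
        if match is None:
            fp += 1
        else:
            claimed.add(match)
            tp += 1
            precisions.append(tp / (tp + fp))  # precision at each recall step
    return sum(precisions) / len(truths) if truths else 0.0

truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (1, 1, 10, 10)),    # good match for the first fish
        (0.8, (50, 50, 60, 60)),  # false positive: no fish there
        (0.7, (21, 21, 30, 30))]  # good match for the second fish
ap = average_precision(dets, truths)
```

Averaging this AP over all classes (here there is only one, "fish") gives the mAP figure quoted in the abstract; a model that fails on an unseen dataset simply matches none of its ground-truth boxes and scores near zero there.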