
    Knowledge Transfer from Weakly Labeled Audio using Convolutional Neural Network for Sound Events and Scenes

    In this work we propose approaches to effectively transfer knowledge from weakly labeled web audio data. We first describe a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data. Our model trains efficiently on audio recordings of variable length; hence, it is well suited for transfer learning. We then propose methods to learn representations using this model which can be effectively used for solving the target task. We study both transductive and inductive transfer learning tasks, showing the effectiveness of our methods for both domain and task adaptation. We show that the representations learned using the proposed CNN model generalize well enough to reach human-level accuracy on the ESC-50 sound events dataset and set state-of-the-art results on this dataset. We further use them for the acoustic scene classification task and once again show that our proposed approaches are well suited to it. We also show that our methods help capture semantic meanings and relations. Moreover, in this process we also set state-of-the-art results on the Audioset dataset, relying on the balanced training set. Comment: ICASSP 201

    Do we still need ImageNet pre-training in remote sensing scene classification?

    Due to the scarcity of labeled data, using supervised models pre-trained on ImageNet is a de facto standard in remote sensing scene classification. Recently, the availability of larger high resolution remote sensing (HRRS) image datasets and progress in self-supervised learning have raised two questions: whether supervised ImageNet pre-training is still necessary for remote sensing scene classification, and whether supervised pre-training on HRRS image datasets or self-supervised pre-training on ImageNet would achieve better results on target remote sensing scene classification tasks. To answer these questions, in this paper we both train models from scratch and fine-tune supervised and self-supervised ImageNet models on several HRRS image datasets. We also evaluate the transferability of learned representations to HRRS scene classification tasks and show that self-supervised pre-training outperforms the supervised one, while the performance of HRRS pre-training is similar to self-supervised pre-training or slightly lower. Finally, we propose using an ImageNet pre-trained model combined with a second round of pre-training using in-domain HRRS images, i.e. domain-adaptive pre-training. The experimental results show that domain-adaptive pre-training results in models that achieve state-of-the-art results on HRRS scene classification benchmarks. The source code and pre-trained models are available at https://github.com/risojevicv/RSSC-transfer

    Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

    This paper presents a new state-of-the-art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs). In object and scene analysis, deep neural nets are capable of learning a hierarchical chain of abstraction from pixel inputs to concise and descriptive representations. The current work explores this capacity in the realm of document analysis, and confirms that this representation strategy is superior to a variety of popular hand-crafted alternatives. Experiments also show that (i) features extracted from CNNs are robust to compression, (ii) CNNs trained on non-document images transfer well to document analysis tasks, and (iii) enforcing region-specific feature-learning is unnecessary given sufficient training data. This work also makes available a new labelled subset of the IIT-CDIP collection, containing 400,000 document images across 16 categories, useful for training new CNNs for document analysis

    Improving semi-supervised learning for audio classification with FixMatch

    Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data. This reliance on augmentation is largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement

    From On-Road to Off: Transfer Learning within a Deep Convolutional Neural Network for Segmentation and Classification of Off-Road Scenes

    Real-time road-scene understanding is a challenging computer vision task, with recent advances in convolutional neural networks (CNN) achieving results that notably surpass prior traditional feature-driven approaches. Here, we take an existing CNN architecture, pre-trained for urban road-scene understanding, and retrain it towards the task of classifying off-road scenes, assessing the network performance within the training cycle. Within the paradigm of transfer learning we analyse the effects on CNN classification by training and assessing varying levels of prior training on varying sub-sets of our off-road training data. For each of these configurations, we evaluate the network at multiple points during its training cycle, allowing us to analyse in depth exactly how the training process is affected by these variations. Finally, we compare this CNN to a more traditional approach using a feature-driven Support Vector Machine (SVM) classifier and demonstrate state-of-the-art results in this particularly challenging problem of off-road scene understanding

    Brain informed transfer learning for categorizing construction hazards

    A transfer learning paradigm is proposed for "knowledge" transfer between the human brain and a convolutional neural network (CNN) for a construction hazard categorization task. Participants' brain activities are recorded using electroencephalogram (EEG) measurements when viewing the same images (target dataset) as the CNN. The CNN is pretrained on the EEG data and then fine-tuned on the construction scene images. The results reveal that the EEG-pretrained CNN achieves a 9% higher accuracy compared with a network with the same architecture but randomly initialized parameters on a three-class classification task. Brain activity from the left frontal cortex exhibits the highest performance gains, thus indicating high-level cognitive processing during hazard recognition. This work is a step toward improving machine learning algorithms by learning from human-brain signals recorded via a commercially available brain-computer interface. More generalized visual recognition systems can be effectively developed based on this approach of "keep human in the loop"

    Illumination Invariant Deep Learning for Hyperspectral Data

    Motivated by the variability in hyperspectral images due to illumination and the difficulty in acquiring labelled data, this thesis proposes different approaches for learning illumination invariant feature representations and classification models for hyperspectral data captured outdoors, under natural sunlight. The approaches integrate domain knowledge into learning algorithms and hence do not rely on a priori knowledge of atmospheric parameters, additional sensors or large amounts of labelled training data. Hyperspectral sensors record rich semantic information from a scene, making them useful for robotics or remote sensing applications where perception systems are used to gain an understanding of the scene. Images recorded by hyperspectral sensors can, however, be affected to varying degrees by intrinsic factors relating to the sensor itself (keystone, smile, noise, particularly at the limits of the sensed spectral range) but also by extrinsic factors such as the way the scene is illuminated. The appearance of the scene in the image is tied to the incident illumination, which is dependent on variables such as the position of the sun, geometry of the surface and the prevailing atmospheric conditions. Effects like shadows can make the appearance and spectral characteristics of identical materials differ significantly. This degrades the performance of high-level algorithms that use hyperspectral data, such as those that do classification and clustering. If sufficient training data is available, learning algorithms such as neural networks can capture variability in the scene appearance and be trained to compensate for it. Learning algorithms are advantageous for this task because they do not require a priori knowledge of the prevailing atmospheric conditions or data from additional sensors.
Labelling of hyperspectral data is, however, difficult and time-consuming, so acquiring enough labelled samples for the learning algorithm to adequately capture the scene appearance is challenging. Hence, there is a need for techniques that are invariant to the effects of illumination and that do not require large amounts of labelled data. In this thesis, an approach to learning a representation of hyperspectral data that is invariant to the effects of illumination is proposed. This approach combines a physics-based model of the illumination process with an unsupervised deep learning algorithm, and thus requires no labelled data. Datasets that vary both temporally and spatially are used to compare the proposed approach to other similar state-of-the-art techniques. The results show that the learnt representation is more invariant to shadows in the image and to variations in brightness due to changes in the scene topography or position of the sun in the sky. The results also show that a supervised classifier can predict class labels more accurately and more consistently across time when images are represented using the proposed method. Additionally, this thesis proposes methods to train supervised classification models to be more robust to variations in illumination where only limited amounts of labelled data are available. The transfer of knowledge from well-labelled datasets to poorly labelled datasets for classification is investigated. A method is also proposed for enabling a small number of labelled samples to capture the variability in spectra across the scene. These samples are then used to train a classifier to be robust to the variability in the data caused by variations in illumination. The results show that these approaches make convolutional neural network classifiers more robust and achieve better performance when there is limited labelled training data.
A case study presents a pipeline that incorporates the methods proposed in this thesis for learning robust feature representations and classification models. A scene is clustered using no labelled data. The results show that the pipeline groups the data into clusters that are consistent with the spatial distribution of the classes in the scene as determined from ground truth
