15,970 research outputs found

    Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

    Get PDF
    Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: Previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state-of-the-art by a wide margin. Our code, evaluation procedure and model weights are available at this http URL

    Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery

    Full text link
    Fine-grained object recognition that aims to identify the type of an object among a large number of subcategories is an emerging application with the increasing resolution that exposes new details in image data. Traditional fully supervised algorithms fail to handle this problem where there is low between-class variance and high within-class variance for the classes of interest with small sample sizes. We study an even more extreme scenario named zero-shot learning (ZSL) in which no training example exists for some of the classes. ZSL aims to build a recognition model for new unseen categories by relating them to seen classes that were previously learned. We establish this relation by learning a compatibility function between image features extracted via a convolutional neural network and auxiliary information that describes the semantics of the classes of interest by using training samples from the seen classes. Then, we show how knowledge transfer can be performed for the unseen classes by maximizing this function during inference. We introduce a new data set that contains 40 different types of street trees in 1-ft spatial resolution aerial data, and evaluate the performance of this model with manually annotated attributes, a natural language model, and a scientific taxonomy as auxiliary information. The experiments show that the proposed model achieves 14.3% recognition accuracy for the classes with no training examples, which is significantly better than a random guess accuracy of 6.3% for 16 test classes, and three other ZSL algorithms.Comment: G. Sumbul, R. G. Cinbis, S. Aksoy, "Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery", IEEE Transactions on Geoscience and Remote Sensing (TGRS), in press, 201

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    Get PDF
    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises the cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning with respect to the problem has not only revealed the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey not only presents an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up for a possible solution accordingly

    Learning models for semantic classification of insufficient plantar pressure images

    Get PDF
    Establishing a reliable and stable model to predict a target by using insufficient labeled samples is feasible and effective, particularly, for a sensor-generated data-set. This paper has been inspired with insufficient data-set learning algorithms, such as metric-based, prototype networks and meta-learning, and therefore we propose an insufficient data-set transfer model learning method. Firstly, two basic models for transfer learning are introduced. A classification system and calculation criteria are then subsequently introduced. Secondly, a dataset of plantar pressure for comfort shoe design is acquired and preprocessed through foot scan system; and by using a pre-trained convolution neural network employing AlexNet and convolution neural network (CNN)- based transfer modeling, the classification accuracy of the plantar pressure images is over 93.5%. Finally, the proposed method has been compared to the current classifiers VGG, ResNet, AlexNet and pre-trained CNN. Also, our work is compared with known-scaling and shifting (SS) and unknown-plain slot (PS) partition methods on the public test databases: SUN, CUB, AWA1, AWA2, and aPY with indices of precision (tr, ts, H) and time (training and evaluation). The proposed method for the plantar pressure classification task shows high performance in most indices when comparing with other methods. The transfer learning-based method can be applied to other insufficient data-sets of sensor imaging fields
    • …
    corecore