4,811 research outputs found

    Multi-Label Zero-Shot Learning with Transfer-Aware Label Embedding Projection

    Full text link
    Zero-shot learning transfers knowledge from seen classes to novel unseen classes to reduce human labor of labelling data for building new classifiers. Much effort on zero-shot learning however has focused on the standard multi-class setting, the more challenging multi-label zero-shot problem has received limited attention. In this paper we propose a transfer-aware embedding projection approach to tackle multi-label zero-shot learning. The approach projects the label embedding vectors into a low-dimensional space to induce better inter-label relationships and explicitly facilitate information transfer from seen labels to unseen labels, while simultaneously learning a max-margin multi-label classifier with the projected label embeddings. Auxiliary information can be conveniently incorporated to guide the label embedding projection to further improve label relation structures for zero-shot knowledge transfer. We conduct experiments for zero-shot multi-label image classification. The results demonstrate the efficacy of the proposed approach

    Visual Data Synthesis via GAN for Zero-Shot Video Classification

    Full text link
    Zero-Shot Learning (ZSL) in video classification is a promising research direction, which aims to tackle the challenge from explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation via learning a projection between visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in data distribution, and commonly suffer from the information degradation issue caused by "heterogeneity gap". In this paper, we propose a visual data synthesis framework via GAN to address these problems. Specifically, both semantic knowledge and visual distribution are leveraged to synthesize video feature of unseen categories, and ZSL can be turned into typical supervised problem with the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can improve the zero-shot video classification performance significantly.Comment: 7 pages, accepted by International Joint Conference on Artificial Intelligence (IJCAI) 201

    Zero-Shot Learning via Latent Space Encoding

    Full text link
    Zero-Shot Learning (ZSL) is typically achieved by resorting to a class semantic embedding space to transfer the knowledge from the seen classes to unseen ones. Capturing the common semantic characteristics between the visual modality and the class semantic modality (e.g., attributes or word vector) is a key to the success of ZSL. In this paper, we propose a novel encoder-decoder approach, namely Latent Space Encoding (LSE), to connect the semantic relations of different modalities. Instead of requiring a projection function to transfer information across different modalities like most previous work, LSE per- forms the interactions of different modalities via a feature aware latent space, which is learned in an implicit way. Specifically, different modalities are modeled separately but optimized jointly. For each modality, an encoder-decoder framework is performed to learn a feature aware latent space via jointly maximizing the recoverability of the original space from the latent space and the predictability of the latent space from the original space. To relate different modalities together, their features referring to the same concept are enforced to share the same latent codings. In this way, the common semantic characteristics of different modalities are generalized with the latent representations. Another property of the proposed approach is that it is easily extended to more modalities. Extensive experimental results on four benchmark datasets (AwA, CUB, aPY, and ImageNet) clearly demonstrate the superiority of the proposed approach on several ZSL tasks, including traditional ZSL, generalized ZSL, and zero-shot retrieval (ZSR)

    Deep Residual Output Layers for Neural Language Generation

    Full text link
    Many tasks, including language generation, benefit from learning the structure of the output space, particularly when the space of output labels is large and the data is sparse. State-of-the-art neural language models indirectly capture the output space structure in their classifier weights since they lack parameter sharing across output labels. Learning shared output label mappings helps, but existing methods have limited expressivity and are prone to overfitting. In this paper, we investigate the usefulness of more powerful shared mappings for output labels, and propose a deep residual output mapping with dropout between layers to better capture the structure of the output space and avoid overfitting. Evaluations on three language generation tasks show that our output label mapping can match or improve state-of-the-art recurrent and self-attention architectures, and suggest that the classifier does not necessarily need to be high-rank to better model natural language if it is better at capturing the structure of the output space.Comment: To appear in ICML 201

    Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation

    Full text link
    Tying the weights of the target word embeddings with the target word classifiers of neural machine translation models leads to faster training and often to better translation quality. Given the success of this parameter sharing, we investigate other forms of sharing in between no sharing and hard equality of parameters. In particular, we propose a structure-aware output layer which captures the semantic structure of the output space of words within a joint input-output embedding. The model is a generalized form of weight tying which shares parameters but allows learning a more flexible relationship with input word embeddings and allows the effective capacity of the output layer to be controlled. In addition, the model shares weights across output classifiers and translation contexts which allows it to better leverage prior knowledge about them. Our evaluation on English-to-Finnish and English-to-German datasets shows the effectiveness of the method against strong encoder-decoder baselines trained with or without weight tying.Comment: To appear at WMT 201

    A Novel Perspective to Zero-shot Learning: Towards an Alignment of Manifold Structures via Semantic Feature Expansion

    Full text link
    Zero-shot learning aims at recognizing unseen classes (no training example) with knowledge transferred from seen classes. This is typically achieved by exploiting a semantic feature space shared by both seen and unseen classes, i.e., attribute or word vector, as the bridge. One common practice in zero-shot learning is to train a projection between the visual and semantic feature spaces with labeled seen classes examples. When inferring, this learned projection is applied to unseen classes and recognizes the class labels by some metrics. However, the visual and semantic feature spaces are mutually independent and have quite different manifold structures. Under such a paradigm, most existing methods easily suffer from the domain shift problem and weaken the performance of zero-shot recognition. To address this issue, we propose a novel model called AMS-SFE. It considers the alignment of manifold structures by semantic feature expansion. Specifically, we build upon an autoencoder-based model to expand the semantic features from the visual inputs. Additionally, the expansion is jointly guided by an embedded manifold extracted from the visual feature space of the data. Our model is the first attempt to align both feature spaces by expanding semantic features and derives two benefits: first, we expand some auxiliary features that enhance the semantic feature space; second and more importantly, we implicitly align the manifold structures between the visual and semantic feature spaces; thus, the projection can be better trained and mitigate the domain shift problem. Extensive experiments show significant performance improvement, which verifies the effectiveness of our model

    Transfer Adaptation Learning: A Decade Survey

    Full text link
    The world we see is ever-changing and it always changes with people, things, and the environment. Domain is referred to as the state of the world at a certain moment. A research problem is characterized as transfer adaptation learning (TAL) when it needs knowledge correspondence between different moments/domains. Conventional machine learning aims to find a model with the minimum expected risk on test data by minimizing the regularized empirical risk on the training data, which, however, supposes that the training and test data share similar joint probability distribution. TAL aims to build models that can perform tasks of target domain by learning knowledge from a semantic related but distribution different source domain. It is an energetic research filed of increasing influence and importance, which is presenting a blowout publication trend. This paper surveys the advances of TAL methodologies in the past decade, and the technical challenges and essential problems of TAL have been observed and discussed with deep insights and new perspectives. Broader solutions of transfer adaptation learning being created by researchers are identified, i.e., instance re-weighting adaptation, feature adaptation, classifier adaptation, deep network adaptation and adversarial adaptation, which are beyond the early semi-supervised and unsupervised split. The survey helps researchers rapidly but comprehensively understand and identify the research foundation, research status, theoretical limitations, future challenges and under-studied issues (universality, interpretability, and credibility) to be broken in the field toward universal representation and safe applications in open-world scenarios.Comment: 26 pages, 4 figure

    Zero-Shot Kernel Learning

    Full text link
    In this paper, we address an open problem of zero-shot learning. Its principle is based on learning a mapping that associates feature vectors extracted from i.e. images and attribute vectors that describe objects and/or scenes of interest. In turns, this allows classifying unseen object classes and/or scenes by matching feature vectors via mapping to a newly defined attribute vector describing a new class. Due to importance of such a learning task, there exist many methods that learn semantic, probabilistic, linear or piece-wise linear mappings. In contrast, we apply well-established kernel methods to learn a non-linear mapping between the feature and attribute spaces. We propose an easy learning objective inspired by the Linear Discriminant Analysis, Kernel-Target Alignment and Kernel Polarization methods that promotes incoherence. We evaluate performance of our algorithm on the Polynomial as well as shift-invariant Gaussian and Cauchy kernels. Despite simplicity of our approach, we obtain state-of-the-art results on several zero-shot learning datasets and benchmarks including a recent AWA2 dataset.Comment: IEEE Conference on Computer Vision and Pattern Recognition 201

    Zero-Shot Object Detection

    Full text link
    We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes which are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior works on zero-shot classification. We present a principled approach by first adapting visual-semantic embeddings for ZSD. We then discuss the problems associated with selecting a background class and motivate two background-aware approaches for learning robust detectors. One of these models uses a fixed background class and the other is based on iterative latent assignments. We also outline the challenge associated with using a limited number of training classes and propose a solution based on dense sampling of the semantic label space using auxiliary data with a large number of categories. We propose novel splits of two standard detection datasets - MSCOCO and VisualGenome, and present extensive empirical results in both the traditional and generalized zero-shot settings to highlight the benefits of the proposed methods. We provide useful insights into the algorithm and conclude by posing some open questions to encourage further research.Comment: 17 pages. ECCV 201

    Leveraging Semantic Embeddings for Safety-Critical Applications

    Full text link
    Semantic Embeddings are a popular way to represent knowledge in the field of zero-shot learning. We observe their interpretability and discuss their potential utility in a safety-critical context. Concretely, we propose to use them to add introspection and error detection capabilities to neural network classifiers. First, we show how to create embeddings from symbolic domain knowledge. We discuss how to use them for interpreting mispredictions and propose a simple error detection scheme. We then introduce the concept of semantic distance: a real-valued score that measures confidence in the semantic space. We evaluate this score on a traffic sign classifier and find that it achieves near state-of-the-art performance, while being significantly faster to compute than other confidence scores. Our approach requires no changes to the original network and is thus applicable to any task for which domain knowledge is available.Comment: Accepted at CVPR 2019 Workshop: Safe Artificial Intelligence for Automated Drivin
    • …
    corecore