
    Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization

    Semantic specialization is the process of fine-tuning pre-trained distributional word vectors using external lexical knowledge (e.g., WordNet) to accentuate a particular semantic relation in the specialized vector space. While post-processing specialization methods are applicable to arbitrary distributional vectors, they are limited to updating only the vectors of words occurring in external lexicons (i.e., seen words), leaving the vectors of all other words unchanged. We propose a novel approach to specializing the full distributional vocabulary. Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space. We exploit words seen in the resources as training examples for learning a global specialization function. This function is learned by combining a standard L2-distance loss with an adversarial loss: the adversarial component produces more realistic output vectors. We show the effectiveness and robustness of the proposed method across three languages and on three tasks: word similarity, dialog state tracking, and lexical simplification. We report consistent improvements over distributional word vectors and vectors specialized by other state-of-the-art specialization frameworks. Finally, we also propose a cross-lingual transfer method for zero-shot specialization which successfully specializes a full target distributional space without any lexical knowledge in the target language and without any bilingual data. Comment: Accepted at EMNLP 201
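The combination of an L2-distance loss with an adversarial loss described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the non-saturating form of the adversarial term, and the `adv_weight` mixing parameter are all assumptions made for illustration, and the discriminator is reduced to a single probability supplied by the caller.

```python
import math

def l2_distance_loss(predicted, target):
    """Euclidean distance between a predicted and a target vector."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))

def adversarial_generator_loss(discriminator_prob):
    """Assumed non-saturating generator loss: -log D(G(x)).

    discriminator_prob is the (hypothetical) discriminator's probability
    that the predicted vector is a genuinely specialized vector.
    """
    eps = 1e-12  # guard against log(0)
    return -math.log(max(discriminator_prob, eps))

def combined_loss(predicted, target, discriminator_prob, adv_weight=1.0):
    """Distance term plus weighted adversarial term (weighting is an assumption)."""
    return (l2_distance_loss(predicted, target)
            + adv_weight * adversarial_generator_loss(discriminator_prob))
```

In this sketch the loss falls both when the predicted vector moves closer to its target and when the discriminator finds the prediction more plausible.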

    Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization

    When approaching a novel visual recognition problem in a specialized image domain, a common strategy is to start with a pre-trained deep neural network and fine-tune it to the specialized domain. If the target domain covers a smaller visual space than the source domain used for pre-training (e.g. ImageNet), the fine-tuned network is likely to be over-parameterized. However, applying network pruning as a post-processing step to reduce the memory requirements has drawbacks: fine-tuning and pruning are performed independently; pruning parameters are set once and cannot adapt over time; and the highly parameterized nature of state-of-the-art pruning methods makes it prohibitive to manually search the pruning parameter space for deep networks, leading to coarse approximations. We propose a principled method for jointly fine-tuning and compressing a pre-trained convolutional network that overcomes these limitations. Experiments on two specialized image domains (remote sensing images and describable textures) demonstrate the validity of the proposed approach. Comment: BMVC 2017 ora

    Stacking-based Deep Neural Network: Deep Analytic Network on Convolutional Spectral Histogram Features

    Stacking-based deep neural network (S-DNN), in general, denotes a deep neural network (DNN) resemblance in terms of its very deep, feedforward network architecture. The typical S-DNN aggregates a variable number of individually learnable modules in series to assemble a DNN-alike alternative to the targeted object recognition tasks. This work likewise devises an S-DNN instantiation, dubbed deep analytic network (DAN), on top of the spectral histogram (SH) features. The DAN learning principle relies on ridge regression, and some key DNN constituents, specifically, rectified linear unit, fine-tuning, and normalization. The DAN aptitude is scrutinized on three repositories of varying domains, including FERET (faces), MNIST (handwritten digits), and CIFAR10 (natural objects). The empirical results unveil that DAN escalates the SH baseline performance over a sufficiently deep layer. Comment: 5 page
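For reference, the ridge regression that this abstract names as the learning principle has a well-known closed-form solution. The symbols below (feature matrix $X$, target matrix $Y$, regularization strength $\lambda$) are generic notation assumed here, not taken from the paper, and how DAN applies the solution layer by layer is not stated in the abstract.

```latex
% Standard ridge regression objective and its closed-form minimizer
% (generic notation; not the paper's own symbols).
\[
  W^{*} = \arg\min_{W} \; \lVert XW - Y \rVert_F^{2} + \lambda \lVert W \rVert_F^{2}
        = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} Y ,
  \qquad \lambda > 0 .
\]
```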

    Theory of mind in utterance interpretation: the case from clinical pragmatics

    The cognitive basis of utterance interpretation is an area that continues to provoke intense theoretical debate among pragmatists. That utterance interpretation involves some type of mind-reading or theory of mind (ToM) is indisputable. However, theorists are divided on the exact nature of this ToM-based mechanism. In this paper, it is argued that the only type of ToM-based mechanism that can adequately represent the cognitive basis of utterance interpretation is one which reflects the rational, intentional, holistic character of interpretation. Such a ToM-based mechanism is supported on conceptual and empirical grounds. Empirical support for this view derives from the study of children and adults with pragmatic disorders. Specifically, three types of clinical case are considered. In the first case, evidence is advanced which indicates that individuals with pragmatic disorders exhibit deficits in reasoning and the use of inferences. These deficits compromise the ability of children and adults with pragmatic disorders to comply with the rational dimension of utterance interpretation

    Learning the Structure of Deep Sparse Graphical Models

    Deep belief networks are a powerful way to model complex probability distributions. However, learning the structure of a belief network, particularly one with hidden units, is difficult. The Indian buffet process has been used as a nonparametric Bayesian prior on the directed structure of a belief network with a single infinitely wide hidden layer. In this paper, we introduce the cascading Indian buffet process (CIBP), which provides a nonparametric prior on the structure of a layered, directed belief network that is unbounded in both depth and width, yet allows tractable inference. We use the CIBP prior with the nonlinear Gaussian belief network so each unit can additionally vary its behavior between discrete and continuous representations. We provide Markov chain Monte Carlo algorithms for inference in these belief networks and explore the structures learned on several image data sets. Comment: 20 pages, 6 figures, AISTATS 2010, Revise
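As background for this abstract, the standard single-layer Indian buffet process can be sampled with the usual "customers and dishes" construction. The sketch below covers only that single-layer prior, not the cascading variant (CIBP) the paper introduces. The function names, the `alpha` concentration parameter, and the `seed` argument are assumptions for illustration.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw a Poisson(rate) sample using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_ibp(num_customers, alpha, seed=0):
    """Sample a binary matrix from a single-layer Indian buffet process.

    Rows are customers (e.g. visible units) and columns are dishes
    (e.g. hidden units); an entry of 1 means a connection exists.
    """
    rng = random.Random(seed)
    dish_counts = []  # how many customers have taken each dish so far
    rows = []
    for i in range(1, num_customers + 1):
        # Take each existing dish k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        for k, taken in enumerate(row):
            dish_counts[k] += taken
        # Then try Poisson(alpha / i) brand-new dishes.
        new_dishes = sample_poisson(alpha / i, rng)
        row.extend([1] * new_dishes)
        dish_counts.extend([1] * new_dishes)
        rows.append(row)
    width = len(dish_counts)
    # Pad earlier rows with zeros so the matrix is rectangular.
    return [row + [0] * (width - len(row)) for row in rows]
```

Calling `sample_ibp(10, 2.0)` returns a 10-row binary matrix whose number of columns is itself random, which is what makes the prior suitable for a hidden layer of unbounded width.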