Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization

Author: Glavaš Goran
Korhonen Anna
Mrkšić Nikola
Ponti Edoardo Maria
Vulić Ivan
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

Semantic specialization is the process of fine-tuning pre-trained distributional word vectors using external lexical knowledge (e.g., WordNet) to accentuate a particular semantic relation in the specialized vector space. While post-processing specialization methods are applicable to arbitrary distributional vectors, they are limited to updating only the vectors of words occurring in external lexicons (i.e., seen words), leaving the vectors of all other words unchanged. We propose a novel approach to specializing the full distributional vocabulary. Our adversarial post-specialization method propagates the external lexical knowledge to the full distributional space. We exploit words seen in the resources as training examples for learning a global specialization function. This function is learned by combining a standard L2-distance loss with an adversarial loss: the adversarial component produces more realistic output vectors. We show the effectiveness and robustness of the proposed method across three languages and on three tasks: word similarity, dialog state tracking, and lexical simplification. We report consistent improvements over distributional word vectors and vectors specialized by other state-of-the-art specialization frameworks. Finally, we also propose a cross-lingual transfer method for zero-shot specialization which successfully specializes a full target distributional space without any lexical knowledge in the target language and without any bilingual data. Comment: Accepted at EMNLP 201

arXiv.org e-Print Archive

MAnnheim DOCument Server
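The abstract above describes learning a global specialization function with a standard L2-distance loss combined with an adversarial loss. The following is a minimal NumPy sketch of such a combined objective, not the paper's model: it assumes a linear specialization map `W`, a fixed logistic discriminator (`disc_w`, `disc_b`), and a weighting factor `lam`, all of which are hypothetical names introduced for illustration.

```python
import numpy as np

def l2_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared L2 distance between mapped vectors and their specialized targets."""
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

def adversarial_loss(pred: np.ndarray, disc_w: np.ndarray, disc_b: float) -> float:
    """Generator-side loss -log D(pred) for a hypothetical logistic discriminator D."""
    logits = pred @ disc_w + disc_b
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, -logits)))

def combined_loss(W, x_seen, y_seen, disc_w, disc_b, lam=1.0) -> float:
    """L2 term plus lam-weighted adversarial term; the linear map W is an assumption."""
    pred = x_seen @ W  # specialize the distributional vectors of seen words
    return l2_loss(pred, y_seen) + lam * adversarial_loss(pred, disc_w, disc_b)
```

In a full adversarial setup the discriminator would itself be trained, alternating with the specialization function, to tell real specialized vectors from mapped ones; here it is held fixed so that only the shape of the combined objective is shown.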

Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization

Author: Mori Greg
Muralidharan Srikanth
Tung Frederick
Publication date: 01/01/2017

When approaching a novel visual recognition problem in a specialized image domain, a common strategy is to start with a pre-trained deep neural network and fine-tune it to the specialized domain. If the target domain covers a smaller visual space than the source domain used for pre-training (e.g., ImageNet), the fine-tuned network is likely to be over-parameterized. However, applying network pruning as a post-processing step to reduce the memory requirements has drawbacks: fine-tuning and pruning are performed independently; pruning parameters are set once and cannot adapt over time; and the highly parameterized nature of state-of-the-art pruning methods makes it prohibitive to manually search the pruning parameter space for deep networks, leading to coarse approximations. We propose a principled method for jointly fine-tuning and compressing a pre-trained convolutional network that overcomes these limitations. Experiments on two specialized image domains (remote sensing images and describable textures) demonstrate the validity of the proposed approach. Comment: BMVC 2017 ora

arXiv.org e-Print Archive

Stacking-based Deep Neural Network: Deep Analytic Network on Convolutional Spectral Histogram Features

Author: Low Cheng-Yaw
Teoh Andrew Beng-Jin
Publication date: 21/05/2017

Stacking-based deep neural network (S-DNN), in general, denotes a network that resembles a deep neural network (DNN) in its very deep, feedforward architecture. A typical S-DNN chains a variable number of individually learnable modules in series to build a DNN-like alternative for the targeted object recognition tasks. This work likewise devises an S-DNN instantiation, dubbed deep analytic network (DAN), on top of spectral histogram (SH) features. The DAN learning principle relies on ridge regression and on some key DNN constituents, specifically the rectified linear unit, fine-tuning, and normalization. DAN is evaluated on three repositories from different domains: FERET (faces), MNIST (handwritten digits), and CIFAR10 (natural objects). The empirical results show that DAN improves on the SH baseline performance once the network is sufficiently deep. Comment: 5 page

arXiv.org e-Print Archive
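The DAN abstract above names ridge regression, rectified linear units, and normalization as building blocks, but not how the modules are wired together. The sketch below is a hypothetical stacking scheme under that assumption, not the authors' DAN: each module is a closed-form ridge regressor onto one-hot labels, and its rectified, L2-normalized output is concatenated with the original input features (standing in for SH features) before being fed to the next module. Fine-tuning is omitted, and the function names are illustrative.

```python
import numpy as np

def ridge_fit(X: np.ndarray, Y: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Closed-form ridge regression: W = (X^T X + lam * I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def l2_normalize(Z: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (eps avoids division by zero)."""
    return Z / (np.linalg.norm(Z, axis=1, keepdims=True) + eps)

def stack_ridge_modules(X: np.ndarray, Y: np.ndarray, n_layers: int = 3, lam: float = 1e-2):
    """Hypothetical stacking: each module regresses onto Y, and its rectified,
    normalized output is appended to the original features for the next module."""
    feats = X
    weights = []
    for _ in range(n_layers):
        W = ridge_fit(feats, Y, lam)
        weights.append(W)
        out = np.maximum(feats @ W, 0.0)  # ReLU on the module's predictions
        feats = np.hstack([X, l2_normalize(out)])
    return weights, feats
```

Solving with `np.linalg.solve` rather than forming an explicit inverse is the usual, numerically safer way to evaluate the ridge closed form.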

Theory of mind in utterance interpretation: the case from clinical pragmatics

Author: Adams
Andreou
Armstrong
Averback
Barnes
Berthiaume
Bezuidenhout
Bigham
Botting
Corcoran
Cummings
Dennis
Duval
El Hachioui
Grindrod
Henry
Ho
Jolliffe
Kasher
Katsos
Krawczyk
Laakso
Lehman-Blake
Leroy
Loukusa
Mazza
McInnes
McNamara
Mirian
Montag
Moran
Morsanyi
Newton
Norbury
Oliver
Philip
Pijnacker
Porter
Putnam
Recanati
Ryder
Searle
Shamay-Tsoory
Simpson
Sperber
Titone
Tompkins
Tényi
Vartanian
Waechter
Wilson
Wittgenstein
Yoshiura
Ziatas
Publication venue: Frontiers Media SA
Publication date: 01/01/2015

The cognitive basis of utterance interpretation is an area that continues to provoke intense theoretical debate among pragmatists. That utterance interpretation involves some type of mind-reading or theory of mind (ToM) is indisputable. However, theorists are divided on the exact nature of this ToM-based mechanism. In this paper, it is argued that the only type of ToM-based mechanism that can adequately represent the cognitive basis of utterance interpretation is one which reflects the rational, intentional, holistic character of interpretation. Such a ToM-based mechanism is supported on conceptual and empirical grounds. Empirical support for this view derives from the study of children and adults with pragmatic disorders. Specifically, three types of clinical case are considered. In the first case, evidence is advanced which indicates that individuals with pragmatic disorders exhibit deficits in reasoning and the use of inferences. These deficits compromise the ability of children and adults with pragmatic disorders to comply with the rational dimension of utterance interpretation

Directory of Open Access Journals

Frontiers - Publisher Connector

Learning the Structure of Deep Sparse Graphical Models

Author: Adams Ryan Prescott
Ghahramani Zoubin
Wallach Hanna M.
Publication date: 01/01/2010

Deep belief networks are a powerful way to model complex probability distributions. However, learning the structure of a belief network, particularly one with hidden units, is difficult. The Indian buffet process has been used as a nonparametric Bayesian prior on the directed structure of a belief network with a single infinitely wide hidden layer. In this paper, we introduce the cascading Indian buffet process (CIBP), which provides a nonparametric prior on the structure of a layered, directed belief network that is unbounded in both depth and width, yet allows tractable inference. We use the CIBP prior with the nonlinear Gaussian belief network so each unit can additionally vary its behavior between discrete and continuous representations. We provide Markov chain Monte Carlo algorithms for inference in these belief networks and explore the structures learned on several image data sets. Comment: 20 pages, 6 figures, AISTATS 2010, Revise

arXiv.org e-Print Archive
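The cascading construction in the abstract above can be illustrated by sampling only the network structure. In this sketch the units of one layer act as IBP "customers" choosing "dishes", which become the units of the next layer; the new units then act as customers for the layer after that. It is a simplified illustration under stated assumptions: it uses the one-parameter IBP with concentration `alpha`, adds a hard `max_depth` cap for illustration, and omits the nonlinear Gaussian units and MCMC inference entirely. The function names are hypothetical.

```python
import numpy as np

def sample_ibp(n_customers: int, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Sample a binary matrix Z (customers x dishes) from a one-parameter IBP."""
    dish_counts: list[int] = []  # how many customers have taken each dish so far
    rows: list[list[bool]] = []
    for n in range(1, n_customers + 1):
        # existing dish k is taken with probability m_k / n
        row = [bool(rng.random() < m / n) for m in dish_counts]
        for k, took in enumerate(row):
            if took:
                dish_counts[k] += 1
        # then Poisson(alpha / n) brand-new dishes, all taken by this customer
        n_new = int(rng.poisson(alpha / n))
        row.extend([True] * n_new)
        dish_counts.extend([1] * n_new)
        rows.append(row)
    Z = np.zeros((n_customers, len(dish_counts)), dtype=bool)
    for i, row in enumerate(rows):
        if row:
            Z[i, : len(row)] = row
    return Z

def sample_cibp(n_visible: int, alpha: float, max_depth: int,
                rng: np.random.Generator) -> list[np.ndarray]:
    """Cascade IBPs: each layer's units choose parents in the next layer.
    Stops when a layer receives no units, or at the illustrative max_depth cap."""
    layers = []
    width = n_visible
    for _ in range(max_depth):
        Z = sample_ibp(width, alpha, rng)
        if Z.shape[1] == 0:  # no parents sampled: the cascade has terminated
            break
        layers.append(Z)
        width = Z.shape[1]  # the new units become the next layer's customers
    return layers
```

Each returned matrix connects one layer to the layer above it, so the number of rows in each matrix equals the number of columns in the previous one.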