21,755 research outputs found

    Text categorization using feature projections

    Full text link

    Cross-lingual Distillation for Text Classification

    Full text link
    Cross-lingual text classification(CLTC) is the task of classifying documents written in different languages into the same taxonomy of categories. This paper presents a novel approach to CLTC that builds on model distillation, which adapts and extends a framework originally proposed for model compression. Using soft probabilistic predictions for the documents in a label-rich language as the (induced) supervisory labels in a parallel corpus of documents, we train classifiers successfully for new languages in which labeled training data are not available. An adversarial feature adaptation technique is also applied during the model training to reduce distribution mismatch. We conducted experiments on two benchmark CLTC datasets, treating English as the source language and German, French, Japan and Chinese as the unlabeled target languages. The proposed approach had the advantageous or comparable performance of the other state-of-art methods.Comment: Accepted at ACL 2017; Code available at https://github.com/xrc10/cross-distil

    Similarity Learning for High-Dimensional Sparse Data

    Get PDF
    A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well respect to the dimensionality of the data. In this paper, we propose a method that can learn efficiently similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys very appealing convergence guarantees and its time and memory complexity depends on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.Comment: 14 pages. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015). Matlab code: https://github.com/bellet/HDS

    Decoding face categories in diagnostic subregions of primary visual cortex

    Get PDF
    Higher visual areas in the occipitotemporal cortex contain discrete regions for face processing, but it remains unclear if V1 is modulated by top-down influences during face discrimination, and if this is widespread throughout V1 or localized to retinotopic regions processing task-relevant facial features. Employing functional magnetic resonance imaging (fMRI), we mapped the cortical representation of two feature locations that modulate higher visual areas during categorical judgements – the eyes and mouth. Subjects were presented with happy and fearful faces, and we measured the fMRI signal of V1 regions processing the eyes and mouth whilst subjects engaged in gender and expression categorization tasks. In a univariate analysis, we used a region-of-interest-based general linear model approach to reveal changes in activation within these regions as a function of task. We then trained a linear pattern classifier to classify facial expression or gender on the basis of V1 data from ‘eye’ and ‘mouth’ regions, and from the remaining non-diagnostic V1 region. Using multivariate techniques, we show that V1 activity discriminates face categories both in local ‘diagnostic’ and widespread ‘non-diagnostic’ cortical subregions. This indicates that V1 might receive the processed outcome of complex facial feature analysis from other cortical (i.e. fusiform face area, occipital face area) or subcortical areas (amygdala)
    • …
    corecore