Machine Learning for Neuroimaging with Scikit-Learn

Authors: Abraham Alexandre, Eickenberg Michael, Gervais Philippe, Gramfort Alexandre, Kossaifi Jean, Muller Andreas, Pedregosa Fabian, Thirion Bertrand, Varoquaux Gaël
Publication date: 01/12/2013

Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting-state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

Comment: Frontiers in neuroscience, Frontiers Research Foundation, 2013, pp.1

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Frontiers - Publisher Connector

PubMed Central

HAL-CEA

Polyglot: Distributed Word Representations for Multilingual NLP

Authors: Al-Rfou Rami, Perozzi Bryan, Skiena Steven
Publication date: 27/06/2014

Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part-of-speech tagger for a subset of these languages. We find their performance to be competitive with near state-of-the-art methods in English, Danish and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.

Comment: 10 pages, 2 figures, Proceedings of Conference on Computational Natural Language Learning CoNLL'201

arXiv.org e-Print Archive

CiteSeerX

Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness

Authors: El-Yaniv Ran, Yanay David
Publication date: 10/11/2013

We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated with textual units of a large background knowledge corpus. We present an efficient algorithm for learning such semantic models from a training sample of relatedness preferences. Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts. Moreover, the approach facilitates the fitting of semantic models for specific users or groups of users. We present the results of an extensive range of experiments, from small to large scale, indicating that the proposed method is effective and competitive with the state of the art.

Comment: 37 pages, 8 figures. A short version of this paper was already published at ECML/PKDD 201

arXiv.org e-Print Archive

CiteSeerX

Low-shot learning with large-scale diffusion

Authors: Douze Matthijs, Hariharan Bharath, Jégou Hervé, Szlam Arthur
Publication date: 15/06/2018

This paper considers the problem of inferring image labels when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is possible by leveraging recent advances in large-scale similarity graph construction. We show that, despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.
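
The idea of label propagation over a similarity graph can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D points stand in for image descriptors, the brute-force k-nearest-neighbour search stands in for large-scale graph construction, and the names `knn_graph` and `propagate_labels` are hypothetical helpers introduced only for this example.

```python
import math


def knn_graph(points, k):
    """For each point, list the indices of its k nearest neighbours (brute force)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def propagate_labels(points, seeds, n_classes, k=2, n_iter=20):
    """Spread the few known labels in `seeds` (index -> class) along the k-NN graph."""
    neighbours = knn_graph(points, k)
    # One score per class for every point; unlabeled points start at zero.
    scores = [[0.0] * n_classes for _ in points]
    for i, c in seeds.items():
        scores[i][c] = 1.0
    for _ in range(n_iter):
        updated = []
        for i in range(len(points)):
            if i in seeds:
                # Annotated examples keep their known label (clamping).
                updated.append(scores[i][:])
            else:
                # Unlabeled points take the average score of their neighbours.
                updated.append([
                    sum(scores[j][c] for j in neighbours[i]) / k
                    for c in range(n_classes)
                ])
        scores = updated
    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Toy data (assumed for illustration): two well-separated clusters,
# with a single annotated example in each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = {0: 0, 4: 1}
predicted = propagate_labels(points, seeds, n_classes=2)
print(predicted)
```

In this sketch each cluster inherits the class of its single seed because the neighbour graph never links the two clusters. At the scale described in the abstract, the brute-force neighbour search would have to be replaced by an approximate similarity-search method.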

arXiv.org e-Print Archive

Crossref