1,058 research outputs found

AutoSense Model for Word Sense Induction

Author: Amplayo Reinald Kim
Hwang Seung-won
Song Min
Publication date: 22/11/2018

Word sense induction (WSI), the task of automatically discovering the multiple senses or meanings of a word, has three main challenges: domain adaptability, novel sense detection, and sense granularity flexibility. While current latent variable models are known to solve the first two challenges, they are not flexible to different word sense granularities, which vary greatly among words, from aardvark with one sense to play with over 50 senses. Current models either require hyperparameter tuning or nonparametric induction of the number of senses, both of which we find to be ineffective. Thus, we aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring words. These observations alleviate the problem by (a) discarding garbage senses and (b) additionally inducing fine-grained word senses. Results show substantial improvements over state-of-the-art models on popular WSI datasets. We also show that AutoSense is able to learn the appropriate sense granularity of a word. Finally, we apply AutoSense to the unsupervised author name disambiguation task, where the sense granularity problem is more evident, and show that AutoSense clearly outperforms competing models. We share our data and code here: https://github.com/rktamplayo/AutoSense.
Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
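Observation (1) in the AutoSense abstract, senses represented as distributions over topics, can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the AutoSense model: it ignores the target/neighbor pairings of observation (2) and any training procedure, and the function name `sense_posterior`, the array shapes, and all probabilities are hypothetical. Given a fixed toy model, it computes the posterior over senses for a bag of context word ids.

```python
import numpy as np


def sense_posterior(context_ids, sense_prior, sense_topic, topic_word):
    """Posterior p(sense | context) in a toy model (hypothetical, not AutoSense)
    where each sense is a distribution over topics and each context word is
    drawn independently from a topic-specific word distribution.

    context_ids: list of word ids observed around the target word
    sense_prior: shape (S,), p(s)
    sense_topic: shape (S, K), p(topic k | sense s)
    topic_word:  shape (K, V), p(word v | topic k)
    """
    # Marginalize topics out: p(w | s) = sum_k p(k | s) * p(w | k), shape (S, V).
    sense_word = sense_topic @ topic_word
    # Sum log-likelihoods over context words (log space avoids underflow).
    log_post = np.log(sense_prior) + np.log(sense_word[:, context_ids]).sum(axis=1)
    # Normalize with the log-sum-exp trick.
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()


# Two senses, two topics, four word types (all numbers are made up).
prior = np.array([0.5, 0.5])
sense_topic = np.array([[0.9, 0.1], [0.1, 0.9]])
topic_word = np.array([[0.45, 0.45, 0.05, 0.05],
                       [0.05, 0.05, 0.45, 0.45]])
post = sense_posterior([0, 1], prior, sense_topic, topic_word)
print(np.round(post, 3))  # → [0.954 0.046]
```

Because each sense mixes topics rather than emitting words directly, two senses can share topics while still being distinguishable; this is only an illustration of that representational idea.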

A supervised clustering approach for fMRI-based inference of brain states

Author: Alexandre Gramfort
Bertrand Thirion
Christine Keribin
Evelyn Eger
Gaël Varoquaux
Vincent Michel
Publication venue: Elsevier BV
Publication date: 20/04/2011

We propose a method that combines signals from many brain regions observed in functional Magnetic Resonance Imaging (fMRI) to predict the subject's behavior during a scanning session. Such predictions suffer from the huge number of brain regions sampled on the voxel grid of standard fMRI data sets: the curse of dimensionality. Dimensionality reduction is thus needed, but it is often performed using a univariate feature selection procedure, which handles neither the spatial structure of the images nor the multivariate nature of the signal. By introducing a hierarchical clustering of the brain volume that incorporates connectivity constraints, we reduce the span of the possible spatial configurations to a single tree of nested regions tailored to the signal. We then prune the tree in a supervised setting, hence the name supervised clustering, in order to extract a parcellation (division of the volume) such that parcel-based signal averages best predict the target information. Dimensionality reduction is thus achieved by feature agglomeration, and the constructed features provide a multi-scale representation of the signal. Comparisons with reference methods on both simulated and real data show that our approach yields higher prediction accuracy than standard voxel-based approaches. Moreover, the method infers an explicit weighting of the regions involved in the regression or classification task.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Inserm

HAL-CEA
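The feature-agglomeration idea in the supervised clustering abstract (connectivity-constrained hierarchical clustering, then parcel-wise averaging) can be sketched on a toy 1-D "brain" where each voxel is connected only to its left and right neighbors. This is a minimal sketch under stated assumptions: the chain connectivity, the function names `chain_ward_agglomeration` and `parcel_averages`, and the data are hypothetical, and the supervised pruning step (choosing the cut of the tree that best predicts the target) is replaced here by a fixed number of parcels. scikit-learn's `sklearn.cluster.FeatureAgglomeration`, given a `connectivity` matrix, offers a full implementation of connectivity-constrained Ward agglomeration.

```python
import numpy as np


def chain_ward_agglomeration(X, n_parcels):
    """Agglomerate the columns (voxels) of X, shape (n_samples, n_voxels),
    merging only neighbouring clusters on a 1-D chain (a hypothetical
    connectivity constraint) using Ward's criterion. Returns one label per voxel."""
    n_voxels = X.shape[1]
    clusters = [[j] for j in range(n_voxels)]  # kept in chain order
    while len(clusters) > n_parcels:
        best_cost, best_i = np.inf, None
        for i in range(len(clusters) - 1):  # only adjacent clusters are connected
            a, b = clusters[i], clusters[i + 1]
            mu_a = X[:, a].mean(axis=1)
            mu_b = X[:, b].mean(axis=1)
            # Ward cost: increase in within-cluster sum of squares after merging.
            cost = len(a) * len(b) / (len(a) + len(b)) * np.sum((mu_a - mu_b) ** 2)
            if cost < best_cost:
                best_cost, best_i = cost, i
        clusters[best_i:best_i + 2] = [clusters[best_i] + clusters[best_i + 1]]
    labels = np.empty(n_voxels, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def parcel_averages(X, labels):
    """Feature agglomeration: replace voxels by the mean signal of each parcel."""
    return np.column_stack([X[:, labels == k].mean(axis=1) for k in np.unique(labels)])


# Toy data: 12 voxels in three contiguous regions sharing a signal, plus noise.
t = np.linspace(0.0, 1.0, 20)
signals = [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), 2 * t]
rng = np.random.default_rng(0)
X = np.column_stack([s for s in signals for _ in range(4)])
X = X + 0.01 * rng.standard_normal(X.shape)
labels = chain_ward_agglomeration(X, 3)
print(labels)  # → [0 0 0 0 1 1 1 1 2 2 2 2]
```

In the supervised setting described in the abstract, the number of parcels would not be fixed in advance; one would instead compare cuts of the tree by the prediction accuracy of a model trained on `parcel_averages`.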

Calibrated model-based evidential clustering using bootstrapping

Author: Denoeux Thierry
Publication venue: Elsevier BV
Publication date: 05/04/2020

Evidential clustering is an approach to clustering in which cluster-membership uncertainty is represented by a collection of Dempster-Shafer mass functions forming an evidential partition. In this paper, we propose to construct these mass functions by bootstrapping finite mixture models. In the first step, we compute bootstrap percentile confidence intervals for all pairwise probabilities (the probability that two given objects belong to the same class). We then construct an evidential partition such that the pairwise belief and plausibility degrees approximate the bounds of the confidence intervals. This evidential partition is calibrated, in the sense that the pairwise belief-plausibility intervals contain the true probabilities "most of the time", i.e., with a probability close to the specified confidence level. This frequentist property is verified by simulation, and the practical applicability of the method is demonstrated on several real datasets.

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot
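The first step of the evidential clustering procedure, bootstrap percentile intervals for pairwise same-class probabilities, can be sketched as follows. This is a minimal sketch under stated assumptions: each input matrix is assumed to hold the membership probabilities of the original n objects under a mixture refitted on one bootstrap resample (the refitting itself is not shown); the pairwise probability is assumed to be p_ij = sum_k t_ik t_jk; the function names are hypothetical; and the second step, building the evidential partition from the interval bounds, is not reproduced.

```python
import numpy as np


def pairwise_same_cluster_prob(T):
    """T: (n, K) membership probabilities from one fitted mixture (assumed input).
    Returns P with P[i, j] = sum_k T[i, k] * T[j, k], taken here as the
    probability that objects i and j belong to the same class."""
    return T @ T.T


def bootstrap_percentile_bounds(T_replicates, confidence=0.9):
    """T_replicates: sequence of (n, K) membership matrices, one per bootstrap
    refit. Returns element-wise lower/upper percentile bounds on P."""
    P = np.stack([pairwise_same_cluster_prob(T) for T in T_replicates])
    alpha = (1.0 - confidence) / 2.0
    lower = np.quantile(P, alpha, axis=0)
    upper = np.quantile(P, 1.0 - alpha, axis=0)
    return lower, upper


# Stand-in replicates: random membership matrices for 5 objects and 3 clusters.
rng = np.random.default_rng(1)
reps = [rng.dirichlet(np.ones(3), size=5) for _ in range(200)]
lower, upper = bootstrap_percentile_bounds(reps, confidence=0.9)
```

Working with pairwise probabilities has a practical advantage here: permuting the cluster labels of a replicate (the label-switching problem of mixture refits) leaves `T @ T.T` unchanged, so replicates need not be aligned before the percentiles are taken.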

A review of model designs

Author: Nijboer R.C.
Verdonschot P.F.M.
Publication venue: Alterra

The PAEQANN project aims to review current ecological theories that can help identify suitable models for predicting community structure in aquatic ecosystems, to select and discuss appropriate models (i.e. empirical vs. simulation models) depending on the type of target community, and to examine how the results contribute to ecological water management objectives. To reach these goals, a number of classical statistical models, artificial neural networks and dynamic models are presented. An even larger number of techniques within these groups will be tested later in the project. This report introduces all of them: each technique is briefly introduced, its algorithm explained, and its advantages and disadvantages discussed.

Wageningen University & Research Publications

Recent Developments in Document Clustering

Author: Andrews Nicholas O.
Fox Edward A.
Publication date: 01/10/2007

This report aims to give a brief overview of the current state of document clustering research and to present recent developments in a well-organized manner. Clustering algorithms are considered with two hypothetical scenarios in mind: online query clustering with tight efficiency constraints, and offline clustering with an emphasis on accuracy. A comparative analysis of the algorithms is performed along with a table summarizing their important properties, and open problems as well as directions for future research are discussed.

Computer Science Technical Reports @Virginia Tech

Generalized topographic block model

Author: Govaert Gérard
Nadif Mohamed
Priam Rodolphe
Publication venue: Elsevier BV

Co-clustering leads to parsimony in data visualisation, with the number of parameters dramatically reduced compared with the dimensions of the data sample. Herein, we propose a new generalized approach to nonlinear mapping by re-parameterizing the latent block mixture model. The densities modeling the blocks belong to an exponential family, of which the Gaussian, Bernoulli and Poisson laws are particular cases. Parameter inference is derived from the block expectation–maximization algorithm, with a Newton–Raphson procedure at the maximization step. Empirical experiments with textual data demonstrate the usefulness of our generalized model.

Southampton (e-Prints Soton)
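For reference, the standard latent block mixture model that this abstract re-parameterizes can be written as below, in the notation commonly used for this model (assumed here, not taken from the paper). Here z_i and w_j are the row and column cluster labels, pi and rho the row and column mixing proportions, and phi the block density with parameter alpha_kl for row cluster k and column cluster l. The exponential-family form shown is the canonical one; the paper's topographic re-parameterization of alpha and its Newton–Raphson update are not reproduced.

```latex
f(\mathbf{x};\theta) = \sum_{(\mathbf{z},\mathbf{w}) \in \mathcal{Z}\times\mathcal{W}}
  \prod_{i=1}^{n} \pi_{z_i} \prod_{j=1}^{d} \rho_{w_j}
  \prod_{i=1}^{n}\prod_{j=1}^{d} \varphi\!\left(x_{ij};\, \alpha_{z_i w_j}\right),
\qquad
\varphi(x;\alpha) = h(x)\,\exp\!\left(\alpha\, x - b(\alpha)\right)
```

The three laws named in the abstract fit this canonical form: Poisson with alpha = log(lambda) and b(alpha) = exp(alpha); Bernoulli with alpha = logit(p) and b(alpha) = log(1 + exp(alpha)); Gaussian with unit variance, alpha = mu and b(alpha) = alpha^2 / 2.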