Distinguishing Word Senses in Untagged Text

Authors: Rebecca Bruce, Ted Pedersen
Publication date: 01/01/1997

This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high-dimensional feature set.

Comment: 11 pages, latex, uses aclap.st
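
As a rough illustration of the agglomerative idea behind McQuitty's similarity analysis, the sketch below merges the two closest clusters repeatedly and sets the distance from the merged cluster to every other cluster to the plain average of the two old distances. It is not the authors' implementation: the function name `mcquitty_cluster`, the use of Euclidean distance, and the toy points are all assumptions made for this example, and the paper's own features and dissimilarity measure are not reproduced here.

```python
from itertools import combinations
import math


def mcquitty_cluster(points, n_clusters):
    """Group points into n_clusters using McQuitty-style (WPGMA) merging.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    # Every point starts out as its own cluster, keyed by a cluster id.
    clusters = {i: [i] for i in range(len(points))}
    # Pairwise distances between clusters, keyed by an unordered id pair.
    dist = {
        frozenset((i, j)): math.dist(points[i], points[j])
        for i, j in combinations(clusters, 2)
    }
    next_id = len(points)

    while len(clusters) > n_clusters:
        # Pick the closest pair of clusters and merge them.
        i, j = tuple(min(dist, key=dist.get))
        merged = clusters.pop(i) + clusters.pop(j)

        # McQuitty update: distance to the merged cluster is the unweighted
        # mean of the distances to the two clusters it was built from.
        for k in clusters:
            dist[frozenset((k, next_id))] = 0.5 * (
                dist[frozenset((k, i))] + dist[frozenset((k, j))]
            )

        # Drop every distance that still refers to the two old clusters.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        clusters[next_id] = merged
        next_id += 1

    return sorted(sorted(members) for members in clusters.values())


# Toy feature vectors standing in for instances of an ambiguous word.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_groups = mcquitty_cluster(toy_points, 2)
```

In a word-sense setting, each resulting group would then have to be mapped to a known sense definition; that mapping step is outside this sketch.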

arXiv.org e-Print Archive

CiteSeerX

Mixtures of Common Skew-t Factor Analyzers

Authors: Ryan P. Browne, Paul D. McNicholas, Paula M. Murray
Publication venue: Wiley
Publication date: 30/08/2013

A mixture of common skew-t factor analyzers model is introduced for model-based clustering of high-dimensional data. By assuming common component factor loadings, this model allows clustering to be performed in the presence of a large number of mixture components or when the number of dimensions is too large to be well-modelled by the mixtures of factor analyzers model or a variant thereof. Furthermore, assuming that the component densities follow a skew-t distribution allows robust clustering of skewed data. The alternating expectation-conditional maximization algorithm is employed for parameter estimation. We demonstrate excellent clustering performance when our model is applied to real and simulated data. This paper marks the first time that skewed common factors have been used

arXiv.org e-Print Archive

CiteSeerX