Distinguishing Word Senses in Untagged Text
This paper describes an experimental comparison of three unsupervised
learning algorithms that distinguish the sense of an ambiguous word in untagged
text. The methods described in this paper, McQuitty's similarity analysis,
Ward's minimum-variance method, and the EM algorithm, assign each instance of
an ambiguous word to a known sense definition based solely on the values of
automatically identifiable features in text. These methods and feature sets are
found to be more successful in disambiguating nouns than adjectives or
verbs. Overall, the most accurate of these procedures is McQuitty's similarity
analysis in combination with a high dimensional feature set.
Comment: 11 pages, latex, uses aclap.st
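To make the two hierarchical methods named in the abstract concrete, here is a minimal sketch of agglomerative clustering with McQuitty's update rule and Ward's minimum-variance update. It is not the paper's implementation: the function name `agglomerate`, the toy numeric vectors, and the use of squared Euclidean distance for both methods are assumptions made for illustration. The paper's features are extracted from text, which this sketch does not attempt.

```python
from itertools import combinations


def agglomerate(points, n_clusters, method="mcquitty"):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Hypothetical helper for illustration only. Distances are squared
    Euclidean, an assumption chosen so one table serves both methods.
    """
    # Each cluster id maps to the indices of the points it contains.
    members = {i: [i] for i in range(len(points))}
    # Pairwise dissimilarities between the initial singleton clusters.
    dist = {
        frozenset((i, j)): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i, j in combinations(range(len(points)), 2)
    }
    next_id = len(points)
    while len(members) > n_clusters:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = tuple(pair)
        d_ab = dist.pop(pair)
        n_a, n_b = len(members[a]), len(members[b])
        merged = members.pop(a) + members.pop(b)
        for c in members:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            if method == "mcquitty":
                # McQuitty: plain average of the two old dissimilarities,
                # ignoring cluster sizes.
                new = (d_ac + d_bc) / 2
            elif method == "ward":
                # Ward: size-weighted update that tracks the increase in
                # within-cluster variance caused by a merge.
                n_c = len(members[c])
                new = ((n_a + n_c) * d_ac + (n_b + n_c) * d_bc - n_c * d_ab) / (
                    n_a + n_b + n_c
                )
            else:
                raise ValueError(f"unknown method: {method}")
            dist[frozenset((next_id, c))] = new
        members[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Toy feature vectors (made up): two well-separated groups of instances.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerate(toy, 2, "mcquitty"))
print(agglomerate(toy, 2, "ward"))
```

The only difference between the two methods in this sketch is how the dissimilarity to a newly merged cluster is recomputed. The third method in the abstract, the EM algorithm, fits a probabilistic mixture model instead and is not shown here.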
Mixtures of Common Skew-t Factor Analyzers
A mixture of common skew-t factor analyzers model is introduced for
model-based clustering of high-dimensional data. By assuming common component
factor loadings, this model allows clustering to be performed in the presence
of a large number of mixture components or when the number of dimensions is too
large to be well-modelled by the mixtures of factor analyzers model or a
variant thereof. Furthermore, assuming that the component densities follow a
skew-t distribution allows robust clustering of skewed data. The alternating
expectation-conditional maximization algorithm is employed for parameter
estimation. We demonstrate excellent clustering performance when our model is
applied to real and simulated data. This paper marks the first time that skewed
common factors have been used
- …