58,242 research outputs found

Topic-based mixture language modelling

Author: Gotoh Y.
Renals S.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/1999

This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling. A comparison is made between manual and automatic clustering in order to elucidate how the global content information is expressed in the space. We also compare (in terms of association with manual clustering and language modelling accuracy) alternative term-weighting schemes and the effect of singular value decomposition dimension reduction (latent semantic analysis). Test set perplexity results using the British National Corpus indicate that the approach can improve the potential of statistical language modelling. Using an adaptive procedure, the conventional model may be tuned to track text data with a slight increase in computational cost.
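The idea of a mixture of topic-specific language models, evaluated by test set perplexity, can be illustrated with a minimal sketch. This is not the authors' system: it assumes smoothed unigram models (the abstract does not state the model order), two hand-made toy clusters in place of automatic clustering, and hand-picked mixture weights in place of the adaptive procedure. All function and variable names are hypothetical.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def mixture_prob(word, models, weights):
    """Linear interpolation: P(word) = sum_k weight_k * P_k(word)."""
    return sum(lam * m[word] for lam, m in zip(weights, models))

def perplexity(tokens, models, weights):
    """Perplexity of the mixture model on a token sequence."""
    log_sum = sum(math.log(mixture_prob(w, models, weights)) for w in tokens)
    return math.exp(-log_sum / len(tokens))

# Toy "clusters" standing in for automatically derived topic clusters (assumption).
sports = "goal match team goal player match".split()
finance = "bank market stock bank price market".split()
vocab = sorted(set(sports) | set(finance))
models = [unigram_model(sports, vocab), unigram_model(finance, vocab)]

# A sports-like test text: shifting weight toward the sports model lowers perplexity.
test_text = "team goal match player".split()
ppl_uniform = perplexity(test_text, models, [0.5, 0.5])
ppl_adapted = perplexity(test_text, models, [0.9, 0.1])
print(ppl_uniform, ppl_adapted)
```

In this toy setting, moving the mixture weights toward the component that matches the test text reduces perplexity, which is the effect an adaptive weighting scheme aims to exploit.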

CiteSeerX

Crossref

Edinburgh Research Archive

White Rose Research Online

High-Dimensional Data Clustering

Author: C. Bouveyron
S. Girard
C. Schmid
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

Clustering in high-dimensional spaces is a difficult problem which is recurrent in many domains, for example in image analysis. The difficulty is due to the fact that high-dimensional data usually live in different low-dimensional subspaces hidden in the original space. This paper presents a family of Gaussian mixture models designed for high-dimensional data which combine the ideas of dimension reduction and parsimonious modeling. These models give rise to a clustering method based on the Expectation-Maximization algorithm which is called High-Dimensional Data Clustering (HDDC). In order to correctly fit the data, HDDC estimates the specific subspace and the intrinsic dimension of each group. Our experiments on artificial and real datasets show that HDDC outperforms existing methods for clustering high-dimensional data.
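For readers unfamiliar with the Expectation-Maximization step that the abstract builds on, the following is a minimal sketch of plain EM for a one-dimensional Gaussian mixture. It is not HDDC: it omits the per-group subspace and intrinsic-dimension estimation entirely, and the data, initial values, and function names are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Plain EM for a 1-D Gaussian mixture; returns the fitted parameters
    and the final responsibilities (one row per data point)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a degenerate component
    return means, variances, weights, resp

# Two well-separated toy groups (illustrative data, not from the paper).
data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights, resp = em_gmm_1d(
    data, means=[1.0, 4.0], variances=[1.0, 1.0], weights=[0.5, 0.5])

# Hard cluster labels: the component with the largest responsibility.
labels = [max(range(len(means)), key=lambda j: r[j]) for r in resp]
print(means, weights, labels)
```

A method in the spirit of the abstract would replace the single variance per component with a covariance model restricted to a group-specific low-dimensional subspace; the E-step and M-step alternate in the same way.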

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Classification of high-dimensional data: application to computer vision (Classification des données de grande dimension: application à la vision par ordinateur)

Author: Bouveyron Charles
Girard Stéphane
Schmid Cordelia
Publication venue: HAL CCSD
Publication date: 01/01/2006

Clustering in high-dimensional spaces is a difficult problem which is recurrent in many domains, for example in image analysis. The difficulty is due to the fact that high-dimensional data usually live in different low-dimensional subspaces hidden in the original space. This paper presents a family of Gaussian mixture models designed for high-dimensional data which combine the ideas of dimension reduction and parsimonious modeling. These models give rise to a clustering method based on the Expectation-Maximization algorithm which is called High-Dimensional Data Clustering (HDDC). In order to correctly fit the data, HDDC estimates the specific subspace and the intrinsic dimension of each group. Our experiments on artificial and real datasets show that HDDC outperforms existing methods for clustering high-dimensional data.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Paris1

mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models

Author: Fop Michael
Murphy Thomas Brendan
Raftery Adrian E.
Scrucca Luca
Publication venue: R Foundation for Statistical Computing
Publication date: 01/08/2016

Finite mixture models are being used increasingly to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of purposes of analysis. Recently, version 5 of the package has been made available on CRAN. This updated version adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling. Science Foundation Ireland

Research Repository UCD

Irish Universities

PubMed Central