Generative Supervised Classification Using Dirichlet Process Priors.

Author: Davy Manuel
Tourneret Jean-Yves
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2010

Choosing appropriate prior distributions for the parameters of a given Bayesian model is a challenging problem. Conjugate priors can be selected for simplicity. However, conjugate priors can be too restrictive to accurately model the available prior information. This paper studies a new generative supervised classifier which assumes that the parameter prior distributions conditioned on each class are mixtures of Dirichlet processes. The motivation for using mixtures of Dirichlet processes is their known ability to accurately model a large class of probability distributions. A Monte Carlo method for sampling from the resulting class-conditional posterior distributions is then studied. The parameters appearing in the class-conditional densities can then be estimated using these generated samples (following Bayesian learning). The proposed supervised classifier is applied to the classification of altimetric waveforms backscattered from different surfaces (oceans, ice, forests, and deserts). This classification is a first step toward developing tools for extracting useful geophysical information from altimetric waveforms backscattered from nonoceanic surfaces.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data

Author: Andrea Bellincontro
Fabio Mencarelli
Fordellone Mario
Publication venue: Associazione per la statistica applicata
Publication date: 01/01/2018

The recent development of more sophisticated spectroscopic methods allows acquisition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches.

Archivio della ricerca- Università di Roma La Sapienza

Manifold embedding for curve registration

Author: Dimeglio Chloé
Loubes Jean-Michel
Maza Elie
Publication date: 01/01/2011

We focus on the problem of finding a good representative of a sample of random curves warped from a common pattern f. We first prove that this problem can be recast in a manifold framework. Then, we propose an estimator of the common pattern f based on an approximated geodesic distance on a suitable manifold. We then compare the proposed method to more classical methods.
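The abstract does not say how the geodesic distance is approximated. One common approach, shown here purely as an illustrative sketch and not as the authors' method, is to build a k-nearest-neighbour graph over the sample and take shortest-path lengths in that graph. The function names, the choice k = 2, and the semicircle test data are all assumptions made for this example.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            d = math.dist(points[i], points[j])
            graph[i][j] = d
            graph[j][i] = d
    return graph


def graph_geodesic(graph, source, target):
    """Shortest-path length between two nodes (Dijkstra's algorithm)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


# Hypothetical data: 101 points on a unit semicircle.
n_steps = 100
points = [(math.cos(i * math.pi / n_steps), math.sin(i * math.pi / n_steps))
          for i in range(n_steps + 1)]

graph = knn_graph(points, k=2)
geodesic = graph_geodesic(graph, 0, n_steps)
straight = math.dist(points[0], points[n_steps])
```

On this toy curve the graph distance between the two endpoints is close to the arc length (about pi), whereas the straight-line distance is 2, which is the gap a manifold-based distance is meant to capture.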

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Hal-Diderot

Semi-parametric estimation of shifts

Author: Gamboa Fabrice
Loubes Jean-Michel
Maza Elie
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2007

We observe a large number of functions differing from each other only by a translation parameter. While the main pattern is unknown, we propose to estimate the shift parameters using M-estimators. The Fourier transform makes it possible to recast this statistical problem in a semi-parametric framework. We study the convergence of the estimator and provide its asymptotic behavior. Moreover, we use the method in the applied case of velocity curve forecasting. Comment: Published at http://dx.doi.org/10.1214/07-EJS026 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
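To see why the Fourier transform helps with shifts: translating a periodic curve by theta multiplies its first Fourier coefficient by exp(-2*pi*i*theta), so the shift appears as a phase difference. The sketch below is a deliberately simplified illustration of that property, not the M-estimator studied in the paper. It assumes a known reference curve (in the paper the pattern is unknown), noise-free samples on a regular grid over one period, and hypothetical function names.

```python
import cmath
import math


def first_fourier_coeff(samples):
    """First discrete Fourier coefficient of one period sampled on a regular grid."""
    n = len(samples)
    return sum(y * cmath.exp(-2j * math.pi * j / n)
               for j, y in enumerate(samples)) / n


def estimate_shift(reference, shifted):
    """Shift in [0, 1) recovered from the phase of the first-coefficient ratio."""
    ratio = first_fourier_coeff(shifted) / first_fourier_coeff(reference)
    return (-cmath.phase(ratio) / (2 * math.pi)) % 1.0


def pattern(t):
    # Hypothetical 1-periodic pattern, chosen only for illustration.
    return math.sin(2 * math.pi * t) + 0.5 * math.cos(4 * math.pi * t)


n = 64
true_shift = 0.2
reference = [pattern(j / n) for j in range(n)]
shifted = [pattern(j / n - true_shift) for j in range(n)]

estimated = estimate_shift(reference, shifted)
```

With noisy curves and an unknown pattern, a single phase ratio is no longer enough, which is where a criterion combining several Fourier coefficients across all curves becomes useful.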

arXiv.org e-Print Archive

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

ProdInra

Hal-Diderot

Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data

Author: Bellincontro Andrea
Fordellone Mario
Mencarelli Fabio
Publication date: 01/01/2018

The recent development of more sophisticated spectroscopic methods allows acquisition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches.

arXiv.org e-Print Archive

Unitus DSpace

Archivio della ricerca- Università di Roma La Sapienza

An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

Author: Malley James
Strobl Carolin
Tutz Gerhard
Publication date: 01/04/2009

Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high dimensional data exploration, and also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.

Crossref

Open Access LMU

PubMed Central

Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data

Author: Bianchi Filippo Maria
Jenssen Robert
Mikalsen Karl Øyvind
Soguero-Ruiz Cristina
Publication date: 01/01/2017

Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or if the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust time series cluster kernel (TCK). The approach leverages the missing data handling properties of Gaussian mixture models (GMM) augmented with informative prior distributions. An ensemble learning approach is exploited to ensure robustness to parameters by combining the clustering results of many GMMs to form the final kernel. We evaluate the TCK on synthetic and real data and compare it to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data and outstanding results for missing data. Comment: 23 pages, 6 figures

arXiv.org e-Print Archive

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

Author: M. Emre Celebi
Hassan A. Kingravi
Patricio A. Vela
Publication venue: Elsevier BV
Publication date: 10/09/2012

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods. Comment: 17 pages, 1 figure, 7 tables
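The sensitivity to initial centers can be seen on a toy example. The sketch below runs plain Lloyd iterations from two starting configurations: the first k data points, and a farthest-first ("maximin") seeding. The abstract does not name the eight methods it compares, so both choices here are assumptions made for illustration, as are the function names and the six-point data set.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def first_k_init(points, k):
    """Use the first k data points as the initial centers."""
    return list(points[:k])


def maximin_init(points, k):
    """Farthest-first seeding: each new center maximizes its distance
    to the nearest center already chosen."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def lloyd(points, centers, max_iter=100):
    """Plain k-means (Lloyd) iterations; returns final centers and SSE."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if updated == centers:
            break
        centers = updated
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical data: three well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

_, sse_first_k = lloyd(points, first_k_init(points, 3))
_, sse_maximin = lloyd(points, maximin_init(points, 3))
```

On these six points the first-k start converges to a poor local optimum with a sum of squared errors of 101.0, while the maximin start recovers the three pairs with a sum of squared errors of 1.5.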

arXiv.org e-Print Archive

Crossref