Exact Dimensionality Selection for Bayesian PCA

Author: Bouveyron Charles
Latouche Pierre
Mattei Pierre-Alexandre
Publication date: 21/05/2019

We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood, which allows us to infer an optimal number of components. We also propose a heuristic based on the expected shape of the marginal likelihood curve in order to choose the hyperparameters. In non-asymptotic frameworks, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Bayesian dimensionality reduction with PCA using penalized semi-integrated likelihood

Author: Bogdan Malgorzata
Josse Julie
Sobczyk Piotr
Publication date: 01/01/2016

We discuss the problem of estimating the number of principal components in Principal Components Analysis (PCA). Despite the importance of the problem and the multitude of solutions proposed in the literature, it comes as a surprise that there does not exist a coherent asymptotic framework which would justify different approaches depending on the actual size of the data set. In this paper we address this issue by presenting an approximate Bayesian approach based on Laplace approximation and introducing a general method for building the model selection criteria, called PEnalized SEmi-integrated Likelihood (PESEL). Our general framework encompasses a variety of existing approaches based on probabilistic models, e.g. the Bayesian Information Criterion for Probabilistic PCA (PPCA), and allows for the construction of new criteria, depending on the size of the data set at hand. Specifically, we define PESEL when the number of variables substantially exceeds the number of observations. We also report results of extensive simulation studies and real data analysis, which illustrate good properties of our proposed criteria as compared to the state-of-the-art methods and very recent proposals. Specifically, these simulations show that PESEL-based criteria can be quite robust against deviations from the probabilistic model assumptions. Selected PESEL-based criteria for the estimation of the number of principal components are implemented in the R package varclust, which is available on GitHub (https://github.com/psobczyk/varclust).
Comment: 31 pages, 7 figures

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

FigShare
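
The abstract above names the Bayesian Information Criterion for Probabilistic PCA as one of the existing criteria covered by the PESEL framework. As a rough illustration of that baseline only (this is not PESEL itself and not the varclust implementation), the following Python sketch scores each candidate number of components from the eigenvalues of the sample covariance matrix. It assumes the standard maximized PPCA log-likelihood and a parameter count of dk - k(k-1)/2 + 1 + d. The function names `ppca_log_likelihood`, `ppca_bic`, and `select_k` are hypothetical, chosen for this example.

```python
import math


def ppca_log_likelihood(eigvals, n, k):
    """Maximized PPCA log-likelihood for k components.

    eigvals: eigenvalues of the (maximum-likelihood) sample covariance matrix,
             all assumed strictly positive.
    n:       number of observations.
    k:       number of retained components, 0 <= k < d.
    """
    d = len(eigvals)
    if not 0 <= k < d:
        raise ValueError("k must satisfy 0 <= k < number of variables")
    lam = sorted(eigvals, reverse=True)
    # Noise variance estimate: average of the discarded eigenvalues.
    sigma2 = sum(lam[k:]) / (d - k)
    log_det = sum(math.log(v) for v in lam[:k]) + (d - k) * math.log(sigma2)
    return -0.5 * n * (d * math.log(2.0 * math.pi) + log_det + d)


def ppca_bic(eigvals, n, k):
    """BIC score (larger is better) under the assumed parameter count."""
    d = len(eigvals)
    # Loadings (up to rotation) + noise variance + mean vector.
    n_params = d * k - k * (k - 1) // 2 + 1 + d
    return ppca_log_likelihood(eigvals, n, k) - 0.5 * n_params * math.log(n)


def select_k(eigvals, n):
    """Return the number of components with the highest BIC score."""
    return max(range(len(eigvals)), key=lambda k: ppca_bic(eigvals, n, k))


if __name__ == "__main__":
    # Two clearly dominant eigenvalues followed by a flat noise floor.
    spectrum = [5.0, 3.0, 0.1, 0.1, 0.1, 0.1]
    print(select_k(spectrum, n=100))
```

With a spectrum that has two dominant eigenvalues and a flat noise floor, the likelihood stops improving after two components while the penalty keeps growing, so the criterion settles on two.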

A group model for stable multi-subject ICA on fMRI datasets

Author: Kleinschmidt A.
Pinel P.
Poline J. B.
Sadaghiani S.
Thirion B.
Varoquaux G.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

Spatial Independent Component Analysis (ICA) is an increasingly used data-driven method to analyze functional Magnetic Resonance Imaging (fMRI) data. To date, it has been used to extract sets of mutually correlated brain regions without prior information on the time course of these regions. Some of these sets of regions, interpreted as functional networks, have recently been used to provide markers of brain diseases and open the road to paradigm-free population comparisons. Such group studies raise the question of modeling subject variability within ICA: how can the patterns representative of a group be modeled and estimated via ICA for reliable inter-group comparisons? In this paper, we propose a hierarchical model for patterns in multi-subject fMRI datasets, akin to mixed-effect group models used in linear-model-based analysis. We introduce an estimation procedure, CanICA (Canonical ICA), based on i) probabilistic dimension reduction of the individual data, ii) canonical correlation analysis to identify a data subspace common to the group, and iii) ICA-based pattern extraction. In addition, we introduce a procedure based on cross-validation to quantify the stability of ICA patterns at the level of the group. We compare our method with state-of-the-art multi-subject fMRI ICA methods and show that the features extracted using our procedure are more reproducible at the group level on two datasets of 12 healthy controls: a resting-state and a functional localizer study.

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Inserm

HAL-CEA

Determining Principal Component Cardinality through the Principle of Minimum Description Length

Author: A Blumer
AJ Donald
AP Dawid
AR Barron
C Eckart
DC Hoyle
IT Jolliffe
J Josse
J Rissanen
JI Myung
M Mitzenmacher
M Zhu
MH Hansen
T Hastie
TM Cover
Y Choi
Publication date: 29/06/2019

PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective compression-based criteria for model selection, but it is difficult to analytically apply its modern definition, NML (Normalized Maximum Likelihood), to the problem of PCA. This work shows a general reduction of NML problems to lower-dimension problems. Applying this reduction, it bounds the NML of PCA by terms of the NML of linear regression, which are known.
Comment: LOD 201

arXiv.org e-Print Archive

Crossref

Probabilistic classification of acute myocardial infarction from multiple cardiac markers

Author: C Bishop
F Dombal de
F Fesmire
George W. Irwin
H Selker
J Ellenuis
J Habbema
J Hanley
J Hilden
John V. Lamont
L Goldman
M Pozen
Paul C. Wilson
Robert F. Harrison
T Groth
W Tierney
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2008

Logistic regression and Gaussian mixture model (GMM) classifiers have been trained to estimate the probability of acute myocardial infarction (AMI) in patients based upon the concentrations of a panel of cardiac markers. The panel consists of two new markers, fatty acid binding protein (FABP) and glycogen phosphorylase BB (GPBB), in addition to the traditional cardiac troponin I (cTnI), creatine kinase MB (CKMB) and myoglobin. The effect of using principal component analysis (PCA) and Fisher discriminant analysis (FDA) to preprocess the marker concentrations was also investigated. The need for classifiers to give an accurate estimate of the probability of AMI is argued and three categories of performance measure are described, namely discriminatory ability, sharpness, and reliability. Numerical performance measures for each category are given and applied. The optimum classifier, based solely upon the samples taken on admission, was the logistic regression classifier using FDA preprocessing. This gave an accuracy of 0.85 (95% confidence interval: 0.78–0.91) and a normalised Brier score of 0.89. When samples at both admission and a further time, 1–6 h later, were included, the performance increased significantly, showing that logistic regression classifiers can indeed use the information from the five cardiac markers to accurately and reliably estimate the probability of AMI.

Queen's University Belfast Research Portal

Crossref

White Rose Research Online
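
The abstract above reports a normalised Brier score as one of its probabilistic performance measures, without stating the normalisation used. The Python sketch below shows the plain Brier score (mean squared difference between predicted probability and the 0/1 outcome) together with one common skill-style rescaling against a constant-prevalence forecast. That rescaling is an assumption made for illustration and may differ from the paper's definition. The function names and the toy data are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill(probs, outcomes):
    """1 - BS / BS_ref, where BS_ref always predicts the observed prevalence.

    Assumed normalisation for illustration: 1.0 is a perfect forecast and
    0.0 matches the constant-prevalence reference.
    """
    prevalence = sum(outcomes) / len(outcomes)
    reference = brier_score([prevalence] * len(outcomes), outcomes)
    if reference == 0.0:
        raise ValueError("reference score is zero (all outcomes identical)")
    return 1.0 - brier_score(probs, outcomes) / reference


if __name__ == "__main__":
    # Toy predicted probabilities and outcomes (1 = event, 0 = no event).
    predicted = [0.9, 0.2, 0.7, 0.1]
    observed = [1, 0, 1, 0]
    print(brier_score(predicted, observed))
    print(brier_skill(predicted, observed))
```

A lower plain Brier score is better, whereas the rescaled version is oriented so that higher is better, which is the direction suggested by the 0.89 figure quoted in the abstract.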