
867 research outputs found

The alternating least-squares algorithm for CDPCA

Author: A d’Aspremont
H Zou
IT Jolliffe
M Vichi
R Xu
S Vines
Z Ma
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

Clustering and Disjoint Principal Component Analysis (CDPCA) is a constrained principal component analysis recently proposed for simultaneously clustering objects and partitioning variables, which we have implemented in the R language. In this paper, we deal in detail with the alternating least-squares algorithm for CDPCA and highlight its algebraic features for constructing both interpretable principal components and clusters of objects. Two applications are given to illustrate the capabilities of this new methodology.

Crossref

Repositório Institucional da Universidade de Aveiro

Classification of Ultrasonic Weld Inspection Data Using Principal Component Analysis

Author: AR Baker
IT Jolliffe
J Yang
S Haykin
SF Burch
TD Sanger
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1997
Field of study

Recent in-service inspection experience, round robin tests of ultrasonic inspection reliability [1] and calculations of flaw detection reliability necessary for specific nuclear power plant applications have consistently shown the need to improve the reliability of ultrasonic inspection. This need is further emphasized when one reviews the pass rates for performance demonstrations specified by ASME Section XI Appendix VIII.

Digital Repository @ Iowa State University (ISU)

Crossref

A model-based multithreshold method for subgroup identification

Author: Anderson TW
Breiman L
Giri NC
Golub GH
Jolliffe IT
Loh WY
Messenger R
Paul D
Rao CR
Su X
Thomson GH
Publication venue: eScholarship, University of California
Publication date: 11/02/2019
Field of study

The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on one predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as latent factors, principal components, and weighted sums of predictors. Such a subgrouping method may lead to more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change point regression model and thus yields straightforward model-based prediction results. After choosing a particular form of thresholding variable, we apply a two-stage multiple change point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimation results enjoy oracle properties. We design a simulation study to compare the performance of our proposed method with existing methods, and apply them to analyze data sets from a Scleroderma trial and a breast cancer study.

Crossref

eScholarship - University of California

Mean squared error vs. frame potential for unsupervised variable selection

Author: A d’Aspremont
A Vergara
G Nemhauser
GP McCabe
H Zou
I Rodriguez-Lujan
IT Jolliffe
J Ranieri
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/08/2017
Field of study

Queen's University Belfast Research Portal

Crossref

Damage and repair classification in reinforced concrete beams using frequency domain data

Author: A Webb
Ali A. Al-Ghalib
C Farrar
EP Carden
Fouad A. Mohammad
H Sohn
H Wenzel
IT Jolliffe
K Worden
NMM Maia
P Carden
S Doebling
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2015
Field of study

This research aims to develop a new vibration-based damage classification technique that can be applied efficiently to large volumes of real-time data. The statistical pattern recognition paradigm is well suited to building a reliable site-location damage diagnosis system. By adopting this paradigm, finite element and other inverse models, with their intensive computations, corrections and inherent inaccuracies, can be avoided. In this research, a two-stage combination of principal component analysis and the Karhunen-Loève transformation (also known as canonical correlation analysis) was proposed as a statistical damage classification technique. Vibration measurements from the frequency domain were tested as possible damage-sensitive features. The performance of the proposed system was tested and verified on real vibration measurements collected from five laboratory-scale reinforced concrete beams modelled with various ranges of defects. The results of the system helped in distinguishing between normal and damaged patterns in structural vibration data. Most importantly, the system further divided each main damage group reasonably well into subgroups according to the severity of damage. Its efficiency was conclusively proved on data from both frequency response functions and response-only functions. The outcomes of this two-stage system showed realistic detection and classification and outperformed the results from principal component analysis alone. The success of this classification model is substantially tenable because the observed clusters come from well-controlled and known state conditions.

Crossref

Nottingham Trent Institutional Repository (IRep)

Determining Principal Component Cardinality through the Principle of Minimum Description Length

Author: A Blumer
AJ Donald
AP Dawid
AR Barron
C Eckart
DC Hoyle
IT Jolliffe
J Josse
J Rissanen
JI Myung
M Mitzenmacher
M Zhu
MH Hansen
T Hastie
TM Cover
Y Choi
Publication venue
Publication date: 29/06/2019
Field of study

PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective compression-based criteria for model selection, but it is difficult to analytically apply its modern definition, NML (Normalized Maximum Likelihood), to the problem of PCA. This work shows a general reduction of NML problems to lower-dimension problems. Applying this reduction, it bounds the NML of PCA in terms of the NML of linear regression, which is known. Comment: LOD 201

arXiv.org e-Print Archive

Crossref

A comparison of unsupervised abnormality detection methods for interstitial lung disease

Author: A Depeursinge
B Schölkopf
F Pedregosa
IT Jolliffe
J Duchi
L Ertöz
L Sorensen
PJ Rousseeuw
T Kohonen
Z He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/08/2018
Field of study

Heriot Watt Pure

Crossref

Efficient use of simultaneous multi-band observations for variable star analysis

Author: B Sesar
DH McNamara
DM Bramich
IT Jolliffe
JA Frieman
JK Adelman-McCarthy
M Zechmeister
PJ Rousseeuw
T Hastie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/06/2011
Field of study

The luminosity changes of most types of variable stars are correlated across wavelengths, and these correlations may be exploited for several purposes: variability detection, distinguishing microvariability from noise, period search, or classification. Principal component analysis is a simple and well-developed statistical tool for analyzing correlated data. We discuss its use on variable objects of Stripe 82 of the Sloan Digital Sky Survey, with the aim of identifying new RR Lyrae and SX Phoenicis-type candidates. The application is not straightforward because of different noise levels in the different bands, the presence of outliers that can be confused with real extreme observations, under- or overestimated errors, and the dependence of errors on the magnitudes. These particularities require robust methods to be applied together with the principal component analysis. The results show that PCA is a valuable aid in variability analysis with multi-band data. Comment: 8 pages, 5 figures, Workshop on Astrostatistics and Data Mining in Astronomical Databases, May 29-June 4 2011, La Palm

arXiv.org e-Print Archive

Crossref

The projection score - an evaluation criterion for variable subset selection in PCA visualization

Author: AA Shabalin
AE Raftery
C Boutsidis
C Haslinger
Charlotte Soneson
DA Jackson
DM Witten
DT Ross
E Bair
GP McCabe
H Hotelling
H Shen
H Zou
I Guyon
IM Johnstone
IT Jolliffe
K Hoffmann
K Pearson
M Lee
Magnus Fontes
ME Ross
MG Tadesse
O Modlich
PR Peres-Neto
R Tibshirani
R Varshavsky
S Bungaro
S Dray
SY Kassim
T Hastie
TR Golub
WJ Krzanowski
Y Liu
Y Lu
ZD Bai
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Background: In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by only including the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization. Results: We present the projection score, which is a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA. Conclusions: We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, which can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
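
The variance filtering mentioned in the abstract above (removing the variables with the smallest variances before PCA) can be illustrated with a minimal sketch. It is not the authors' projection score and does not perform PCA itself; the function names `top_variance_columns` and `filter_columns` and the toy data are hypothetical, chosen only for illustration.

```python
from statistics import pvariance


def top_variance_columns(rows, keep):
    """Return the indices of the `keep` columns with the largest variance."""
    columns = list(zip(*rows))  # transpose: one tuple per variable
    variances = [pvariance(col) for col in columns]
    # Rank column indices by variance, largest first.
    ranked = sorted(range(len(columns)), key=lambda j: variances[j], reverse=True)
    # Return the retained indices in their original column order.
    return sorted(ranked[:keep])


def filter_columns(rows, indices):
    """Keep only the given columns of a row-major data set."""
    return [[row[j] for j in indices] for row in rows]


# Hypothetical toy data: rows are samples, columns are variables.
data = [
    [1.0, 10.0, 5.0],
    [1.1, 20.0, 5.0],
    [0.9, 30.0, 5.0],
    [1.0, 40.0, 5.0],
]
kept = top_variance_columns(data, keep=2)   # the constant third column is dropped
filtered = filter_columns(data, kept)
```

The number of retained variables (`keep` here) is exactly the inclusion criterion that the abstract says has lacked an objective selection method.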

Lund University Publications

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Sparse Exploratory Factor Analysis

Author: A Edelman
C Hage
HH Harman
IT Jolliffe
J Choi
K Hirose
KG Jöreskog
Kohei Adachi
MATLAB
N Boumal
N Buono Del
Nickolay T. Trendafilov
NT Trendafilov
P-A Absil
R Luss
SA Mulaik
Sara Fontanella
Z Wen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/07/2017
Field of study

Sparse principal component analysis has been a very active research area over the last decade. It produces component loadings with many zero entries, which facilitates their interpretation and helps avoid redundant variables. Classic factor analysis is another popular dimension reduction technique that shares similar interpretation problems and could greatly benefit from sparse solutions. Unfortunately, very few works consider sparse versions of classic factor analysis. Our goal is to contribute further in this direction. We revisit the most popular procedures for exploratory factor analysis, maximum likelihood and least squares. Sparse factor loadings are obtained for them by, first, adopting a special reparameterization and, second, introducing additional [Formula: see text]-norm penalties into the standard factor analysis problems. As a result, we propose sparse versions of the major factor analysis procedures. We illustrate the developed algorithms on well-known psychometric problems. Our sparse solutions are critically compared to those obtained by other existing methods.

Crossref

Open Research Online (The Open University)

Spiral - Imperial College Digital Repository