The alternating least-squares algorithm for CDPCA
Clustering and Disjoint Principal Component Analysis (CDPCA) is a constrained principal component analysis, recently proposed for the simultaneous clustering of objects and partitioning of variables, which we have implemented in the R language. In this paper, we deal in detail with the alternating least-squares algorithm for CDPCA and highlight the algebraic features it exploits to construct both interpretable principal components and clusters of objects. Two applications are given to illustrate the capabilities of this new methodology.
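The core of the alternating least-squares idea can be seen in a stripped-down rank-1 setting. The sketch below (plain NumPy, not the authors' R implementation, and without CDPCA's clustering and disjointness constraints) alternates the two closed-form least-squares updates for scores and loadings:

```python
import numpy as np

def als_rank1(X, iters=200):
    """Alternating least squares for the best rank-1 approximation X ~ u v^T:
    with v fixed, the least-squares u has a closed form, and vice versa."""
    n, p = X.shape
    v = np.ones(p) / np.sqrt(p)      # arbitrary starting loadings
    u = X @ v
    for _ in range(iters):
        u = X @ v / (v @ v)          # update object scores, v fixed
        v = X.T @ u / (u @ u)        # update variable loadings, u fixed
    return u, v

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
u, v = als_rank1(X)
# At convergence, np.outer(u, v) matches the leading singular pair of X.
```

Each update is an ordinary least-squares solve, which is what makes the algebra of such alternating schemes transparent; CDPCA adds membership and disjointness constraints on top of updates of this kind.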
Classification of Ultrasonic Weld Inspection Data Using Principal Component Analysis
Recent in-service inspection experience, round-robin tests of ultrasonic inspection reliability [1] and calculations of the flaw detection reliability required for specific nuclear power plant applications have consistently shown the need to improve the reliability of ultrasonic inspection. This need is further emphasized when one reviews the pass rates for the performance demonstrations specified by ASME Section XI, Appendix VIII.
A model-based multithreshold method for subgroup identification
The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on one predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as latent factors, principal components, and weighted sums of predictors. Such a subgrouping method may lead to a more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change-point regression model and thus yields straightforward model-based prediction results. After choosing a particular thresholding-variable form, we apply a two-stage multiple change-point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimation results enjoy oracle properties. We design a simulation study to compare the performance of our proposed method with that of existing methods, and apply them to analyze data sets from a scleroderma trial and a breast cancer study.
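As a toy illustration of splitting on a thresholding variable, the sketch below finds a single change point by exhaustive search over a sorted sample (a much simplified stand-in for the paper's two-stage multiple change-point procedure; all names and data are illustrative):

```python
import numpy as np

def split_by_changepoint(z, y, min_size=5):
    """Order samples by the thresholding variable z and pick the split
    that minimizes the total within-segment residual sum of squares of y."""
    order = np.argsort(z)
    ys = y[order]
    n = len(ys)
    best_rss, best_k = np.inf, None
    for k in range(min_size, n - min_size):
        left, right = ys[:k], ys[k:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_rss, best_k = rss, k
    return z[order][best_k]    # estimated threshold on z

# Two subgroups with different mean response, with the true split at z = 0:
rng = np.random.default_rng(1)
z = rng.normal(size=200)
y = np.where(z > 0, 2.0, -1.0) + 0.1 * rng.normal(size=200)
t = split_by_changepoint(z, y)   # estimate should land near 0
```

In the paper's setting, z itself would be a constructed combination of predictors (a latent factor, a principal component, or a weighted sum), and multiple change points would be searched for rather than one.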
Damage and repair classification in reinforced concrete beams using frequency domain data
This research aims at developing a new vibration-based damage classification technique that can efficiently be applied to large real-time data. The statistical pattern recognition paradigm is well suited to building a reliable on-site damage diagnosis system. By adopting such a paradigm, finite element and other inverse models, with their intensive computations, corrections and inherent inaccuracies, can be avoided. In this research, a two-stage combination of principal component analysis and the Karhunen-Loève transformation (also known as canonical correlation analysis) was proposed as a statistical damage classification technique. Vibration measurements from the frequency domain were tested as possible damage-sensitive features. The performance of the proposed system was tested and verified on real vibration measurements collected from five laboratory-scale reinforced concrete beams modelled with various ranges of defects. The results of the system helped in distinguishing between normal and damaged patterns in structural vibration data. Most importantly, the system further dissected each main damage group into subgroups according to severity of damage. Its efficiency was conclusively proved on data from both frequency response functions and response-only functions. The outcomes of this two-stage system showed realistic detection and classification and outperformed results from principal component analysis alone. The success of this classification model is substantially tenable because the observed clusters come from well-controlled and known state conditions.
Determining Principal Component Cardinality through the Principle of Minimum Description Length
PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective, compression-based criteria for model selection, but it is difficult to analytically apply its modern definition, NML (Normalized Maximum Likelihood), to the problem of PCA. This work shows a general reduction of NML problems to lower-dimension problems. Applying this reduction, it bounds the NML of PCA in terms of the NML of linear regression, which is known.
Comment: LOD 201
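A crude illustration of compression-based selection of the number of components: the BIC-style code length below (in the spirit of Minka's criterion for probabilistic PCA, not the NML bound derived in the paper) models the kept components exactly and the residual as isotropic Gaussian noise:

```python
import numpy as np

def pick_k_bic(X):
    """Choose the number of principal components by a two-part code length:
    data cost from a Gaussian model with k exact components plus isotropic
    residual noise, model cost of (1/2) log n nats per loading parameter."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    lam = np.linalg.svd(Xc, compute_uv=False) ** 2 / n   # covariance eigenvalues
    best_k, best_len = 0, np.inf
    for k in range(p):
        sigma2 = lam[k:].mean()                          # residual noise variance
        loglik = -0.5 * n * (np.log(lam[:k]).sum() + (p - k) * np.log(sigma2))
        length = -loglik + 0.5 * (k * p) * np.log(n)     # data cost + model cost
        if length < best_len:
            best_len, best_k = length, k
    return best_k

# Synthetic data with two strong components plus noise:
rng = np.random.default_rng(5)
scores = rng.normal(size=(200, 2)) * 5.0
loadings = rng.normal(size=(2, 10)) / np.sqrt(10)
X = scores @ loadings + 0.5 * rng.normal(size=(200, 10))
k = pick_k_bic(X)
```

The paper's contribution is to replace this kind of asymptotic two-part code with bounds on the exact NML code length, obtained via a reduction to the NML of linear regression.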
Efficient use of simultaneous multi-band observations for variable star analysis
The luminosity changes of most types of variable stars are correlated in the
different wavelengths, and these correlations may be exploited for several
purposes: for variability detection, for distinction of microvariability from
noise, for period search or for classification. Principal component analysis is
a simple and well-developed statistical tool to analyze correlated data. We
will discuss its use on variable objects of Stripe 82 of the Sloan Digital Sky
Survey, with the aim of identifying new RR Lyrae and SX Phoenicis-type
candidates. The application is not straightforward because of different noise
levels in the different bands, the presence of outliers that can be confused
with real extreme observations, under- or overestimated errors and the
dependence of errors on the magnitudes. These particularities require robust
methods to be applied together with the principal component analysis. The
results show that PCA is a valuable aid in variability analysis with multi-band
data.
Comment: 8 pages, 5 figures, Workshop on Astrostatistics and Data Mining in Astronomical Databases, May 29-June 4 2011, La Palm
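The basic mechanism, correlated variability concentrating in the first principal component, can be reproduced on synthetic multi-band data (a minimal sketch: the amplitudes and noise level are invented, and none of the robustness machinery discussed above is included):

```python
import numpy as np

# A shared light curve drives all bands with band-dependent amplitude;
# independent noise is added per band. Correlated variability should then
# concentrate in the first principal component.
rng = np.random.default_rng(2)
n_epochs, n_bands = 300, 5
signal = np.sin(np.linspace(0, 12 * np.pi, n_epochs))   # shared variability
amp = np.array([1.0, 0.9, 0.8, 0.7, 0.6])               # per-band amplitudes
mags = signal[:, None] * amp + 0.1 * rng.normal(size=(n_epochs, n_bands))

# Standardize each band separately (unequal noise levels across bands are
# one of the complications the abstract mentions), then run PCA via SVD.
X = (mags - mags.mean(axis=0)) / mags.std(axis=0)
s = np.linalg.svd(X, compute_uv=False)
explained = s ** 2 / (s ** 2).sum()
# explained[0] dominates, reflecting the correlated variability.
```

With outliers, magnitude-dependent errors, and under- or overestimated uncertainties, the plain standardization and SVD above would be replaced by the robust variants the abstract alludes to.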
The projection score - an evaluation criterion for variable subset selection in PCA visualization
Background: In many scientific domains it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by including only the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively, on the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization.
Results: We present the projection score, a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA.
Conclusions: The projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, and can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
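A rough permutation-based proxy for this idea (not the paper's definition of the projection score) compares the variance captured by the leading components of a variable subset against what the same data with between-variable correlation destroyed would give:

```python
import numpy as np

def projection_score_proxy(X, n_perm=20, k=2, seed=0):
    """Toy informativeness proxy: fraction of variance in the top-k PCs,
    minus its average after independently permuting each column, which
    destroys between-variable correlation but keeps the marginals."""
    rng = np.random.default_rng(seed)

    def topk_frac(M):
        s = np.linalg.svd(M - M.mean(axis=0), compute_uv=False)
        return (s[:k] ** 2).sum() / (s ** 2).sum()

    observed = topk_frac(X)
    null = np.mean([
        topk_frac(np.column_stack([rng.permutation(col) for col in X.T]))
        for _ in range(n_perm)
    ])
    return observed - null

# A subset of variables sharing a latent factor should score higher than
# an equally sized subset of pure noise variables:
rng = np.random.default_rng(3)
factor = rng.normal(size=200)
informative = factor[:, None] * rng.uniform(0.5, 1.0, 8) \
    + 0.3 * rng.normal(size=(200, 8))
noise = rng.normal(size=(200, 8))
```

Sweeping such a score over nested variable subsets (e.g. the top-m variables by variance, for increasing m) is the kind of use the abstract describes for choosing an inclusion criterion.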
Sparse Exploratory Factor Analysis
Sparse principal component analysis has been a very active research area over the last decade. It produces component loadings with many zero entries, which facilitates their interpretation and helps avoid redundant variables. Classic factor analysis is another popular dimension reduction technique that shares similar interpretation problems and could greatly benefit from sparse solutions. Unfortunately, very few works consider sparse versions of classic factor analysis. Our goal is to contribute further in this direction. We revisit the most popular procedures for exploratory factor analysis, maximum likelihood and least squares. Sparse factor loadings are obtained for them by, first, adopting a special reparameterization and, second, introducing additional ℓ1-norm penalties into the standard factor analysis problems. As a result, we propose sparse versions of the major factor analysis procedures. We illustrate the developed algorithms on well-known psychometric problems. Our sparse solutions are critically compared to those obtained by other existing methods.
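For a flavor of how an ℓ1 penalty produces exact zeros in loadings, here is a generic sparse rank-1 heuristic with soft-thresholded alternating updates (in the spirit of Shen and Huang's sPCA-rSVD, not the reparameterized maximum-likelihood or least-squares procedures proposed in the paper):

```python
import numpy as np

def sparse_pc1(X, penalty=1.0, iters=100):
    """Sparse leading loadings via alternating updates: the score update is
    ordinary least squares, while the loading update is soft-thresholded,
    which is the proximal step for an l1 penalty and yields exact zeros."""
    Xc = X - X.mean(axis=0)
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]     # warm start at PC1
    for _ in range(iters):
        u = Xc @ v
        u /= np.linalg.norm(u)
        w = Xc.T @ u
        v = np.sign(w) * np.maximum(np.abs(w) - penalty, 0.0)  # soft threshold
        nv = np.linalg.norm(v)
        if nv == 0:        # penalty too large: everything zeroed out
            break
        v /= nv
    return v

# Five variables load on a common factor, five are pure noise; with a
# suitable penalty the noise loadings shrink to exactly zero.
rng = np.random.default_rng(4)
f = rng.normal(size=300)
X = np.column_stack([f * 2 + 0.3 * rng.normal(size=300) for _ in range(5)]
                    + [rng.normal(size=300) for _ in range(5)])
v = sparse_pc1(X, penalty=20.0)
```

The exact zeros are what make such loadings easy to interpret: each component involves only a named handful of variables, the motivation shared by sparse PCA and the sparse factor analysis procedures of the paper.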