On the Procrustean analogue of individual differences scaling (INDSCAL)
In this paper, individual differences scaling (INDSCAL) is revisited, considering INDSCAL as embedded within a hierarchy of individual difference scaling models. We explore the members of this family, distinguishing (i) models, (ii) the role of identification and substantive constraints, (iii) criteria for fitting models and (iv) algorithms to optimise the criteria. Model formulations may be based either on data in the form of proximities or on configurational matrices. In its configurational version, individual difference scaling may be formulated as a form of generalized Procrustes analysis. Algorithms are introduced for fitting the new models. An application from sensory evaluation illustrates the performance of the methods and their solutions.
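The generalized Procrustes connection rests on the classical orthogonal Procrustes problem: finding the rotation that best matches one configuration to another. A minimal NumPy sketch of that single-pair building block (an illustration only, not the paper's fitting algorithms):

```python
import numpy as np

def procrustes_rotation(A, B):
    """Solve the orthogonal Procrustes problem: find the orthogonal Q
    minimizing ||A Q - B||_F, via the SVD of A^T B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Two configurations of 5 points in 2D; B is a rotated copy of A.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = A @ R
Q = procrustes_rotation(A, B)
print(np.allclose(A @ Q, B))  # True: the rotation is recovered exactly
```

Generalized Procrustes analysis iterates such rotations over many configurations toward a common consensus; the SVD step above is the core subproblem solved at each pass.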
Exploratory factor analysis of large data matrices
Nowadays, the most interesting applications have data with many more variables than observations and require dimension reduction. With such data, standard exploratory factor analysis (EFA) cannot be applied. Recently, a generalized EFA (GEFA) model was proposed to deal with any type of data: both vertical data (fewer variables than observations) and horizontal data (more variables than observations). The associated algorithm, GEFALS, is very efficient, but still cannot handle data with thousands of variables. The present work modifies GEFALS and proposes a new, very fast version, GEFAN. This is achieved by aligning the dimensions of the parameter matrices to their ranks, thus avoiding redundant calculations. The GEFALS and GEFAN algorithms are compared numerically on well-known data.
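The rank-alignment idea can be illustrated in plain NumPy: for horizontal data, working with the small n × n Gram matrix instead of the huge p × p covariance keeps every computation sized to the rank of the data. A hedged sketch of that trick (an illustration of the principle, not the GEFAN algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5000                 # far more variables than observations
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)             # column-center the data

# The p x p covariance has rank at most n - 1, so classic EFA breaks
# down; the n x n Gram matrix carries the same nonzero spectrum while
# keeping every matrix sized to the rank of the data.
G = X @ X.T                     # n x n instead of p x p
evals, evecs = np.linalg.eigh(G)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
k = 3
scores = evecs[:, :k] * np.sqrt(np.maximum(evals[:k], 0.0))
print(scores.shape)  # (50, 3): a low-dimensional representation
```

The eigendecomposition here costs O(n^3) rather than O(p^3), which is the kind of saving that rank-aligned parameter matrices deliver.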
Some inequalities contrasting principal component and factor analyses solutions
Principal component analysis (PCA) and factor analysis (FA) are two time-honored dimension reduction methods. In this paper, some inequalities are presented to contrast PCA and FA solutions for the same data set. For this purpose, we take advantage of the recently established matrix decomposition (MD) formulation of FA. In summary, the resulting inequalities show that (a) FA gives a better fit to the data than PCA, (b) PCA extracts a larger amount of common “information” than FA, and (c) for each variable, its unique variance in FA is larger than its residual variance in PCA minus the one in FA. The resulting inequalities can be useful in suggesting whether PCA or FA should be used for a particular data set. The answers can also be valid for the classic FA formulation not relying on the MD-FA definition, as both “types” of FA provide almost equal solutions. Additionally, the inequalities give a theoretical explanation of some empirically observed tendencies in PCA and FA solutions, e.g., that the absolute values of PCA loadings tend to be larger than those of FA loadings, and that the unique variances in FA tend to be larger than the residual variances of PCA.
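The second inequality, that PCA extracts a larger amount of common variance than FA, can be checked numerically. The sketch below uses iterated principal-axis factoring as a simple stand-in FA estimator (not the paper's MD-FA formulation) on an exactly one-factor correlation matrix:

```python
import numpy as np

def pca_loadings(R, k):
    """Top-k principal component loadings of a correlation matrix R."""
    evals, evecs = np.linalg.eigh(R)
    idx = np.argsort(evals)[::-1][:k]
    return evecs[:, idx] * np.sqrt(evals[idx])

def paf_loadings(R, k, n_iter=200):
    """Iterated principal-axis factoring: a simple FA estimator that
    replaces the diagonal of R by communality estimates (an
    illustrative stand-in, not the MD-FA method of the paper)."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))   # initial communalities (SMC)
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)
        L = pca_loadings(Rr, k)
        h2 = np.clip((L ** 2).sum(axis=1), 0.0, 1.0)
    return L

# one-factor correlation structure: R = lam lam' off the diagonal
rng = np.random.default_rng(2)
lam = rng.uniform(0.5, 0.9, size=(6, 1))
R = lam @ lam.T
np.fill_diagonal(R, 1.0)

Lp = pca_loadings(R, 1)
Lf = paf_loadings(R, 1)
# total common variance extracted: PCA >= FA, since FA works with a
# reduced-diagonal matrix whose top eigenvalue cannot exceed that of R
print((Lp ** 2).sum(), (Lf ** 2).sum())
```

The direction of the comparison follows from Weyl's inequality: the reduced matrix differs from R by a nonnegative diagonal, so its leading eigenvalue is no larger.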
Sparse exploratory factor analysis
Sparse principal component analysis has been a very active research area over the last decade. At the same time, there are very few works on sparse factor analysis. We propose a new contribution to the area by exploring a procedure for sparse factor analysis in which the unknown parameters are found simultaneously.
An improved estimation procedure for robust PARAFAC model fitting
Different techniques exist to analyze multi-way data, but PARAFAC is one of the most popular. The usual way of parameter estimation in PARAFAC is an alternating least squares (ALS) procedure, which yields least-squares solutions and provides consistent outcomes. Alongside these desirable features, the ALS procedure suffers from several major flaws which can be particularly problematic for large-scale problems: slow convergence and sensitivity to degeneracy conditions such as over-factoring, collinearity, bad initialization and local minima. Furthermore, it is well known that algorithms relying on least squares break down easily in the presence of outliers. The issue of non-robustness of the ALS procedure was addressed by [1], and software for computing the robust PARAFAC model is available in the R package rrcov3way (see [4]). The other issues were addressed in a number of works proposing algorithms more efficient than ALS. However, these often do not provide stable results, because the increased speed may come at the expense of accuracy. An integrated algorithm which seems to combine improved speed and stability was proposed by [2] and [3]. The purpose of this work is to elaborate on this algorithm by adding capabilities for dealing with outliers which might be present in the data. The idea of a robust version of ALS is to identify enough "good" observations and to perform the classical ALS on these observations only. This is repeated until no significant change is observed. Finally, a reweighting step is carried out to improve the efficiency of the estimators. To identify the "good" observations, a robust version of principal component analysis on the unfolded array is used. Clearly, the robust procedure will be much more time consuming than the classical one, since it repeats the ALS optimization many times.
Therefore, any improvement in the performance of the parameter estimation procedure will contribute to the performance of the complete robust procedure. We combine the robust PARAFAC procedure proposed by [1] with the highly efficient estimation algorithm INT2 proposed by [2] in order to obtain a fast, outlier-robust PARAFAC modeling technique, which we call R-INT2. As before, it starts with robust principal component analysis to identify any outlying points and then iterates using the INT2 algorithm until no significant change is observed. After convergence, a reweighting step with INT2 is conducted, which produces the final solution. The performance of the newly proposed algorithm R-INT2 for robust estimation of trilinear PARAFAC models is demonstrated in a brief simulation study comparing classical ALS, the robust version based on ALS ([1]) and the new R-INT2. First of all, we want to verify that R-INT2 works well on data sets with and without contamination, identifying the outliers at least as well as the robust ALS algorithm while retrieving solutions of good statistical quality. At the same time, we want to verify that convergence is improved significantly and thus that the computational time is reduced. Future work should bring a thorough investigation of the properties of the algorithm, comparison with the many existing fast alternatives, and a study of possible combinations with other computational algorithms.
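For reference, the classical ALS procedure that the robust variants build on can be sketched compactly. The following is an illustrative NumPy implementation of plain PARAFAC-ALS for the trilinear model (not the robust ALS of [1] nor the INT2/R-INT2 algorithms):

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x R) and C (K x R)."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def parafac_als(X, R, n_iter=100):
    """Plain ALS for the trilinear PARAFAC model of a 3-way array X,
    X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r].  Each step solves a linear
    least-squares problem for one factor matrix with the others fixed."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

The robust scheme described above wraps exactly this kind of inner loop: it runs the fit on the currently trusted observations, re-flags outliers, and repeats, which is why a faster inner solver such as INT2 pays off so directly.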
Simple Structure Detection Through Bayesian Exploratory Multidimensional IRT Models
In modern validity theory, a major concern is the construct validity of a test, which is commonly assessed through confirmatory or exploratory factor analysis. In the framework of Bayesian exploratory Multidimensional Item Response Theory (MIRT) models, we discuss two methods aimed at investigating the underlying structure of a test, in order to verify whether the latent model adheres to a chosen simple factorial structure. This purpose is achieved without imposing hard constraints on the discrimination parameter matrix to address the rotational indeterminacy. The first approach prescribes a two-step procedure: the parameter estimates are obtained through an unconstrained MCMC sampler, and the simple structure is then inspected in a post-processing step based on the Consensus Simple Target Rotation technique. In the second approach, both rotational invariance and simple structure retrieval are addressed within the MCMC sampling scheme, by introducing a sparsity-inducing prior on the discrimination parameters. Through simulation as well as real-world studies, we demonstrate that the proposed methods are able to correctly infer the underlying sparse structure and to retrieve interpretable solutions.
Sparse Exploratory Factor Analysis
Sparse principal component analysis has been a very active research area in the last decade. It produces component loadings with many zero entries, which facilitates their interpretation and helps avoid redundant variables. Classic factor analysis is another popular dimension reduction technique which shares similar interpretation problems and could greatly benefit from sparse solutions. Unfortunately, there are very few works considering sparse versions of classic factor analysis. Our goal is to contribute further in this direction. We revisit the most popular procedures for exploratory factor analysis, maximum likelihood and least squares. Sparse factor loadings are obtained for them by, first, adopting a special reparameterization and, second, introducing additional ℓ₁-norm penalties into the standard factor analysis problems. As a result, we propose sparse versions of the major factor analysis procedures. We illustrate the developed algorithms on well-known psychometric problems. Our sparse solutions are critically compared to those obtained by other existing methods.
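The mechanism by which a sparsity-inducing ℓ₁-type penalty produces zero loadings is soft-thresholding, the penalty's proximal operator. A small illustration on a hypothetical loading matrix (the mechanism only, not the paper's reparameterized ML or LS procedures):

```python
import numpy as np

def soft_threshold(L, lam):
    """Proximal operator of the l1-norm penalty: entries smaller in
    magnitude than lam become exactly zero; larger ones shrink by lam."""
    return np.sign(L) * np.maximum(np.abs(L) - lam, 0.0)

# a hypothetical 4-variable, 2-factor loading matrix (made-up numbers)
L = np.array([[ 0.90,  0.05],
              [ 0.80, -0.10],
              [ 0.07,  0.85],
              [-0.02,  0.75]])
print(soft_threshold(L, 0.15))
# small cross-loadings vanish exactly; dominant loadings shrink slightly
```

Exact zeros, rather than merely small values, are what make the resulting loadings directly interpretable without post-hoc rounding.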
Sparsest factor analysis for clustering variables: a matrix decomposition approach
We propose a new procedure for sparse factor analysis (FA) in which each variable loads on only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using a QR decomposition in order to estimate factor correlations efficiently. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
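The single-nonzero-per-row structure is easy to picture: given factor scores, each variable is attached to whichever factor fits it best, and the variables sharing a factor form a cluster. An illustrative sketch of that structure (not the SSFA estimation algorithm, which fits all model parts jointly by matrix decomposition):

```python
import numpy as np

def sparsest_loadings(X, F):
    """Build a loading matrix with one nonzero per row: regress each
    variable of X (n x p) on each column of the factor scores F (n x k)
    separately, and keep only the best-fitting factor's coefficient.
    (An illustration of the sparsest structure, not SSFA itself.)"""
    p = X.shape[1]
    k = F.shape[1]
    L = np.zeros((p, k))
    for j in range(p):
        # per-factor simple-regression coefficients and their fits
        b = (F * X[:, [j]]).sum(axis=0) / (F ** 2).sum(axis=0)
        sse = ((X[:, [j]] - F * b) ** 2).sum(axis=0)
        best = int(np.argmin(sse))
        L[j, best] = b[best]
    return L
```

Reading off the position of the nonzero in each row then gives the variable clustering directly.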
Semi-sparse PCA
It is well known that classical exploratory factor analysis (EFA) of data with more observations than variables has several types of indeterminacy. We study the factor indeterminacy and show some new aspects of this problem by considering EFA as a specific data matrix decomposition. We adopt a new approach to the EFA estimation and achieve a new characterization of the factor indeterminacy problem. A new alternative model is proposed, which gives determinate factors and can be seen as a semi-sparse principal component analysis (PCA). An alternating algorithm is developed, in which each step solves a Procrustes problem. It is demonstrated that the new model/algorithm can act as a specific sparse PCA and as a low-rank-plus-sparse matrix decomposition. Numerical examples with several large data sets illustrate the versatility of the new model, and the performance and behaviour of its algorithmic implementation.
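The low-rank-plus-sparse role mentioned at the end can be sketched generically by alternating a truncated SVD with soft-thresholding of the residual, a standard scheme in that literature (an illustrative sketch only, not the semi-sparse PCA algorithm of this abstract):

```python
import numpy as np

def low_rank_plus_sparse(X, rank, lam, n_iter=100):
    """Decompose X ~ L + S by alternating a truncated SVD (low-rank
    part L) with soft-thresholding of the residual (sparse part S).
    A generic alternating scheme, not the paper's Procrustes-based
    algorithm."""
    S = np.zeros_like(X)
    for _ in range(n_iter):
        U, sv, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]   # best rank-r fit
        R = X - L                                    # residual
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S
```

On data that really are a low-rank background plus a few large spikes, the sparse part isolates the spikes while the SVD step recovers the background.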