
    Cauchy robust principal component analysis with applications to high-dimensional data sets

    Principal component analysis (PCA) is a standard dimensionality reduction technique used in various research and applied fields. From an algorithmic point of view, classical PCA can be formulated in terms of operations on a multivariate Gaussian likelihood. As a consequence of the implied Gaussian formulation, the principal components are not robust to outliers. In this paper, we propose a modified formulation, based on a multivariate Cauchy likelihood instead of the Gaussian likelihood, which has the effect of robustifying the principal components. We present an algorithm to compute these robustified principal components. We additionally derive the relevant influence function of the first component and examine its theoretical properties. Simulation experiments on high-dimensional datasets demonstrate that the estimated principal components based on the Cauchy likelihood outperform or are on par with existing robust PCA techniques.

    Toroidal PCA via density ridges

    Principal Component Analysis (PCA) is a well-known linear dimension-reduction technique designed for Euclidean data. In a wide spectrum of applied fields, however, it is common to observe multivariate circular data (also known as toroidal data), for which the use of PCA is spurious because of the periodicity of the data's support. This paper introduces Toroidal Ridge PCA (TR-PCA), a novel construction of PCA for bivariate circular data that leverages the concept of density ridges as a flexible first principal component analog. Two reference bivariate circular distributions, the bivariate sine von Mises and the bivariate wrapped Cauchy, are employed as the parametric distributional basis of TR-PCA. Efficient algorithms are presented to compute density ridges for these two distribution models. A complete PCA methodology adapted to toroidal data (including scores, variance decomposition, and resolution of edge cases) is introduced and implemented in the companion R package ridgetorus. The usefulness of TR-PCA is showcased with a novel case study involving the analysis of ocean currents on the coast of Santa Barbara. Comment: 20 pages, 8 figures, 1 table

    Detecting and handling outlying trajectories in irregularly sampled functional datasets

    Outlying curves often occur in functional or longitudinal datasets; they can be very influential on parameter estimators and very hard to detect visually. In this article we introduce estimators of the mean and the principal components that are resistant to outlying sample trajectories and can therefore be used to detect them. The estimators are based on reduced-rank t-models and are specifically aimed at sparse and irregularly sampled functional data. The outlier-resistance properties of the estimators and their relative efficiency for noncontaminated data are studied theoretically and by simulation. Applications to the analysis of Internet traffic data and glycated hemoglobin levels in diabetic children are presented. Comment: Published at http://dx.doi.org/10.1214/09-AOAS257 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    The Invalidity of the Laplace Law for Biological Vessels and of Estimating Elastic Modulus from Total Stress vs. Strain: a New Practical Method

    The stiffness of tubular biological structures is often quantified, both in vivo and in vitro, as the slope of total transmural hoop stress plotted against hoop strain. Total hoop stress is typically estimated using the "Laplace law." We show that this procedure is fundamentally flawed for two reasons. First, the Laplace law predicts total stress incorrectly for biological vessels. Second, because muscle and other biological tissue are closely volume-preserving, quantifications of elastic modulus require the removal of the contribution to total stress from incompressibility. We show that this hydrostatic contribution to total stress has a strong material-dependent nonlinear response to deformation that is difficult to predict or measure. To address this difficulty, we propose a new practical method to estimate a mechanically viable modulus of elasticity that can be applied both in vivo and in vitro using the same measurements as current methods, with care taken to record the reference state. To be insensitive to incompressibility, our method is based on shear stress rather than hoop stress, and provides a true measure of the elastic response without application of the Laplace law. We demonstrate the accuracy of our method using a mathematical model of tube inflation with multiple constitutive models. We also re-analyze an in vivo study from the gastro-intestinal literature that applied the standard approach and concluded that a drug-induced change in elastic modulus depended on the protocol used to distend the esophageal lumen. Our new method removes this protocol-dependent inconsistency in the previous result. Comment: 34 pages, 13 figures

    Fast and accurate con-eigenvalue algorithm for optimal rational approximations

    The need to compute small con-eigenvalues and the associated con-eigenvectors of positive-definite Cauchy matrices naturally arises when constructing rational approximations with a (near) optimally small $L^{\infty}$ error. Specifically, given a rational function with $n$ poles in the unit disk, a rational approximation with $m \ll n$ poles in the unit disk may be obtained from the $m$th con-eigenvector of an $n \times n$ Cauchy matrix, where the associated con-eigenvalue $\lambda_{m} > 0$ gives the approximation error in the $L^{\infty}$ norm. Unfortunately, standard algorithms do not accurately compute small con-eigenvalues (and the associated con-eigenvectors) and, in particular, yield few or no correct digits for con-eigenvalues smaller than the machine roundoff. We develop a fast and accurate algorithm for computing con-eigenvalues and con-eigenvectors of positive-definite Cauchy matrices, yielding even the tiniest con-eigenvalues with high relative accuracy. The algorithm computes the $m$th con-eigenvalue in $\mathcal{O}(m^{2}n)$ operations and, since the con-eigenvalues of positive-definite Cauchy matrices decay exponentially fast, we obtain (near) optimal rational approximations in $\mathcal{O}(n(\log\delta^{-1})^{2})$ operations, where $\delta$ is the approximation error in the $L^{\infty}$ norm. We derive error bounds demonstrating high relative accuracy of the computed con-eigenvalues and the high accuracy of the unit con-eigenvectors. We also provide examples of using the algorithm to compute (near) optimal rational approximations of functions with singularities and sharp transitions, where approximation errors close to machine precision are obtained. Finally, we present numerical tests on random (complex-valued) Cauchy matrices to show that the algorithm computes all the con-eigenvalues and con-eigenvectors with nearly full precision.