835 research outputs found

    PCA model building with missing data: New proposals and a comparative study

    Full text link
    [EN] This paper introduces new methods for building principal component analysis (PCA) models with missing data: projection to the model plane (PMP), known data regression (KDR), KDR with principal component regression (PCR), KDR with partial least squares regression (PLS) and trimmed scores regression (TSR). These methods are adapted from their PCA model exploitation versions to deal with the more general problem of PCA model building when the training set has missing values. A comparative study is carried out comparing these new methods with the standard ones, such as the modified nonlinear iterative partial least squares (NIPALS) algorithm, the iterative algorithm (IA), the data augmentation method (DA) and the nonlinear programming approach (NLP). The performance is assessed using the mean squared prediction error of the reconstructed matrix and the cosines between the actual principal components and the ones extracted by each method. Four data sets, two simulated and two real ones, with several percentages of missing data, are used to perform the comparison. Research in this study was partially supported by the Spanish Ministry of Science and Innovation and FEDER funds from the European Union through grant DPI2011-28112-C04-02, and the Spanish Ministry of Economy and Competitiveness through grant ECO2013-43353-R. The authors gratefully acknowledge Salvador Garcia-Munoz for providing the Phi toolbox (version 1.7) to perform the nonlinear programming approach (NLP) method. Folch-Fortuny, A.; Arteaga Moreno, FJ.; Ferrer Riquelme, AJ. (2015). PCA model building with missing data: New proposals and a comparative study. Chemometrics and Intelligent Laboratory Systems. 146:77-88. https://doi.org/10.1016/j.chemolab.2015.05.006
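The iterative algorithm (IA) mentioned in the abstract is the simplest of these model-building schemes and can be sketched in a few lines of NumPy: fill the missing cells with an initial guess, fit a PCA model by truncated SVD, replace the missing cells with the model reconstruction, and repeat until the imputed values stop changing. This is a minimal illustration of the idea, not the paper's implementation; the function name, initialization and convergence test are assumptions.

```python
import numpy as np

def pca_iterative_imputation(X, n_components=2, n_iter=100, tol=1e-6):
    """Iteratively impute missing values (NaN) in X using a PCA reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialize missing entries with the column means of the observed values
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        mean = X_filled.mean(axis=0)
        Xc = X_filled - mean
        # Truncated SVD of the centered matrix gives the current PCA model
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        X_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        # Update only the missing cells with the model reconstruction
        change = np.max(np.abs(X_filled[missing] - X_hat[missing])) if missing.any() else 0.0
        X_filled[missing] = X_hat[missing]
        if change < tol:
            break
    return X_filled
```

On exactly low-rank data this converges to a fixed point where the imputed cells are consistent with the fitted PCA model; the observed cells are never modified.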

    Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization

    Full text link
    Principal component analysis (PCA) is widely used for dimensionality reduction, with well-documented merits in various applications involving high-dimensional data, including computer vision, preference measurement, and bioinformatics. In this context, the fresh look advocated here permeates benefits from variable selection and compressive sampling, to robustify PCA against outliers. A least-trimmed squares estimator of a low-rank bilinear factor analysis model is shown closely related to that obtained from an ℓ0-(pseudo)norm-regularized criterion encouraging sparsity in a matrix explicitly modeling the outliers. This connection suggests robust PCA schemes based on convex relaxation, which lead naturally to a family of robust estimators encompassing Huber's optimal M-class as a special case. Outliers are identified by tuning a regularization parameter, which amounts to controlling sparsity of the outlier matrix along the whole robustification path of (group) least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its neat ties to robust statistics, the developed outlier-aware PCA framework is versatile to accommodate novel and scalable algorithms to: i) track the low-rank signal subspace robustly, as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes, when used to identify aberrant responses in personality assessment surveys, as well as unveil communities in social networks, and intruders from video surveillance data. Comment: 30 pages, submitted to IEEE Transactions on Signal Processing

    Estimation of Incident Photosynthetically Active Radiation From Moderate Resolution Imaging Spectrometer Data

    Get PDF
    Incident photosynthetically active radiation (PAR) is a key variable needed by almost all terrestrial ecosystem models. Unfortunately, the spatial and temporal resolutions of the current incident PAR products estimated from remotely sensed data are not sufficient for carbon cycle modeling and various applications. In this study, the authors develop a new method based on the look-up table approach for estimating instantaneous incident PAR from the polar-orbiting Moderate Resolution Imaging Spectrometer (MODIS) data. Since the top-of-atmosphere (TOA) radiance depends on both surface reflectance and atmospheric properties that largely determine the incident PAR, our first step is to estimate surface reflectance. The approach assumes known aerosol properties for the observations with minimum blue reflectance from a temporal window of each pixel. Their inverted surface reflectance is then interpolated to determine the surface reflectance of other observations. The second step is to calculate PAR by matching the computed TOA reflectance from the look-up table with the TOA values of the satellite observations. Both the direct and diffuse PAR components, as well as the total shortwave radiation, are determined in exactly the same fashion. The calculation of a daily average PAR value from one or two instantaneous PAR values is also explored. Ground measurements from seven FLUXNET sites are used for validating the algorithm. The results indicate that this approach can produce a reasonable PAR product at 1 km resolution and is suitable for global applications, although more quantitative validation activities are still needed.
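The matching step in the second stage can be illustrated with a toy look-up table: TOA reflectance is precomputed for a grid of candidate atmospheric states, each associated with a known incident PAR, and the observed TOA value is interpolated onto that grid. The table values and the monotonic TOA-to-PAR relationship below are made up purely for illustration; a real LUT is built with a radiative transfer code over many atmospheric and geometric variables.

```python
import numpy as np

def par_from_lut(toa_obs, lut_toa, lut_par):
    """Interpolate an observed TOA reflectance onto a precomputed table.

    lut_toa : TOA reflectances simulated for a grid of atmospheric states,
              sorted in ascending order.
    lut_par : incident PAR (W/m^2) associated with each table entry.
    """
    return float(np.interp(toa_obs, lut_toa, lut_par))

# Toy table: hazier atmospheres -> brighter TOA, lower surface PAR
lut_toa = np.array([0.10, 0.20, 0.30])
lut_par = np.array([400.0, 300.0, 200.0])
```

For example, an observed TOA reflectance of 0.25 falls midway between the last two nodes and interpolates to a PAR of 250 W/m^2.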

    Airborne gravity and precise positioning for geologic applications

    Get PDF
    Airborne gravimetry has become an important geophysical tool primarily because of advancements in methodology and instrumentation made in the past decade. Airborne gravity is especially useful when measured in conjunction with other geophysical data, such as magnetics, radar, and laser altimetry. The aerogeophysical survey over the West Antarctic ice sheet described in this paper is one such interdisciplinary study. This paper outlines in detail the instrumentation, survey and data processing methodology employed to perform airborne gravimetry from the multi-instrumented Twin Otter aircraft. Precise positioning from carrier-phase Global Positioning System (GPS) observations is combined with measurements of acceleration made by the gravity meter in the aircraft to obtain the free-air gravity anomaly measurement at aircraft altitude. GPS data are processed using the Kinematic and Rapid Static (KARS) software program, and aircraft vertical acceleration and corrections for gravity data reduction are calculated from the GPS position solution. Accuracies for the free-air anomaly are determined from crossover analysis after significant editing (2.98 mGal rms) and from a repeat track (1.39 mGal rms). The aerogeophysical survey covered a 300,000 km2 region in West Antarctica over the course of five field seasons. The gravity data from the West Antarctic survey reveal the major geologic structures of the West Antarctic rift system, including the Whitmore Mountains, the Byrd Subglacial Basin, the Sinuous Ridge, the Ross Embayment, and Siple Dome. These measurements, in conjunction with magnetics and ice-penetrating radar, provide the information required to reveal the tectonic fabric and history of this important region.
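The core gravity reduction described here can be sketched as a short correction chain: remove the aircraft's vertical acceleration from the meter reading, apply the Eotvos correction for motion over the rotating Earth, and subtract normal gravity continued to flight altitude. The sign conventions and the standard free-air gradient of 0.3086 mGal/m are textbook values; the function below is an illustration of the reduction, not the processing code used in the survey.

```python
def free_air_anomaly(g_meter, a_vert, eotvos, gamma0, h, fa_gradient=0.3086):
    """Free-air gravity anomaly (mGal) at aircraft altitude.

    g_meter : gravity meter reading (mGal)
    a_vert  : aircraft vertical acceleration from GPS positions (mGal)
    eotvos  : Eotvos correction for platform motion (mGal)
    gamma0  : normal (theoretical) gravity on the ellipsoid (mGal)
    h       : aircraft ellipsoidal height (m)
    """
    # Subtract platform acceleration, apply the Eotvos correction, then
    # remove normal gravity continued upward with the free-air gradient.
    return g_meter - a_vert + eotvos - (gamma0 - fa_gradient * h)
```

The accuracy figures in the abstract (2.98 mGal crossover rms, 1.39 mGal repeat-track rms) reflect how well this chain, plus filtering and editing, removes the aircraft dynamics, which are orders of magnitude larger than the geologic signal.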

    High-Dimensional Linear and Functional Analysis of Multivariate Grapevine Data

    Get PDF
    Variable selection plays a major role in multivariate high-dimensional statistical modeling. Hence, we need to select a consistent model, which avoids overfitting in prediction, enhances model interpretability and identifies relevant variables. We explore various continuous, nearly unbiased, sparse and accurate techniques for linear models based on coefficient paths, such as penalized maximum likelihood, nonconvex penalties, and iterative Sure Independence Screening (SIS). The convex penalized (pseudo-) likelihood approach based on the elastic net uses a mixture of the ℓ1 (Lasso) and ℓ2 (ridge regression) penalties to simultaneously achieve automatic variable selection, continuous shrinkage, and selection of groups of correlated variables. Variable selection using coefficient paths for the minimax concave penalty (MCP) starts applying penalization at the same rate as Lasso, and then smoothly relaxes the rate down to zero as the absolute value of the coefficient increases. The sure screening method is based on correlation learning, which computes component-wise estimators using AIC for tuning the regularization parameter of the penalized likelihood Lasso. To reflect the functional nature of spectral data, we use the Functional Data approach by approximating the finite linear combination of basis functions using B-splines. MCP, SIS and Functional regression are based on the intuition that the predictors are independent. However, the high-dimensional grapevine dataset suffers from ill-conditioning of the covariance matrix due to multicollinearity. Under collinearity, the Elastic-Net Regularization path via Coordinate Descent yields the best result to control the sparsity of the model, with cross-validation to reduce bias in variable selection. Iterative stepwise multiple linear regression reduces complexity and enhances the predictability of the model by selecting only significant predictors.
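The elastic-net coordinate descent highlighted above can be sketched directly: each pass cycles over the coefficients, soft-thresholding the per-feature correlation with the partial residual (the ℓ1 part) and shrinking by the ridge term (the ℓ2 part). This is a bare-bones illustration assuming standardized predictors and a fixed iteration count, with no cross-validation; it is not the glmnet-style solver one would use in practice.

```python
import numpy as np

def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
        (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).
    Assumes the columns of X are standardized."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution removed
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            denom = (X[:, j] @ X[:, j]) / n + lam * (1.0 - alpha)
            # Soft-threshold (l1) then shrink (l2)
            b[j] = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0) / denom
    return b
```

With `alpha=1` this reduces to the Lasso update and with `alpha=0` to ridge regression, which is exactly the mixture behavior the abstract relies on under multicollinearity.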

    Vol. 15, No. 1 (Full Issue)

    Get PDF

    Robust and Regularized Algorithms for Vehicle Tractive Force Prediction and Mass Estimation

    Get PDF
    This work provides novel robust and regularized algorithms for parameter estimation with applications in vehicle tractive force prediction and mass estimation. Given a large record of real-world data from test runs on public roads, recursive algorithms adjusted the unknown vehicle parameters under a broad variation of statistical assumptions for two linear gray-box models.
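A standard recursive least-squares (RLS) update is the workhorse behind this kind of online parameter adjustment. The sketch below fits a two-parameter linear gray-box model F = m·a + c (vehicle mass and a lumped resistance offset) from streaming samples; it is an assumption about the general approach, not the paper's specific robust or regularized algorithms.

```python
import numpy as np

class RecursiveLeastSquares:
    """Plain RLS with exponential forgetting. theta holds the unknown
    parameters, e.g. [mass, lumped resistance] in F = m*a + c."""

    def __init__(self, n_params, forgetting=0.99, p0=1e3):
        self.theta = np.zeros(n_params)       # parameter estimate
        self.P = np.eye(n_params) * p0        # inverse information matrix
        self.lam = forgetting                 # forgetting factor in (0, 1]

    def update(self, phi, y):
        """One step with regressor phi (e.g. [a, 1]) and measurement y (e.g. F)."""
        phi = np.asarray(phi, dtype=float)
        Pphi = self.P @ phi
        k = Pphi / (self.lam + phi @ Pphi)    # gain vector
        self.theta = self.theta + k * (y - phi @ self.theta)
        self.P = (self.P - np.outer(k, Pphi)) / self.lam
        return self.theta
```

The forgetting factor lets the estimate track slowly varying parameters (e.g. mass changes at a stop), at the cost of higher variance under noise; robust variants typically replace the residual term with a bounded influence function.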