    ROBUST PRINCIPAL COMPONENT ANALYSIS

    A common technique for robust dispersion estimators is to apply the classical estimator to some subset U of the data. Applying principal component analysis to the subset U can result in a robust principal component analysis with good properties.
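The subset idea in this abstract can be sketched in a few lines. The abstract does not say how the subset U is chosen, so the concentration scheme below (median and MAD start, then repeatedly keeping the rows with the smallest Mahalanobis distances) is an assumption of this sketch, as are the names `subset_pca`, `keep_fraction` and `n_iter`.

```python
import numpy as np


def subset_pca(X, keep_fraction=0.75, n_iter=10):
    """Classical PCA applied to a subset U of the rows of X.

    How U is chosen is an assumption of this sketch: start from the
    coordinatewise median and MAD, then repeatedly keep the rows with
    the smallest Mahalanobis distances to the current estimate.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    h = int(np.ceil(keep_fraction * n))  # size of the subset U

    # Robust starting values: median for location, scaled MAD for spread.
    center = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - center), axis=0)
    cov = np.diag(mad ** 2)

    for _ in range(n_iter):
        diff = X - center
        # Squared Mahalanobis distance of every row to the current estimate.
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        subset = np.argsort(d2)[:h]
        # Classical estimators, computed on the subset only.
        center = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)

    # Classical PCA of the subset: eigen-decomposition of its covariance.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return center, eigvals[order], eigvecs[:, order], subset
```

The components are the columns of the third return value, ordered by decreasing variance. The sketch assumes no column has a zero MAD and that the subset covariance stays invertible.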

    Respiratory motion correction in dynamic MRI using robust data decomposition registration - Application to DCE-MRI.

    Motion correction in Dynamic Contrast Enhanced (DCE-) MRI is challenging because rapid intensity changes can compromise common (intensity based) registration algorithms. In this study we introduce a novel registration technique based on robust principal component analysis (RPCA) to decompose a given time-series into a low rank and a sparse component. This allows robust separation of motion components that can be registered, from intensity variations that are left unchanged. This Robust Data Decomposition Registration (RDDR) is demonstrated on both simulated data and a wide range of clinical data. Robustness to different types of motion and breathing choices during acquisition is demonstrated for a variety of imaged organs including liver, small bowel and prostate. The analysis of clinically relevant regions of interest showed both a decrease of error (15-62% reduction following registration) in tissue time-intensity curves and improved areas under the curve (AUC60) at early enhancement.
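A low-rank plus sparse split of the kind this abstract mentions can be illustrated with principal component pursuit, solved by a standard alternating-direction scheme. This is a generic sketch, not the RDDR method itself: the function names, the default weights `lam` and `mu`, and the convention that each column of `M` holds one vectorised frame of the time-series are all assumptions.

```python
import numpy as np


def soft_threshold(A, tau):
    """Shrink every entry of A towards zero by tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def singular_value_threshold(A, tau):
    """Apply soft-thresholding to the singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca(M, max_iter=1000, tol=1e-7):
    """Split M into a low-rank part L and a sparse part S with M ~ L + S."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))       # weight of the sparse term
    mu = m * n / (4.0 * np.abs(M).sum())  # penalty parameter (common default)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                  # Lagrange multipliers
    norm_M = np.linalg.norm(M)

    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

For an image series, `M` would have one row per pixel and one column per time point, and `L` and `S` would be reshaped back into frames afterwards.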

    Integrating joint feature selection into subspace learning: A formulation of 2DPCA for outliers robust feature selection

    © 2019 Elsevier Ltd. Principal component analysis and its variants are sensitive to outliers, which affects their performance and applicability in the real world, and several variants have been proposed to improve robustness. However, most of the existing methods are still sensitive to outliers and are unable to select useful features. To overcome the sensitivity of PCA to outliers, in this paper we introduce two-dimensional outliers-robust principal component analysis (ORPCA) by imposing joint constraints on the objective function. ORPCA relaxes the orthogonal constraints and penalizes the regression coefficient; thus, it selects important features and ignores the same features that exist in other principal components. It is commonly known that the squared Frobenius norm is sensitive to outliers. To overcome this issue, we have devised an alternative way to derive the objective function. Experimental results on four publicly available benchmark datasets show the effectiveness of joint feature selection and better performance than state-of-the-art dimensionality-reduction methods.

    Search for high-amplitude δ Scuti and RR Lyrae stars in Sloan Digital Sky Survey Stripe 82 using principal component analysis

    We propose a robust principal component analysis framework for the exploitation of multiband photometric measurements in large surveys. Period search results are improved using the time-series of the first principal component due to its optimized signal-to-noise ratio. The presence of correlated excess variations in the multivariate time-series enables the detection of weaker variability. Furthermore, the direction of the largest variance differs for certain types of variable stars. This can be used as an efficient attribute for classification. The application of the method to a subsample of Sloan Digital Sky Survey Stripe 82 data yielded 132 high-amplitude δ Scuti variables. We also found 129 new RR Lyrae variables, complementary to the catalogue of Sesar et al., extending the halo area mapped by Stripe 82 RR Lyrae stars towards the Galactic bulge. The sample also comprises 25 multiperiodic or Blazhko RR Lyrae stars.
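The signal-to-noise gain of the first principal component can be illustrated on multiband light curves. The sketch below uses classical PCA through an SVD, not the robust variant the abstract proposes, and the function name and the layout of `flux` (one row per epoch, one column per band) are assumptions.

```python
import numpy as np


def first_component_series(flux):
    """Project multiband measurements onto their first principal component.

    flux: array of shape (n_epochs, n_bands), one column per photometric band.
    Returns the projected time-series and the direction of largest variance.
    """
    flux = np.asarray(flux, dtype=float)
    centred = flux - flux.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    direction = Vt[0]
    return centred @ direction, direction
```

A period search would then run on the returned series instead of a single band, while `direction` is the kind of attribute the abstract suggests for classification.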

    Sparse dimensionality reduction approaches in Mendelian randomization with highly correlated exposures.

    Multivariable Mendelian randomization (MVMR) is an instrumental variable technique that generalizes the MR framework for multiple exposures. Framed as a linear regression problem, it is subject to the pitfall of multi-collinearity. The bias and efficiency of MVMR estimates thus depend heavily on the correlation of exposures. Dimensionality reduction techniques such as principal component analysis (PCA) provide transformations of all the included variables that are effectively uncorrelated. We propose the use of sparse PCA (sPCA) algorithms that create principal components of subsets of the exposures with the aim of providing more interpretable and reliable MR estimates. The approach consists of three steps. We first apply a sparse dimension reduction method and transform the variant-exposure summary statistics to principal components. We then choose a subset of the principal components based on data-driven cutoffs, and estimate their strength as instruments with an adjusted F-statistic. Finally, we perform MR with these transformed exposures. This pipeline is demonstrated in a simulation study of highly correlated exposures and an applied example using summary data from a genome-wide association study of 97 highly correlated lipid metabolites. As a positive control, we tested the causal associations of the transformed exposures on CHD. Compared to the conventional inverse-variance weighted MVMR method and a weak-instrument robust MVMR method (MR GRAPPLE), sparse component analysis achieved a superior balance of sparsity and biologically insightful grouping of the lipid traits.
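The first step of the pipeline, a sparse transformation of the variant-exposure statistics, might look like the sketch below. The abstract does not name a specific sPCA algorithm, so a truncated power iteration is swapped in here purely for illustration; `sparse_loading`, `k` and the matrix `betas` (variants in rows, exposures in columns) are hypothetical names.

```python
import numpy as np


def sparse_loading(cov, k, n_iter=100):
    """One sparse principal direction with at most k non-zero loadings.

    Truncated power iteration: multiply by the covariance, keep the k
    largest entries in absolute value, renormalise. Requires 1 <= k <= p.
    """
    cov = np.asarray(cov, dtype=float)
    # Start from the ordinary (dense) leading eigenvector.
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]
    for _ in range(n_iter):
        v = cov @ v
        smallest = np.argsort(np.abs(v))[:-k]  # all but the k largest entries
        v[smallest] = 0.0
        v /= np.linalg.norm(v)
    return v


def transform_exposures(betas, k):
    """Project variant-exposure estimates onto one sparse component."""
    betas = np.asarray(betas, dtype=float)
    loading = sparse_loading(np.cov(betas, rowvar=False), k)
    scores = (betas - betas.mean(axis=0)) @ loading
    return scores, loading
```

The non-zero entries of `loading` identify the group of exposures that the component summarises. Selecting components by a cutoff, the adjusted F-statistic and the MR step itself are not covered by this sketch.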

    The measurement of household socio-economic position in tuberculosis prevalence surveys: a sensitivity analysis.

    OBJECTIVE: To assess the robustness of socio-economic inequalities in tuberculosis (TB) prevalence surveys. DESIGN: Data were drawn from the TB prevalence survey conducted in Lusaka Province, Zambia, in 2005-2006. We compared TB socio-economic inequalities measured through an asset-based index (Index 0) using principal component analysis (PCA) with those observed using three alternative indices: Index 1 and Index 2 accounted respectively for the biases resulting from the inclusion of urban assets and food-related variables in Index 0. Index 3 was built using regression-based analysis instead of PCA to account for the effect of using a different assets weighting strategy. RESULTS: Household socio-economic position (SEP) was significantly associated with prevalent TB, regardless of the index used; however, the magnitude of inequalities did vary across indices. A strong association was found for Index 2, suggesting that the exclusion of food-related variables did not reduce the extent of association between SEP and prevalent TB. The weakest association was found for Index 1, indicating that the exclusion of urban assets did not lead to a higher extent of TB inequalities. CONCLUSION: TB socio-economic inequalities seem to be robust to the choice of SEP indicator. The epidemiological meaning of the different extent of TB inequalities is unclear. Further studies are needed to confirm our conclusions.
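An asset-based index of the kind used for Index 0 is commonly built by scoring each household on the first principal component of its standardised asset indicators. The sketch below shows that general construction under stated assumptions; it does not reproduce the survey's variables or weights, and `asset_index` is a hypothetical name.

```python
import numpy as np


def asset_index(assets):
    """Score households on the first principal component of their assets.

    assets: array of shape (n_households, n_assets), e.g. 0/1 ownership
    indicators. Columns must not be constant.
    Returns the household scores and the asset weights.
    """
    A = np.asarray(assets, dtype=float)
    Z = (A - A.mean(axis=0)) / A.std(axis=0)  # standardise each asset
    corr = np.corrcoef(A, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)
    weights = eigvecs[:, -1]                  # direction of largest variance
    if weights.sum() < 0:                     # orient so that more assets = higher score
        weights = -weights
    return Z @ weights, weights
```

Households can then be grouped into quantiles of the score. Dropping columns before calling the function corresponds to the kind of variable exclusion that distinguishes Index 1 and Index 2 from Index 0.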

    Machine Learning Developments in Dependency Modelling and Feature Extraction

    Three complementary feature extraction approaches are developed in this thesis that address the challenge of dimensionality reduction in the presence of multivariate heavy-tailed and asymmetric distributions. First, we demonstrate how to improve the robustness of the standard Probabilistic Principal Component Analysis by adapting the concept of robust mean and covariance estimation within the standard framework. We then introduce feature extraction methods that extend the standard Principal Component Analysis by exploring distribution-based robustification. This is achieved via Probabilistic Principal Component Analysis (PPCA), in which new, statistically robust variants are derived, also treating missing data. We propose a novel generalisation to the t-Student Probabilistic Principal Component methodology which (1) accounts for asymmetric distribution of the observation data, (2) is a framework for grouped and generalised multiple-degree-of-freedom structures, which provides a more flexible framework to model groups of marginal tail dependence in the observation data, and (3) separates the tail effect of the error terms and factors. The new feature extraction methods are derived in an incomplete data setting to efficiently handle the presence of missing values in the observation vector. We discuss statistical properties of their robustness. In the next part of this thesis, we demonstrate the applicability of feature extraction methods to the statistical analysis of multidimensional dynamics. We introduce the class of Hybrid Factor models that combines classical state-space model formulations with incorporation of exogenous factors. We show how to utilize the information obtained from features extracted using the introduced robust PPCA in a modelling framework in a meaningful and parsimonious manner.
    In the first application study, we show the applicability of robust feature extraction methods in the real data environment of financial markets and combine the obtained results with a stochastic multi-factor panel regression-based state-space model in order to model the dynamics of yield curves, whilst incorporating regression factors. We embed the rank-reduced feature extractions into a stochastic representation of state-space models for yield curve dynamics and compare the results to classical multi-factor dynamic Nelson-Siegel state-space models. This leads to important new representations of yield curve models that can have practical importance for addressing questions of financial stress testing and monetary policy interventions which can efficiently incorporate financial big data. We illustrate our results on various financial and macroeconomic data sets from the Euro Zone and international markets. In the second study, we develop a multi-factor extension of the family of Lee-Carter stochastic mortality models. We build upon the time, period and cohort stochastic model structure to include exogenous observable demographic features that can be used as additional factors to improve model fit and forecasting accuracy. We develop a framework in which (a) we employ projection-based techniques of dimensionality reduction that are amenable to different structures of demographic data; (b) we analyse demographic data sets from the patterns of missingness and the impact of such missingness on the feature extraction; (c) we introduce a class of multi-factor stochastic mortality models incorporating time, period, cohort and demographic features, which we develop within a Bayesian state-space estimation framework. Finally, (d) we develop an efficient combined Markov chain and filtering framework for sampling the posterior and forecasting.
    We undertake a detailed case study on the Human Mortality Database demographic data from European countries and we use the extracted features to better explain the term structure of mortality in the UK over time for male and female populations. This is compared to a pure Lee-Carter stochastic mortality model, demonstrating that our feature extraction framework and consequent multi-factor mortality model improves both in-sample fit and, importantly, out-of-sample mortality forecasts by a non-trivial gain in performance.

    Relationships of overall estery aroma character in lagers with volatile headspace congener concentrations

    In lager beers the intensity of “estery” aroma character is regarded as an important component of sensory quality, but its origins are somewhat uncertain. Overall “estery” aroma intensity was predicted from capillary gas chromatographic (GC) data following solid phase micro extraction (SPME) of headspaces. Estery character was scored in 23 commercial lagers using rank-rating, allowing assessors (13) constant access to a range of appropriate standards. From univariate data analysis, all assessors behaved similarly and lagers fell into three significantly different groups: low (1), high (1) and intermediate (21). The quantification of 36 flavour volatiles by SPME of headspaces was reproducible and principal component analysis explained 91% of total variance. Multiple linear regression could utilise only a restricted (26) set of flavour volatiles, whereas partial least squares regression, which considered all flavour components, showed significant differences and improved prediction. However, an artificial neural network that could compensate for non-linearities and interactions in ester perception gave the most robust prediction at R² = 0.88.

    Automobile indexation from 3D point clouds of urban scenarios

    In this paper, we introduce a methodology for the detection and segmentation of automobiles in urban scenarios. We use the LiDAR Velodyne HDL-64E to scan the surroundings. The method comprises three steps: (1) removing facades, ground plan, and unstructured objects; (2) smoothing the data using robust principal component analysis (RPCA); and, finally, (3) unstructured objects modelling and indexing. The dataset is partitioned into a training set with 4500 objects and a test set with 3000 objects. Mean Shift thresholds, the filter, the Delaunay parameters, and the histogram modelling are optimized via ROC analysis. It is observed that the car scan quality affects our method to a lesser degree when compared with state-of-the-art methods.