
    Exact Dimensionality Selection for Bayesian PCA

    We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood, which allows us to infer an optimal number of components. We also propose a heuristic based on the expected shape of the marginal likelihood curve in order to choose the hyperparameters. In non-asymptotic frameworks, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.

    Determining Principal Component Cardinality through the Principle of Minimum Description Length

    PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective compression-based criteria for model selection, but it is difficult to analytically apply its modern definition, NML (Normalized Maximum Likelihood), to the problem of PCA. This work shows a general reduction of NML problems to lower-dimension problems. Applying this reduction, it bounds the NML of PCA by terms of the NML of linear regression, which are known. Comment: LOD 201

    Principal Component Analysis Applied to Surface Electromyography: A Comprehensive Review

    © 2016 IEEE. Surface electromyography (sEMG) records muscle activity from the surface of muscles and offers a wealth of information about muscle activation patterns in both research and clinical settings. A key principle underlying sEMG analyses is the decomposition of the signal into a number of motor unit action potentials (MUAPs) that capture most of the relevant features embedded in a low-dimensional space. To this end, principal component analysis (PCA) has been used extensively, whereby the original sEMG data are translated into low-dimensional MUAP components with a reduced level of redundancy. The objective of this paper is to survey the role of PCA in quantitative sEMG analyses. Following the preliminaries on the sEMG methodology and a statement of the PCA algorithm, an exhaustive collection of PCA applications related to sEMG data is presented. Alongside the technical challenges associated with PCA-based sEMG processing, the envisaged research trend is also discussed.

    Detecting outlier samples in microarray data

    In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to explore underlying experimental or biological problems and remove erroneous data. We propose an outlier detection method based on principal component analysis (PCA) and robust estimation of Mahalanobis distances that is fully automatic. We demonstrate that our outlier detection method identifies biologically significant outliers with high accuracy and that outlier removal improves the prediction accuracy of classifiers. Our outlier detection method is closely related to existing robust PCA methods, so we compare our outlier detection method to a prominent robust PCA method. Copyright ©2009 The Berkeley Electronic Press. All rights reserved.

    Knowledge Extraction Using Probabilistic Reasoning: An Artificial Neural Network Approach

    The World Wide Web (WWW) has radically changed the way in which we access, generate and disseminate information. Its presence is felt daily, and with more internet-enabled devices being connected, the web of knowledge is growing. We are now moving into an era where the WWW is capable of ‘understanding’ the actual/intended meaning of our content. This is being achieved by creating links between distributed data sources using the Resource Description Framework (RDF). In order to find information in this web of interconnected sources, complex query languages are often employed, e.g. SPARQL. However, this approach is limited, as exact query matches are often required. To overcome this challenge, this paper presents a probabilistic approach to searching RDF documents. The developed algorithm converts RDF data into a matrix of features and treats searching as a machine learning problem. Using a number of artificial neural network algorithms, a prototype has been developed that demonstrates the applicability of the approach. The results illustrate that the Voted Perceptron classifier (VPC), perceptron linear classifier (PERLC) and random neural network classifier (RNNC) performed particularly well, with accuracies of 100%, 98% and 93% respectively.