Exact Dimensionality Selection for Bayesian PCA
We present a Bayesian model selection approach to estimate the intrinsic
dimensionality of a high-dimensional dataset. To this end, we introduce a novel
formulation of the probabilistic principal component analysis model based on a
normal-gamma prior distribution. In this context, we exhibit a closed-form
expression of the marginal likelihood which allows us to infer an optimal number
of components. We also propose a heuristic based on the expected shape of the
marginal likelihood curve in order to choose the hyperparameters. In
non-asymptotic frameworks, we show on simulated data that this exact
dimensionality selection approach is competitive with both Bayesian and
frequentist state-of-the-art methods.
Determining Principal Component Cardinality through the Principle of Minimum Description Length
PCA (Principal Component Analysis) and its variants are ubiquitous techniques
for matrix dimension reduction and reduced-dimension latent-factor extraction.
One significant challenge in using PCA is the choice of the number of principal
components. The information-theoretic MDL (Minimum Description Length) principle
gives objective compression-based criteria for model selection, but it is
difficult to analytically apply its modern definition - NML (Normalized Maximum
Likelihood) - to the problem of PCA. This work shows a general reduction of NML
problems to lower-dimension problems. Applying this reduction, it bounds the
NML of PCA by terms of the NML of linear regression, which are known.
Comment: LOD 201
Principal Component Analysis Applied to Surface Electromyography: A Comprehensive Review
© 2016 IEEE. Surface electromyography (sEMG) records muscle activity from the surface of muscles, offering a wealth of information about muscle activation patterns in both research and clinical settings. A key principle underlying sEMG analysis is the decomposition of the signal into a number of motor unit action potentials (MUAPs) that capture most of the relevant features in a low-dimensional space. To this end, principal component analysis (PCA) has been used extensively to translate the original sEMG data into low-dimensional MUAP components with reduced redundancy. The objective of this paper is to review the role of PCA in quantitative sEMG analysis. Following preliminaries on sEMG methodology and a statement of the PCA algorithm, an exhaustive collection of PCA applications to sEMG data is presented. Alongside the technical challenges associated with PCA-based sEMG processing, the envisaged research trends are also discussed.
Detecting outlier samples in microarray data
In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to explore underlying experimental or biological problems and remove erroneous data. We propose an outlier detection method based on principal component analysis (PCA) and robust estimation of Mahalanobis distances that is fully automatic. We demonstrate that our outlier detection method identifies biologically significant outliers with high accuracy and that outlier removal improves the prediction accuracy of classifiers. Our outlier detection method is closely related to existing robust PCA methods, so we compare our outlier detection method to a prominent robust PCA method. Copyright ©2009 The Berkeley Electronic Press. All rights reserved.
Knowledge Extraction Using Probabilistic Reasoning: An Artificial Neural Network Approach
The World Wide Web (WWW) has radically changed the way in which we access, generate and disseminate information. Its presence is felt daily, and with more internet-enabled devices being connected, the web of knowledge is growing. We are now moving into an era where the WWW is capable of ‘understanding’ the actual or intended meaning of our content. This is being achieved by creating links between distributed data sources using the Resource Description Framework (RDF). In order to find information in this web of interconnected sources, complex query languages such as SPARQL are often employed. However, this approach is limited, as exact query matches are often required. To overcome this challenge, this paper presents a probabilistic approach to searching RDF documents. The developed algorithm converts RDF data into a matrix of features and treats searching as a machine learning problem. Using a number of artificial neural network algorithms, a prototype has been successfully developed that demonstrates the applicability of the approach. The results illustrate that the Voted Perceptron classifier (VPC), perceptron linear classifier (PERLC) and random neural network classifier (RNNC) performed particularly well, with accuracies of 100%, 98% and 93% respectively.