Probabilistic Fisher discriminant analysis: A robust and flexible alternative to Fisher discriminant analysis
Fisher discriminant analysis (FDA) is a popular and powerful method for dimensionality reduction and classification. Unfortunately, the optimality of the dimension reduction provided by FDA is only proved in the homoscedastic case. In addition, FDA is known to perform poorly in the presence of label noise and sparse labeled data. To overcome these limitations, this work proposes a probabilistic framework for FDA which relaxes the homoscedastic assumption on the class covariance matrices and adds a term to explicitly model the non-discriminative information. This allows the proposed method to be robust to label noise and to be used in the semi-supervised context. Experiments on real-world datasets show that the proposed approach works at least as well as FDA in standard situations and outperforms it in the label-noise and sparse-label cases.
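The classical FDA criterion this abstract builds on has a closed-form two-class solution; a minimal NumPy sketch on synthetic Gaussian data (illustrative only, not the paper's probabilistic variant):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic Gaussian classes in 3-D (made-up data for illustration).
X0 = rng.normal(loc=[0, 0, 0], scale=1.0, size=(100, 3))
X1 = rng.normal(loc=[3, 1, 0], scale=1.0, size=(100, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

def fisher_direction(X, y):
    """Two-class FDA: w maximises between-class over within-class scatter."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)  # pooled within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)                # closed-form: Sw^{-1}(m1 - m0)
    return w / np.linalg.norm(w)

w = fisher_direction(X, y)
z = X @ w  # 1-D discriminant scores; class means are well separated along w
```

The probabilistic FDA proposed in the paper replaces this point estimate with a latent-variable model, which is what makes the homoscedasticity relaxation and the semi-supervised extension possible.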
Bankruptcy Prediction: A Comparison of Some Statistical and Machine Learning Techniques
We are interested in forecasting bankruptcies in a probabilistic way. Specifically, we compare the classification performance of several statistical and machine-learning techniques, namely discriminant analysis (Altman's Z-score), logistic regression, least-squares support vector machines, and different instances of Gaussian processes (GPs), that is, GP classifiers, the Bayesian Fisher discriminant, and warped GPs. Our contribution to the field of computational finance is to introduce GPs as a potentially competitive probabilistic framework for bankruptcy prediction. Data from the information repository of the US Federal Deposit Insurance Corporation is used to test the predictions.
Keywords: bankruptcy prediction, artificial intelligence, supervised learning, Gaussian processes, Z-score.
Predicción de bancarrota: Una comparación de técnicas estadísticas y de aprendizaje supervisado para computadora [Bankruptcy prediction: A comparison of statistical and supervised machine-learning techniques]
We are interested in forecasting bankruptcies in a probabilistic way. Specifically, we compare the classification performance of several statistical and machine-learning techniques, namely discriminant analysis (Altman's Z-score), logistic regression, least-squares support vector machines, and different instances of Gaussian processes (GPs), that is, GP classifiers, the Bayesian Fisher discriminant, and warped GPs. Our contribution to the field of computational finance is to introduce GPs as a potentially competitive probabilistic framework for bankruptcy prediction. Data from the information repository of the US Federal Deposit Insurance Corporation is used to test the predictions.
Joint Bayesian Gaussian discriminant analysis for speaker verification
State-of-the-art i-vector based speaker verification relies on variants of Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis. We are mainly motivated by the recent work on the joint Bayesian (JB) method, which was originally proposed for discriminant analysis in face verification. We apply JB to speaker verification and make three contributions beyond the original JB. 1) In contrast to the EM iterations with approximated statistics in the original JB, EM iterations with exact statistics are employed and give better performance. 2) We propose to perform simultaneous diagonalization (SD) of the within-class and between-class covariance matrices to achieve efficient testing, which has broader application scope than the SVD-based efficient testing method in the original JB. 3) We scrutinize similarities and differences between various Gaussian PLDAs and JB, complementing the previous analysis that compared JB only with the Prince-Elder PLDA. Extensive experiments are conducted on NIST SRE10 core condition 5, empirically validating the superiority of JB with a faster convergence rate and a 9-13% EER reduction compared with state-of-the-art PLDA.
Comment: accepted by ICASSP201
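The simultaneous diagonalization in contribution 2 can be sketched with NumPy: whiten the within-class covariance, then eigendecompose the between-class covariance in the whitened space. The matrices below are hypothetical stand-ins, not estimates from a speaker-verification model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical SPD within-class (Sw) and between-class (Sb) covariances.
A = rng.normal(size=(4, 4))
Sw = A @ A.T + 4 * np.eye(4)
B = rng.normal(size=(4, 2))
Sb = B @ B.T  # rank-deficient between-class covariance is fine

# Step 1: whiten Sw via its eigendecomposition.
d, U = np.linalg.eigh(Sw)
W = U / np.sqrt(d)            # columns scaled so that W.T @ Sw @ W = I

# Step 2: eigendecompose Sb in the whitened space.
Sb_w = W.T @ Sb @ W
_, V = np.linalg.eigh(Sb_w)   # V is orthogonal, so the whitening is preserved

T = W @ V                     # simultaneous diagonalizer:
                              # T.T @ Sw @ T = I, T.T @ Sb @ T diagonal
```

Once both covariances are diagonal in the same basis, verification scores reduce to sums over independent dimensions, which is what makes testing efficient.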
Dimensionality reduction of clustered data sets
We present a novel probabilistic latent variable model to perform linear dimensionality reduction on data sets which contain clusters. We prove that the maximum likelihood solution of the model is an unsupervised generalisation of linear discriminant analysis. This provides a completely new approach to one of the most established and widely used classification algorithms. The performance of the model is then demonstrated on a number of real and artificial data sets.
Probabilistic classification of acute myocardial infarction from multiple cardiac markers
Logistic regression and Gaussian mixture model (GMM) classifiers have been trained to estimate the probability of acute myocardial infarction (AMI) in patients based upon the concentrations of a panel of cardiac markers. The panel consists of two new markers, fatty acid binding protein (FABP) and glycogen phosphorylase BB (GPBB), in addition to the traditional cardiac troponin I (cTnI), creatine kinase MB (CKMB) and myoglobin. The effect of using principal component analysis (PCA) and Fisher discriminant analysis (FDA) to preprocess the marker concentrations was also investigated. The need for classifiers to give an accurate estimate of the probability of AMI is argued and three categories of performance measure are described, namely discriminatory ability, sharpness, and reliability. Numerical performance measures for each category are given and applied. The optimum classifier, based solely upon the samples taken on admission, was the logistic regression classifier using FDA preprocessing. This gave an accuracy of 0.85 (95% confidence interval: 0.78–0.91) and a normalised Brier score of 0.89. When samples at both admission and a further time, 1–6 h later, were included, the performance increased significantly, showing that logistic regression classifiers can indeed use the information from the five cardiac markers to accurately and reliably estimate the probability of AMI.
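The Brier score underlying the reliability measure above is simply the mean squared error between predicted probabilities and binary outcomes; a minimal sketch with made-up predictions (the paper's normalised variant additionally rescales the raw score, commonly against a reference forecast):

```python
import numpy as np

def brier_score(p, y):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

# Hypothetical predicted AMI probabilities and true labels (not the paper's data).
p = np.array([0.9, 0.2, 0.7, 0.1])
y = np.array([1, 0, 1, 0])
score = brier_score(p, y)  # lower is better; 0 is a perfect probabilistic forecast
```

Unlike accuracy, the Brier score rewards well-calibrated probabilities rather than just correct hard decisions, which is why it suits the probabilistic framing argued for in the abstract.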