Do unbalanced data have a negative effect on LDA?
For two-class discrimination, Xie and Qiu [The effect of imbalanced data sets on LDA: a theoretical and empirical analysis, Pattern Recognition 40 (2) (2007) 557–562] claimed that, when the covariance matrices of the two classes are unequal, a (class-)unbalanced data set has a negative effect on the performance of linear discriminant analysis (LDA). By re-balancing 10 real-world data sets, they provided empirical evidence for this claim using AUC (Area Under the receiver operating characteristic Curve) as the performance metric. We argue that this claim is vague, if not misleading: no solid theoretical analysis is actually presented in their paper, and AUC can lead to a conclusion about the discrimination performance of LDA on unbalanced data sets quite different from that based on the misclassification error rate (ER). Our empirical and simulation studies suggest that, for LDA, the increase in the median AUC (and thus the improvement in performance) from re-balancing is relatively small, whereas the increase in the median ER (and thus the decline in performance) from re-balancing is relatively large. Hence there is no reliable empirical evidence to support the claim that a class-unbalanced data set has a negative effect on the performance of LDA. In addition, re-balancing affects the performance of LDA for data sets with either equal or unequal covariance matrices, indicating that unequal covariance matrices are not a key reason for the difference in performance between the original and re-balanced data sets.
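The abstract's central point — that AUC and misclassification error rate can tell opposite stories about LDA on unbalanced data — can be illustrated with a minimal numpy sketch. This is not the authors' experimental setup; the class sizes, means, and covariances below are invented, and the LDA here is a plain pooled-covariance Fisher discriminant with a log-prior-odds threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbalanced two-class data with unequal covariances
n0, n1 = 900, 100
X0 = rng.normal(0.0, 1.0, size=(n0, 2))   # majority class
X1 = rng.normal(1.5, 2.0, size=(n1, 2))   # minority class, larger spread
X = np.vstack([X0, X1])
y = np.r_[np.zeros(n0), np.ones(n1)]

def lda_scores(X, y):
    """Fisher LDA with pooled covariance; returns discriminant scores."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    k0, k1 = (y == 0).sum(), (y == 1).sum()
    S = (np.cov(X[y == 0].T) * (k0 - 1)
         + np.cov(X[y == 1].T) * (k1 - 1)) / (len(y) - 2)
    w = np.linalg.solve(S, m1 - m0)
    # The threshold includes the log prior odds, so class imbalance shifts it
    b = -0.5 * w @ (m0 + m1) + np.log(k1 / k0)
    return X @ w + b

s = lda_scores(X, y)
err = np.mean((s > 0) != (y == 1))   # misclassification error rate (ER)
# AUC via the rank-sum (Mann-Whitney) statistic -- threshold-free,
# so it is unaffected by the prior-odds shift that drives ER
r = np.argsort(np.argsort(s)) + 1
auc = (r[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)
```

Because AUC depends only on the ranking of scores while ER depends on the threshold, re-balancing (which mainly moves the threshold term) can change the two metrics very differently, which is the asymmetry the study reports.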
Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is
well-suited to complex problems for which the likelihood is either
mathematically or computationally intractable. However, the methods that use
rejection suffer from the curse of dimensionality when the number of summary
statistics is increased. Here we propose a machine-learning approach to the
estimation of the posterior density by introducing two innovations. The new
method fits a nonlinear conditional heteroscedastic regression of the parameter
on the summary statistics, and then adaptively improves estimation using
importance sampling. The new algorithm is compared to the state-of-the-art
approximate Bayesian methods, and achieves considerable reduction of the
computational burden in two examples of inference in statistical genetics and
in a queueing model.
Comment: 4 figures; version 3 minor changes; to appear in Statistics and Computing
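The rejection-plus-regression idea this abstract builds on can be sketched numerically. The sketch below uses the simpler *linear* regression adjustment (in the spirit of Beaumont et al.); the paper's contribution replaces it with a nonlinear, heteroscedastic fit plus importance sampling. The toy model, prior range, and acceptance rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: infer the mean theta of a Normal(theta, 1) model, using
# the sample mean as the (one-dimensional) summary statistic.
def simulate_summary(theta, n=50):
    return rng.normal(theta, 1.0, size=n).mean()

s_obs = 0.7                                   # "observed" summary
theta = rng.uniform(-3, 3, size=5000)         # draws from a flat prior
s = np.array([simulate_summary(t) for t in theta])

# Rejection step: keep draws whose summaries land near s_obs
d = np.abs(s - s_obs)
keep = d <= np.quantile(d, 0.05)
th, sk = theta[keep], s[keep]

# Regression adjustment: regress theta on (s - s_obs) among accepted
# draws, then project each draw to s = s_obs along the fitted line.
A = np.c_[np.ones_like(sk), sk - s_obs]
beta, *_ = np.linalg.lstsq(A, th, rcond=None)
th_adj = th - beta[1] * (sk - s_obs)          # adjusted posterior sample
```

The adjustment lets the tolerance be looser (mitigating the curse of dimensionality from many summaries) because accepted draws are corrected toward the observed summary rather than required to match it closely.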
Convertible bond announcement effects: why is Japan different?
U.S. and Japanese firms dominate global convertible bond issuance. Previous research documents more favorable convertible bond announcement effects in Japan than in the U.S. and other developed countries. Using a global sample of convertible bonds issued from 1982 to 2012, we find that the more favorable announcement effects of Japanese convertibles are driven by their stated uses of proceeds. Japanese convertibles more often include capital expenditure as an intended use, while U.S. firms tend to cite general purposes to motivate their offerings. Our findings illustrate the value to firms of being more explicit when disclosing the intended use of proceeds of security offerings.
A fully objective Bayesian approach for the Behrens-Fisher problem using historical studies
For in vivo research experiments with small sample sizes and available
historical data, we propose a sequential Bayesian method for the Behrens-Fisher
problem. We consider it as a model choice question with two models in
competition: one for which the two expectations are equal and one for which
they are different. The choice between the two models is performed through a
Bayesian analysis, based on a robust choice of combined objective and
subjective priors, set on the parameter space and on the model space. Three
steps are necessary to evaluate the posterior probability of each model using
two historical datasets similar to the one of interest. Starting from the
Jeffreys prior, a posterior based on the first historical dataset is deduced
and used to calibrate the Normal-Gamma informative priors for the analysis of
the second historical dataset, together with a uniform prior on the model space.
From this second step, a new posterior on the parameter and model spaces
can be used as the objective informative prior for the final Bayesian
analysis. Bayesian and frequentist methods have been compared on simulated and
real data. In accordance with FDA recommendations, control of type I and type
II error rates has been evaluated. The proposed method controls both even when
the historical experiments are not completely similar to the one of interest.
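The model-choice framing — M0 with equal means versus M1 with different means, each class allowing its own variance — can be illustrated with a crude sketch. The code below scores the two models with a BIC approximation to the marginal likelihood and a uniform model prior; it is *not* the paper's method, which instead builds genuine Normal-Gamma priors from historical datasets. Sample sizes, means, and variances are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical small samples with unequal variances (Behrens-Fisher setting)
x = rng.normal(0.0, 1.0, size=8)
y = rng.normal(1.5, 2.0, size=8)

def loglik(z, mu, var):
    """Gaussian log-likelihood of sample z at mean mu, variance var."""
    return (-0.5 * len(z) * np.log(2 * np.pi * var)
            - ((z - mu) ** 2).sum() / (2 * var))

n = len(x) + len(y)
vx, vy = x.var(), y.var()

# M0: common mean (one-step variance-weighted estimate), separate
# variances re-estimated at that mean -- 3 free parameters.
mu0 = ((len(x) / vx) * x.mean() + (len(y) / vy) * y.mean()) \
      / (len(x) / vx + len(y) / vy)
ll0 = loglik(x, mu0, ((x - mu0) ** 2).mean()) \
    + loglik(y, mu0, ((y - mu0) ** 2).mean())
bic0 = -2 * ll0 + 3 * np.log(n)

# M1: separate means and variances -- 4 free parameters.
ll1 = loglik(x, x.mean(), vx) + loglik(y, y.mean(), vy)
bic1 = -2 * ll1 + 4 * np.log(n)

# Posterior model probabilities under a uniform model prior
bic = np.array([bic0, bic1])
w = np.exp(-0.5 * (bic - bic.min()))
p0, p1 = w / w.sum()
```

BIC-style evidence is a rough stand-in here: with n = 16 it is exactly the small-sample regime where the paper argues that priors calibrated on historical data do better than default approximations.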