216 research outputs found
How good are your fits? Unbinned multivariate goodness-of-fit tests in high energy physics
Multivariate analyses play an important role in high energy physics. Such
analyses often involve performing an unbinned maximum likelihood fit of a
probability density function (p.d.f.) to the data. This paper explores a
variety of unbinned methods for determining the goodness of fit of the p.d.f.
to the data. The application and performance of each method are discussed in the
context of a real-life high energy physics analysis (a Dalitz-plot analysis).
Several of the methods presented in this paper can also be used for the
non-parametric determination of whether two samples originate from the same
parent p.d.f. This can be used, e.g., to determine the quality of a detector
Monte Carlo simulation without the need for a parametric expression of the
efficiency.
Comment: 32 pages, 12 figures
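As a rough illustration of the non-parametric two-sample question raised in this abstract, the sketch below runs a permutation test on an energy-distance statistic. It is only one possible unbinned approach and is not claimed to match the paper's own definitions: the plain Euclidean distance weighting, the function names and the default parameters are all assumptions made for the example.

```python
import numpy as np

def energy_statistic(x, y):
    """Energy distance between samples x (n, d) and y (m, d).

    Assumption: plain Euclidean distances are used as the weighting;
    other weighting functions are possible.
    """
    def mean_dist(a, b):
        # Mean pairwise Euclidean distance between rows of a and rows of b.
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)

def permutation_p_value(x, y, n_perm=200, seed=0):
    """Estimate a p-value for 'x and y share one parent p.d.f.'

    The null distribution of the statistic is built by repeatedly
    shuffling the pooled sample and re-splitting it into two groups.
    """
    rng = np.random.default_rng(seed)
    observed = energy_statistic(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = energy_statistic(pooled[perm[:n]], pooled[perm[n:]])
        if stat >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_perm + 1)
```

In a detector-simulation check, x could hold reconstructed data events and y simulated events, each row being one event in the chosen variables; a small p-value would suggest the two samples differ.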
Markov basis and Groebner basis of Segre-Veronese configuration for testing independence in group-wise selections
We consider testing independence in group-wise selections with some
restrictions on combinations of choices. We present models for frequency data
of selections for which it is easy to perform conditional tests by Markov chain
Monte Carlo (MCMC) methods. When the restrictions on the combinations can be
described in terms of a Segre-Veronese configuration, an explicit form of a
Gröbner basis consisting of moves of degree two is readily available for
performing a Markov chain. We illustrate our setting with the National Center
Test for university entrance examinations in Japan. We also apply our method to
testing independence hypotheses involving genotypes at more than one locus or
haplotypes of alleles on the same chromosome.
Comment: 25 pages, 5 figures
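To make the idea of a conditional test driven by degree-two moves concrete, here is a minimal sketch for the simplest case, independence in an ordinary two-way contingency table, where the basic +1/-1 moves on 2x2 minors keep both margins fixed. It does not implement the Segre-Veronese configurations of the paper; the function names, the chi-square test statistic and the chain lengths are assumptions chosen for illustration.

```python
import numpy as np
from math import lgamma

def log_weight(table):
    # Given the margins, a table's conditional probability under
    # independence is proportional to 1 / prod(n_ij!).
    return -sum(lgamma(v + 1) for v in table.ravel())

def chi_square(table):
    # Pearson chi-square; assumes all row and column sums are positive.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()

def mcmc_independence_p_value(table, n_steps=20000, burn_in=2000, seed=0):
    """Metropolis walk over tables sharing the observed margins."""
    rng = np.random.default_rng(seed)
    current = np.array(table, dtype=np.int64)
    n_rows, n_cols = current.shape
    observed = chi_square(current)
    log_w = log_weight(current)
    exceed = 0
    kept = 0
    for step in range(n_steps):
        # Degree-two move: +1 on two cells, -1 on the other two cells
        # of a randomly chosen 2x2 minor (margins stay unchanged).
        i1, i2 = rng.choice(n_rows, size=2, replace=False)
        j1, j2 = rng.choice(n_cols, size=2, replace=False)
        proposal = current.copy()
        proposal[i1, j1] += 1
        proposal[i2, j2] += 1
        proposal[i1, j2] -= 1
        proposal[i2, j1] -= 1
        if proposal.min() >= 0:
            log_w_new = log_weight(proposal)
            # Metropolis acceptance for the symmetric proposal.
            if rng.random() < np.exp(min(0.0, log_w_new - log_w)):
                current, log_w = proposal, log_w_new
        if step >= burn_in:
            kept += 1
            if chi_square(current) >= observed - 1e-9:
                exceed += 1
    return exceed / kept
```

The returned value is the fraction of visited tables whose statistic is at least as extreme as the observed one, which estimates the conditional p-value.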
Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is
well-suited to complex problems for which the likelihood is either
mathematically or computationally intractable. However, the methods that use
rejection suffer from the curse of dimensionality when the number of summary
statistics is increased. Here we propose a machine-learning approach to the
estimation of the posterior density by introducing two innovations. The new
method fits a nonlinear conditional heteroscedastic regression of the parameter
on the summary statistics, and then adaptively improves estimation using
importance sampling. The new algorithm is compared to the state-of-the-art
approximate Bayesian methods, and achieves considerable reduction of the
computational burden in two examples of inference in statistical genetics and
in a queueing model.
Comment: 4 figures; version 3 minor changes; to appear in Statistics and Computing
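The sketch below illustrates the general pattern of combining rejection sampling with a regression of the parameter on the summary statistic. It uses a simple linear adjustment as a stand-in, not the nonlinear heteroscedastic regression or the importance-sampling step proposed in the abstract. The toy model (a normal mean with a wide normal prior), the function name and all defaults are assumptions.

```python
import numpy as np

def abc_regression_adjustment(observed_summary, n_sims=20000,
                              accept_frac=0.05, seed=0):
    """Rejection ABC followed by a linear regression adjustment.

    Toy model (assumption): theta ~ N(0, 10^2) and the summary is the
    mean of 20 draws from N(theta, 1), i.e. summary ~ N(theta, 1/20).
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 10.0, size=n_sims)
    summaries = rng.normal(theta, 1.0 / np.sqrt(20))

    # Rejection step: keep the simulations closest to the observation.
    dist = np.abs(summaries - observed_summary)
    keep = dist <= np.quantile(dist, accept_frac)
    theta_acc, s_acc = theta[keep], summaries[keep]

    # Regression step: fit theta = a + b * (s - s_obs) on accepted draws,
    # then shift each accepted theta to where it would lie at s = s_obs.
    centred = s_acc - observed_summary
    design = np.column_stack([np.ones(centred.size), centred])
    coef, *_ = np.linalg.lstsq(design, theta_acc, rcond=None)
    adjusted = theta_acc - coef[1] * centred
    return theta_acc, adjusted
```

Comparing the spread of the two returned arrays shows the effect of the adjustment: the adjusted sample is typically tighter than the plain rejection sample for the same acceptance fraction.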
Multi-objective optimisation for receiver operating characteristic analysis
Copyright © 2006 Springer-Verlag Berlin Heidelberg. The final publication is available at link.springer.com
Book title: Multi-Objective Machine Learning
Summary
Receiver operating characteristic (ROC) analysis is now a standard tool for the comparison of binary classifiers and the selection of operating parameters when the costs of misclassification are unknown.
This chapter outlines the use of evolutionary multi-objective optimisation techniques for ROC analysis, in both its traditional binary classification setting, and in the novel multi-class ROC situation.
Methods for comparing classifier performance in the multi-class case, based on an analogue of the Gini coefficient, are described, which leads to a natural method of selecting the classifier operating point. Illustrations are given concerning synthetic data and an application to Short Term Conflict Alert
- …
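For the traditional binary setting mentioned in this summary, the following sketch computes ROC points by sweeping a threshold over classifier scores, then the area under the curve and the Gini coefficient via the standard relation Gini = 2 * AUC - 1. It does not cover the multi-class analogue or the evolutionary optimisation described in the chapter; the function names are assumptions.

```python
import numpy as np

def roc_points(scores, labels):
    """False and true positive rates as the threshold is lowered.

    labels: 1 for the positive class, 0 for the negative class.
    Tied scores are treated as one threshold step.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")
    scores, labels = scores[order], labels[order]
    # Keep only the last index of each run of equal scores.
    last_of_group = np.r_[scores[1:] != scores[:-1], True]
    tp = np.cumsum(labels == 1)[last_of_group]
    fp = np.cumsum(labels == 0)[last_of_group]
    tpr = np.concatenate([[0.0], tp / tp[-1]])
    fpr = np.concatenate([[0.0], fp / fp[-1]])
    return fpr, tpr

def auc_and_gini(scores, labels):
    fpr, tpr = roc_points(scores, labels)
    # Trapezoidal area under the ROC curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return auc, 2.0 * auc - 1.0
```

Each (fpr, tpr) pair corresponds to one candidate operating point; which point to use depends on misclassification costs, which the summary notes are often unknown.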