QBDT, a new boosting decision tree method with systematic uncertainties into training for High Energy Physics
A new boosting decision tree (BDT) method, QBDT, is proposed for the
classification problem in high energy physics (HEP). In many HEP
analyses, great effort is devoted to increasing the signal significance in the
presence of a large background and various systematic uncertainties. Why not
develop a BDT method that targets the significance directly? In this new method
the significance indeed plays the central role: it is used as the criterion for
splitting a node when building a tree, and also as the weight contributing to the
BDT score. Because systematic uncertainties can easily be included in the
significance calculation, the method is able to learn, through training, to
reduce the effect of the systematic uncertainties. Using the search for the
rare radiative Higgs decay in proton-proton collisions as an example, QBDT and the
popular Gradient BDT (GradBDT) method are compared. QBDT is found to reduce the
correlation between the signal strength and the systematic uncertainty sources,
and thus to give a better significance. With the new method, the contribution of
the systematic uncertainty sources to the signal strength uncertainty is 50-85%
of that obtained with the GradBDT method.
Comment: 29 pages, accepted for publication in NIMA, algorithm available at
https://github.com/xialigang/QBD
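The idea of using the significance itself as the node-splitting criterion can be illustrated with a small sketch. It assumes a simple significance approximation, Z = s / sqrt(b + (rel_syst * b)^2), where rel_syst is a hypothetical relative background uncertainty, and it assumes the two child nodes are combined in quadrature as independent counting bins. The names `significance`, `best_split` and `rel_syst` are illustrative only; this is not the QBDT code from the repository above, and QBDT's actual significance definition and tree weighting may differ.

```python
import math
from typing import List, Tuple


def significance(s: float, b: float, rel_syst: float = 0.0) -> float:
    """Approximate significance s / sqrt(b + (rel_syst * b)^2).

    rel_syst is a hypothetical relative systematic uncertainty on the
    background. A node with no background returns 0 to avoid dividing
    by zero (a convention of this sketch only).
    """
    denom = b + (rel_syst * b) ** 2
    return s / math.sqrt(denom) if denom > 0 else 0.0


def best_split(x: List[float], is_signal: List[bool], w: List[float],
               rel_syst: float = 0.0) -> Tuple[float, float]:
    """Scan thresholds on one feature; return (threshold, combined Z).

    The children are treated as independent bins, so their significances
    are combined as sqrt(Z_left^2 + Z_right^2). Returns (nan, -1.0) if no
    split between distinct feature values exists.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    s_tot = sum(w[i] for i in order if is_signal[i])
    b_tot = sum(w[i] for i in order if not is_signal[i])
    s_left = b_left = 0.0
    best = (float("nan"), -1.0)
    for k in range(len(order) - 1):
        i = order[k]
        # Move event i from the right child into the left child.
        if is_signal[i]:
            s_left += w[i]
        else:
            b_left += w[i]
        # Only place a threshold between two distinct feature values.
        if x[order[k]] == x[order[k + 1]]:
            continue
        thr = 0.5 * (x[order[k]] + x[order[k + 1]])
        z_l = significance(s_left, b_left, rel_syst)
        z_r = significance(s_tot - s_left, b_tot - b_left, rel_syst)
        z = math.hypot(z_l, z_r)
        if z > best[1]:
            best = (thr, z)
    return best


if __name__ == "__main__":
    x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    is_signal = [False, False, False, True, True, False]
    thr, z = best_split(x, is_signal, [1.0] * 6)
    print(thr, z)
```

With unit weights, the scan places the threshold between 0.3 and 0.7, where the right child holds both signal events and one background event (Z = 2). A nonzero `rel_syst` lowers every significance that involves background, which is how a systematic term can steer which split is preferred.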
Improved Asymptotic Formulae for Statistical Interpretation Based on Likelihood Ratio Tests
In this work, we improve the asymptotic formulae describing the probability
distribution of a test statistic given in G. Cowan \emph{et al.}'s
paper~\cite{asimov}, from a perspective entirely different from that of the
previous version of this arXiv entry. The starting point of this version seems more natural. The
probability distribution function of the test statistic $q$ under a given hypothesis is
\[ f(q) = \sum_{n=0}^{N} P(n;\nu)\, f_n^{\rm SS}(q) + \sum_{n=N+1}^{\infty} P(n;\nu)\, f^{\rm LS}(q). \]
Here $P(n;\nu)$ is the Poisson distribution function with mean $\nu$; $N$ is the
border between large statistics (LS) and small statistics (SS), and has to be
chosen appropriately. If the number of events is greater than
$N$, the probability distribution of $q$ is described by a
single function $f^{\rm LS}(q)$, which is essentially the classic
asymptotic formula with a correction. For each possible number of events $n$ not
greater than $N$, we obtain the probability distribution
$f_n^{\rm SS}(q)$ based on a simplified 6-bin distribution of the observables.
In this way, the bump structures caused by the small sample size can be well
predicted. The new asymptotic formulae provide a much better differential
description of the test statistic.
Comment: 13 pages, 7 figures, a different perspective, able to describe the
discrete features in the small-statistics case
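The split between small and large statistics amounts to a Poisson-weighted mixture of densities, which a short sketch can make concrete. Writing P(n; nu) for the Poisson probability of n events with mean nu and N for the border, the sketch evaluates sum over n <= N of P(n; nu) f_n^SS(q), plus P(n > N; nu) f^LS(q). The densities passed in are placeholders: the paper derives f_n^SS from a simplified 6-bin model and f^LS from a corrected asymptotic formula, and neither derivation is reproduced here. The names `poisson_pmf` and `mixture_pdf` are hypothetical, and nu is assumed to be positive.

```python
import math
from typing import Callable, Sequence

Density = Callable[[float], float]


def poisson_pmf(n: int, nu: float) -> float:
    """Poisson probability P(n; nu), computed in log space (requires nu > 0)."""
    return math.exp(n * math.log(nu) - nu - math.lgamma(n + 1))


def mixture_pdf(q: float, nu: float, f_ss: Sequence[Density],
                f_ls: Density) -> float:
    """Evaluate sum_{n<=N} P(n;nu) f_ss[n](q) + P(n>N;nu) f_ls(q).

    f_ss[n] is a placeholder small-statistics density for exactly n events
    (n = 0..N, so the border is N = len(f_ss) - 1); f_ls is a placeholder
    for the single large-statistics density used for every n > N.
    """
    n_border = len(f_ss) - 1
    p_small = [poisson_pmf(n, nu) for n in range(n_border + 1)]
    # Tail probability P(n > N); clamp tiny negative rounding residue to 0.
    p_large = max(0.0, 1.0 - sum(p_small))
    small_part = sum(p * f(q) for p, f in zip(p_small, f_ss))
    return small_part + p_large * f_ls(q)


if __name__ == "__main__":
    # Placeholder density: chi-square with 2 degrees of freedom.
    chi2_2 = lambda q: 0.5 * math.exp(-0.5 * q)
    # If every component equals the same density, the mixture reproduces it.
    print(mixture_pdf(1.0, 3.0, [chi2_2] * 4, chi2_2), chi2_2(1.0))
```

Because the Poisson weights over n <= N and the tail weight sum to one, the mixture stays normalized whenever each component density is, and the choice of N only decides which events are handled by the discrete small-statistics terms.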