
    QBDT, a new boosting decision tree method with systematic uncertainties into training for High Energy Physics

    A new boosting decision tree (BDT) method, QBDT, is proposed for the classification problem in high energy physics (HEP). In many HEP analyses, great effort goes into increasing the signal significance in the presence of a huge background and various systematic uncertainties. Why not develop a BDT method that targets the significance directly? Indeed, the significance plays a central role in this new method: it is used to split a node when building a tree, and it also serves as the weight contributing to the BDT score. Because systematic uncertainties can easily be included in the significance calculation, the method can learn to reduce their effect during training. Taking the search for the rare radiative Higgs decay in proton-proton collisions, $pp \to h + X \to \gamma\tau^+\tau^- + X$, as an example, QBDT is compared with the popular Gradient BDT (GradBDT) method. QBDT is found to reduce the correlation between the signal strength and the systematic uncertainty sources and thus to give a better significance. The contribution of the systematic uncertainty sources to the signal strength uncertainty with the new method is 50-85% of that with the GradBDT method.
    Comment: 29 pages, accepted for publication in NIMA, algorithm available at https://github.com/xialigang/QBD
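    The idea of splitting a node on significance can be illustrated with a minimal sketch. It is not the QBDT algorithm itself: the abstract does not give the significance definition or the split rule, so the formula `s / sqrt(b + (rel_syst * b)^2)`, the quadrature combination of the two child nodes, and the names `significance`, `best_split`, and `rel_syst` are all assumptions made for illustration.

```python
import math


def significance(s, b, rel_syst=0.0):
    """Assumed stand-in: s / sqrt(b + (rel_syst * b)^2).

    rel_syst is a hypothetical relative systematic uncertainty on the
    background yield; the paper's actual definition may differ.
    """
    if b <= 0.0:
        return 0.0
    return s / math.sqrt(b + (rel_syst * b) ** 2)


def best_split(values, labels, weights, rel_syst=0.0):
    """Scan thresholds on one feature and return (threshold, combined Z).

    Events with value < threshold go to the lower child, the rest to the
    upper child. The two child significances are combined in quadrature,
    which is an assumption of this sketch.
    """
    best_cut, best_z = None, 0.0
    for cut in sorted(set(values)):
        s_lo = b_lo = s_hi = b_hi = 0.0
        for x, y, w in zip(values, labels, weights):
            if x < cut:
                if y == 1:
                    s_lo += w
                else:
                    b_lo += w
            else:
                if y == 1:
                    s_hi += w
                else:
                    b_hi += w
        z = math.hypot(
            significance(s_lo, b_lo, rel_syst),
            significance(s_hi, b_hi, rel_syst),
        )
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z


# Toy data: label 1 = signal, 0 = background, unit weights.
toy_values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
toy_labels = [0, 0, 0, 1, 1, 0]
toy_weights = [1.0] * 6
toy_cut, toy_z = best_split(toy_values, toy_labels, toy_weights)
```

    In this sketch a nonzero `rel_syst` lowers the significance of background-heavy nodes, so the preferred split can shift toward regions that are less sensitive to the background uncertainty, which is the qualitative behaviour the abstract describes.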

    Improved Asymptotic Formulae for Statistical Interpretation Based on Likelihood Ratio Tests

    In this work, we improve the asymptotic formulae that describe the probability distribution of a test statistic in the paper by G. Cowan et al. \cite{asimov}, from a perspective totally different from that of the previous version of this arXiv entry. The starting point of this version seems more natural. The probability distribution function under the hypothesis $H$ is

    $$f(T_\mu|\mu_H) = \sum_{n=0}^{+\infty} f(T_\mu|n,\mu_H)\,P(n|b+\mu_H s)$$
    $$= \sum_{n=0}^{n_{\text{small}}} f(T_\mu|n,\mu_H)\,P(n|b+\mu_H s) + \sum_{n>n_{\text{small}}} f(T_\mu|n,\mu_H)\,P(n|b+\mu_H s)$$
    $$\approx \sum_{n=0}^{n_{\text{small}}} f_{\text{LS}}(T_\mu|n,\mu_H)\,P(n|b+\mu_H s) + \Big(1-\sum_{n=0}^{n_{\text{small}}} P(n|b+\mu_H s)\Big)\, f_{\text{LS}}(T_\mu|n_{\text{small}},\mu_H).$$

    Here $P(n|\nu)$ is the Poisson distribution function; $n_{\text{small}}$ is the border between large statistics (LS) and small statistics (SS) and has to be chosen appropriately. If the number of events is greater than $n_{\text{small}}$, the probability distribution of $T_\mu$ is described by a single function $f_{\text{LS}}$, which is basically the classic asymptotic formula with a correction. For each possible number of events not greater than $n_{\text{small}}$, we obtain the probability distribution, $f_{\text{SS}}$, based on a simplified 6-bin distribution of the observables:

    $$f_{\text{SS}}(T_\mu|n,\mu_H) = \sum_{k_0+k_1+k_2+k_3+k_4+k_5=n} \frac{n!}{k_0!\,k_1!\cdots k_5!} \prod_{i=0}^{5} \Big(\frac{b_i+\mu_H s_i}{b+\mu_H s}\Big)^{k_i} \times f_{\text{binned}}(T_\mu|n_i=k_i,\, i=0,1,\cdots,5;\, \mu_H)$$

    In this way, the bump structures due to small sample size can be well predicted. The new asymptotic formulae provide a much better differential description of the test statistic.
    Comment: 13 pages, 7 figures, a different perspective, able to describe the discrete feature in small-statistics case
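    The weights that appear in the formulae above can be computed directly, as the following minimal sketch shows. It evaluates only the Poisson weights $P(n|b+\mu_H s)$ for $n \le n_{\text{small}}$, the lumped tail weight $1-\sum_n P(n|b+\mu_H s)$, and the multinomial factor in front of $f_{\text{binned}}$. The densities $f_{\text{LS}}$ and $f_{\text{binned}}$ are not specified in the abstract and are left out; the function names and the toy yields are assumptions made for illustration.

```python
import math


def poisson_pmf(n, nu):
    """Poisson probability P(n | nu)."""
    return math.exp(-nu) * nu ** n / math.factorial(n)


def mixture_weights(b, s, mu_h, n_small):
    """Return ([P(0), ..., P(n_small)], tail) for nu = b + mu_h * s.

    tail = 1 - sum of the listed probabilities, i.e. the weight assigned
    to all event counts above n_small.
    """
    nu = b + mu_h * s
    small = [poisson_pmf(n, nu) for n in range(n_small + 1)]
    tail = 1.0 - sum(small)
    return small, tail


def multinomial_weight(counts, b_bins, s_bins, mu_h):
    """n!/(k_0!...k_m!) * prod_i ((b_i + mu_h*s_i) / (b + mu_h*s))^k_i."""
    nu_bins = [bi + mu_h * si for bi, si in zip(b_bins, s_bins)]
    total = sum(nu_bins)
    coeff = math.factorial(sum(counts))
    for k in counts:
        coeff //= math.factorial(k)  # stays an exact integer at each step
    prob = float(coeff)
    for k, nu_i in zip(counts, nu_bins):
        prob *= (nu_i / total) ** k
    return prob


# Toy inputs (hypothetical yields, not taken from the paper).
small_w, tail_w = mixture_weights(b=2.0, s=1.0, mu_h=1.0, n_small=3)
uniform_w = multinomial_weight(
    [1, 0, 0, 0, 0, 0], b_bins=[1.0] * 6, s_bins=[0.0] * 6, mu_h=1.0
)
```

    By construction the listed Poisson weights and the tail weight sum to one, so the approximate mixture stays normalised whenever the component densities are.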