
    Optimization of Signal Significance by Bagging Decision Trees

    This paper describes an algorithm, suited to the analysis of high energy physics (HEP) data, that optimizes the signal significance or any other classification figure of merit. The algorithm trains decision trees on many bootstrap replicas of the training data, with each tree required to optimize the signal significance or another chosen figure of merit. New data are then classified by a simple majority vote of the trained trees. The performance of the algorithm was studied in a search for the radiative leptonic decay B->gamma l nu at BaBar and found to be superior to that of every other classifier tried, including powerful methods such as boosted decision trees. In the B->gamma e nu channel, the algorithm increases the expected signal significance from the 2.4 sigma obtained by the original method designed for the B->gamma l nu analysis to 3.0 sigma. Comment: 8 pages, 2 figures, 1 table
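The bagging-and-voting scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the figure of merit is assumed to be the common s/sqrt(s+b), each "tree" is reduced to a single cut on one feature (a depth-one tree, or stump), event weights are assumed to encode expected yields, and the function names (`significance`, `fit_stump`, `bag_stumps`, `majority_vote`) are hypothetical.

```python
import math
import random


def significance(s, b):
    """Assumed figure of merit s / sqrt(s + b); any other FOM could be swapped in."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def fit_stump(events):
    """Return the single cut (feature j, threshold t, direction) that maximizes the FOM.

    events: list of (x, is_signal, weight). Direction +1 keeps events with
    x[j] > t, direction -1 keeps events with x[j] <= t. Kept events are
    classified as signal.
    """
    best_fom, best_cut = -1.0, None
    for j in range(len(events[0][0])):
        for t in sorted({x[j] for x, _, _ in events}):
            for direction in (1, -1):
                s = b = 0.0
                for x, is_signal, w in events:
                    kept = x[j] > t if direction > 0 else x[j] <= t
                    if kept and is_signal:
                        s += w
                    elif kept:
                        b += w
                fom = significance(s, b)
                if fom > best_fom:  # strict: the first cut found wins ties
                    best_fom, best_cut = fom, (j, t, direction)
    return best_cut


def predict_stump(cut, x):
    """True if x passes the cut, i.e. is classified as signal."""
    j, t, direction = cut
    return x[j] > t if direction > 0 else x[j] <= t


def bag_stumps(events, n_trees=25, seed=0):
    """Fit one FOM-optimizing stump on each of n_trees bootstrap replicas."""
    rng = random.Random(seed)
    n = len(events)
    return [fit_stump([events[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_trees)]


def majority_vote(cuts, x):
    """Classify x as signal if more than half of the stumps vote signal."""
    votes = sum(predict_stump(cut, x) for cut in cuts)
    return votes > len(cuts) / 2
```

Growing deeper trees with the same criterion would change only `fit_stump`; the bootstrap and majority-vote steps stay the same.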

    Responder Identification in Clinical Trials with Censored Data

    We present a newly developed technique for identifying positive and negative responders to a new treatment that was compared with a classical treatment (or placebo) in a randomized clinical trial. This bump-hunting-based method was developed for trials in which the two treatment arms do not differ in overall survival. It systematically checks whether certain subgroups, described by predictive factors, do show a difference in survival due to the new treatment. Several versions of the method are discussed and compared in a simulation study. The best version of the responder identification method uses martingale residuals from a prognostic model as the response in a bump hunting procedure stabilized by bootstrapping. On average, it recognizes the correct positive responder group 90% of the time and the correct negative responder group 99% of the time.
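As a rough illustration of the response used in the bump hunting step, the sketch below computes martingale residuals r_i = delta_i - H(t_i) for right-censored data, where delta_i is the event indicator and H is the estimated cumulative hazard. It assumes the simplest possible prognostic model, one with no covariates, so that H is the Nelson-Aalen estimate; a prognostic model with covariates would replace H(t_i) by the covariate-adjusted cumulative hazard (for a Cox model, H_0(t_i) exp(beta'x_i)). The function names are hypothetical.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard, evaluated at every distinct observed time.

    times:  observed follow-up times (event or censoring)
    events: 1 if the event was observed at that time, 0 if censored
    """
    cumulative, result = 0.0, {}
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        cumulative += deaths / at_risk
        result[t] = cumulative
    return result


def martingale_residuals(times, events):
    """Martingale residuals delta_i - H(t_i) under a covariate-free (null) model."""
    hazard = nelson_aalen(times, events)
    return [e - hazard[t] for t, e in zip(times, events)]
```

Under this null model the residuals sum to zero; a positive residual marks a subject whose event came earlier than the model expects, and it is over values like these that a bump-hunting search would look for subgroups.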

    Finding Groups in Gene Expression Data

    The vast potential of the genomic insight offered by microarray technologies has led to their widespread use since they were introduced a decade ago. Application areas include gene function discovery, disease diagnosis, and inferring regulatory networks. Microarray experiments enable large-scale, high-throughput investigations of gene activity and have thus provided the data analyst with a distinctive, high-dimensional field of study. Many questions in this field relate to finding subgroups of data profiles that are very similar. A popular type of exploratory tool for finding subgroups is cluster analysis, and many different flavors of algorithms have been used and indeed tailored for microarray data. Cluster analysis, however, implies a partitioning of the entire data set, and this does not always match the objective. Sometimes pattern discovery or bump hunting tools are more appropriate. This paper reviews these various tools for finding interesting subgroups.

    Testing Invisible Momentum Ansatze in Missing Energy Events at the LHC

    We consider SUSY-like events with two decay chains, each terminating in an invisible particle whose true energy and momentum are not measured in the detector. Nevertheless, a useful educated guess about the invisible momenta can still be obtained by optimizing a suitable invariant mass function. We review and contrast several proposals in the literature for such ansatze: four versions of the M_T2-assisted on-shell reconstruction (MAOS), as well as several variants of the on-shell constrained M_2 variables. We compare the performance of these methods with regard to the mass determination of a new particle resonance along the decay chain from the peak of the reconstructed invariant mass distribution. For concreteness, we consider the event topology of dilepton ttbar events and study each of the three possible subsystems, in both a ttbar and a SUSY example. We find that the M_2 variables generally provide sharper peaks and therefore better ansatze for the invisible momenta. We show that the performance can be further improved by preselecting events near the kinematic endpoint of the corresponding variable from which the momentum ansatz originates. Comment: 38 pages, 15 figures
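For reference, the standard definitions behind the MAOS ansatz are summarized below in our own notation (not taken from the paper). For decay chain i, let p_T^{v(i)} and m_{v(i)} be the transverse momentum and mass of the visible system, q_T^{(i)} a trial transverse momentum for the invisible particle, and \tilde m_\chi its trial mass:

```latex
M_T^{(i)\,2} = m_{v(i)}^2 + \tilde m_\chi^2
  + 2\left( E_T^{v(i)}\, E_T^{\chi(i)} - \vec p_T^{\,v(i)} \cdot \vec q_T^{\,(i)} \right),
\qquad E_T = \sqrt{m^2 + |\vec p_T|^2},

M_{T2}(\tilde m_\chi) = \min_{\vec q_T^{\,(1)} + \vec q_T^{\,(2)} = \vec p_T^{\,\mathrm{miss}}}
  \max\left\{ M_T^{(1)},\, M_T^{(2)} \right\}.
```

MAOS takes the minimizing q_T^{(1)}, q_T^{(2)} as the ansatz for the invisible transverse momenta and then fixes the longitudinal components by imposing an on-shell mass condition on each chain; the MAOS versions differ, for example, in which mass condition is imposed.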

    Bump Hunting using the Tree-GA

    Bump hunting is the task of finding regions in which the points of interest are located more densely than elsewhere, in data where those points are hard to separate from the others. For a specified pureness rate p of the points, a maximum capture rate c can be obtained, and a trade-off curve between p and c can then be constructed. Finding the bump regions is therefore equivalent to constructing the trade-off curve. For convenience, we adopt simple boundary shapes for the bumps, such as box-shaped regions with edges parallel to the variable axes. To obtain the maximum capture rates, we use a genetic algorithm specialized to tree structures, called the tree-GA, because a conventional binary decision tree does not provide the maximum capture rates. Exploiting the tendency of the tree-GA to produce many local maxima of the capture rate, we can estimate the return period for the trade-off curve using extreme-value statistics. We assessed the accuracy of the trade-off curve in typical basic cases that may be observed in real customer data and found that the proposed tree-GA constructs an effective trade-off curve close to the optimal one.
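To make the pureness and capture rates concrete, the Python sketch below assumes the usual definitions (pureness: the fraction of points inside a box that have response 1; capture: the fraction of all response-1 points that fall inside the box). In place of the tree-GA it exhaustively searches axis-aligned boxes whose edges lie on observed coordinates. That search is feasible only for toy data, since the number of candidate boxes grows roughly as n^(2d), which is the cost a heuristic search such as the tree-GA is meant to avoid. The function names are hypothetical.

```python
import itertools


def box_stats(points, labels, box):
    """Pureness and capture rate of an axis-aligned box given as [(lo, hi), ...]."""
    inside = [y for x, y in zip(points, labels)
              if all(lo <= v <= hi for v, (lo, hi) in zip(x, box))]
    hits = sum(inside)          # response-1 points inside the box
    total_ones = sum(labels)    # all response-1 points
    pureness = hits / len(inside) if inside else 0.0
    capture = hits / total_ones if total_ones else 0.0
    return pureness, capture


def max_capture(points, labels, p_min):
    """Largest capture rate over boxes with pureness >= p_min (exhaustive search).

    Box edges are restricted to observed coordinates, so this is exact for that
    family of boxes but costs roughly O(n^(2d)) box evaluations.
    """
    dims = len(points[0])
    coords = [sorted({x[d] for x in points}) for d in range(dims)]
    intervals = [list(itertools.combinations_with_replacement(c, 2)) for c in coords]
    best = 0.0
    for box in itertools.product(*intervals):
        pureness, capture = box_stats(points, labels, box)
        if pureness >= p_min and capture > best:
            best = capture
    return best


def trade_off_curve(points, labels, p_grid):
    """(pureness, maximum capture) pairs for each required pureness in p_grid."""
    return [(p, max_capture(points, labels, p)) for p in p_grid]
```

For example, with one-dimensional points 0, 1, ..., 5 and responses 1, 0, 1, 1, 0, 0, requiring pureness 1.0 gives a maximum capture of 2/3 (the box [2, 3]), while relaxing the requirement to 0.75 admits the box [0, 3] and a capture of 1.0.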

    Bump Hunting and Its Application to Customer Data

    In difficult problems of classifying z-dimensional points with 0-1 responses into two groups, where the data structure is messy, it is more favorable to search for the regions where points with response 1 are denser than to find boundaries that separate the two groups. For such problems, which are often seen in customer databases, we have developed a bump hunting method using probabilistic and statistical methods. For a pureness rate specified in advance, a maximum capture rate is obtained, and a trade-off curve between the pureness rate and the capture rate can then be constructed. To find the maximum capture rate, we use the decision tree method combined with a genetic algorithm. We first give a brief introduction to our research: what bump hunting is, the trade-off curve between the pureness rate and the capture rate, bump hunting using the tree genetic algorithm, and upper bounds for the trade-off curve using extreme-value statistics. We then assess the accuracy of the trade-off curve from the viewpoint of the genetic algorithm procedure. Using the newly proposed genetic algorithm procedure, we can obtain the accuracy of the upper bound for the trade-off curve and, from it, estimate the upper bound of the trade-off curve that is actually attainable. The bootstrapped hold-out method, as well as cross-validation, is used to assess the accuracy of the trade-off curve.
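The extreme-value step can be illustrated with a deliberately simple model. The sketch below assumes that the local maxima of the capture rate returned by repeated genetic-algorithm runs at a fixed pureness behave like independent draws from a Gumbel distribution, fits that distribution by the method of moments, and reports the return level for a return period T, i.e. the value exceeded on average once in T runs. The Gumbel family, the moment fit, and all names here are our assumptions rather than the paper's procedure; because capture rates are bounded by 1, a bounded extreme-value model may be more appropriate in practice.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit to a sample of maxima: returns (mu, beta).

    Uses Var = (pi * beta)^2 / 6 and Mean = mu + EULER_GAMMA * beta.
    Needs at least two values.
    """
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def return_level(mu, beta, period):
    """Level exceeded on average once every `period` independent runs.

    Solves F(x) = 1 - 1/period for the Gumbel CDF F(x) = exp(-exp(-(x - mu) / beta)).
    """
    if period <= 1:
        raise ValueError("return period must exceed 1")
    return mu - beta * math.log(-math.log(1.0 - 1.0 / period))
```

A longer return period gives a higher level, so in this simplified model the return level for a large T plays the role of an estimated upper bound for the capture rate at that pureness.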

    Enhancing the discovery prospects for SUSY-like decays with a forgotten kinematic variable

    The lack of a new physics signal thus far at the Large Hadron Collider motivates us to consider how to look for challenging final states, with large Standard Model backgrounds and subtle kinematic features, such as cascade decays with compressed spectra. Adopting a benchmark SUSY-like decay topology with a four-body final state proceeding through a sequence of two-body decays via intermediate resonances, we focus our attention on the kinematic variable $\Delta_4$, which has previously been used to parameterize the boundary of the allowed four-body phase space. We highlight the advantages of using $\Delta_4$ as a discovery variable and present an analysis suggesting that the pairing of $\Delta_4$ with another invariant mass variable leads to a significant improvement over more conventional variable choices and techniques. Comment: 20 pages, 13 figures. v2: matches published version