Optimization of Signal Significance by Bagging Decision Trees
An algorithm for optimization of signal significance or any other
classification figure of merit suited for analysis of high energy physics (HEP)
data is described. This algorithm trains decision trees on many bootstrap
replicas of training data with each tree required to optimize the signal
significance or any other chosen figure of merit. New data are then classified
by a simple majority vote of the built trees. The performance of this algorithm
has been studied using a search for the radiative leptonic decay B->gamma l nu
at BaBar and shown to be superior to that of all other attempted classifiers
including such powerful methods as boosted decision trees. In the B->gamma e nu
channel, the described algorithm increases the expected signal significance
from 2.4 sigma obtained by an original method designed for the B->gamma l nu
analysis to 3.0 sigma. Comment: 8 pages, 2 figures, 1 table
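The procedure in this abstract (bootstrap replicas, one tree per replica trained to maximize a figure of merit, then a majority vote) can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes one input variable, replaces full decision trees with depth-one trees (single cuts), and takes S / sqrt(S + B) as the signal significance. The function names `significance`, `fit_stump`, `bagged_stumps`, and `classify` are hypothetical.

```python
import math
import random


def significance(s, b):
    """Assumed figure of merit: S / sqrt(S + B); 0 for an empty selection."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def fit_stump(sample):
    """Return the cut whose 'x > cut' selection maximizes the significance.

    sample: list of (x, label) pairs, label 1 = signal, 0 = background.
    A depth-one tree stands in here for the full decision trees of the paper.
    """
    best_cut, best_fom = -math.inf, -1.0
    for cut in sorted({x for x, _ in sample}):
        s = sum(1 for x, y in sample if x > cut and y == 1)
        b = sum(1 for x, y in sample if x > cut and y == 0)
        fom = significance(s, b)
        if fom > best_fom:
            best_cut, best_fom = cut, fom
    return best_cut


def bagged_stumps(data, n_trees=25, seed=0):
    """Train one stump per bootstrap replica (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]


def classify(cuts, x):
    """Simple majority vote: signal (1) if more than half the stumps accept x."""
    votes = sum(1 for cut in cuts if x > cut)
    return 1 if 2 * votes > len(cuts) else 0
```

Calling `bagged_stumps(data)` on a list of `(x, label)` pairs returns the list of cuts, and `classify(cuts, x)` then labels a new point by majority vote. An odd number of trees avoids ties.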
Responder Identification in Clinical Trials with Censored Data
We present a newly developed technique for identifying positive and negative responders to a new treatment that was compared with a classical treatment (or placebo) in a randomized clinical trial. This bump-hunting-based method was developed for trials in which the two treatment arms do not differ in overall survival. It checks systematically whether certain subgroups, described by predictive factors, show a difference in survival due to the new treatment. Several versions of the method were discussed and compared in a simulation study. The best version of the responder identification method uses martingale residuals of a prognostic model as the response in a bump hunting procedure stabilized through bootstrapping. On average, it recognizes the correct positive responder group 90% of the time and the correct negative responder group 99% of the time.
Finding Groups in Gene Expression Data
The vast potential of the genomic insight offered by microarray technologies has led to their widespread use since they were introduced a decade ago. Application areas include gene function discovery, disease diagnosis, and inferring regulatory networks. Microarray experiments enable large-scale, high-throughput investigations of gene activity and have thus provided the data analyst with a distinctive, high-dimensional field of study. Many questions in this field relate to finding subgroups of data profiles which are very similar. A popular type of exploratory tool for finding subgroups is cluster analysis, and many different flavors of algorithms have been used and indeed tailored for microarray data. Cluster analysis, however, implies a partitioning of the entire data set, and this does not always match the objective. Sometimes pattern discovery or bump hunting tools are more appropriate. This paper reviews these various tools for finding interesting subgroups.
Testing Invisible Momentum Ansatze in Missing Energy Events at the LHC
We consider SUSY-like events with two decay chains, each terminating in an
invisible particle, whose true energy and momentum are not measured in the
detector. Nevertheless, a useful educated guess about the invisible momenta can
still be obtained by optimizing a suitable invariant mass function. We review
and contrast several proposals in the literature for such ansatze: four
versions of the M_T2-assisted on-shell reconstruction (MAOS), as well as
several variants of the on-shell constrained M_2 variables. We compare the
performance of these methods with regards to the mass determination of a new
particle resonance along the decay chain from the peak of the reconstructed
invariant mass distribution. For concreteness, we consider the event topology
of dilepton ttbar events and study each of the three possible subsystems, in
both a ttbar and a SUSY example. We find that the M_2 variables generally
provide sharper peaks and therefore better ansatze for the invisible momenta.
We show that the performance can be further improved by preselecting events
near the kinematic endpoint of the corresponding variable from which the
momentum ansatz originates. Comment: 38 pages, 15 figures
Bump Hunting using the Tree-GA
Bump hunting seeks regions where the points of interest are located more densely than elsewhere and are hard to separate from other points. By specifying a pureness rate p for the points, a maximum capture rate c of the points can be obtained, and a trade-off curve between p and c can then be constructed. Finding the bump regions is thus equivalent to constructing the trade-off curve. For convenience, we adopt simple boundary shapes for the bumps, namely box-shaped regions aligned with the variable axes. We use a genetic algorithm specialized to the tree structure, called the tree-GA, to obtain the maximum capture rates, because the conventional binary decision tree does not provide the maximum capture rates.
Because the tree-GA tends to produce many local maxima of the capture rate, we can estimate the return period for the trade-off curve using extreme-value statistics. We have assessed the accuracy of the trade-off curve in typical fundamental cases that may be observed in real customer data, and found that the proposed tree-GA can construct an effective trade-off curve that is close to the optimal one.
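The trade-off curve between pureness rate p and capture rate c can be illustrated with a small sketch. It does not implement the tree-GA: it uses an exhaustive search over intervals in one dimension, which is only feasible for toy data. It also assumes definitions the abstract does not spell out, namely that pureness is the fraction of points inside the box with response 1 and that capture is the fraction of all response-1 points that fall inside the box. The names `rates`, `max_capture`, and `tradeoff_curve` are hypothetical.

```python
def rates(points, lo, hi):
    """Return (pureness, capture) of the interval [lo, hi].

    points: list of (x, response) pairs with response in {0, 1}.
    Assumed definitions: pureness = share of response-1 points inside the box,
    capture = share of all response-1 points that the box contains.
    """
    inside = [y for x, y in points if lo <= x <= hi]
    total_ones = sum(y for _, y in points)
    if not inside or total_ones == 0:
        return 0.0, 0.0
    ones = sum(inside)
    return ones / len(inside), ones / total_ones


def max_capture(points, p):
    """Largest capture rate over all intervals whose pureness is at least p.

    Exhaustive search over interval endpoints; a stand-in for the tree-GA.
    """
    xs = sorted({x for x, _ in points})
    best = 0.0
    for i, lo in enumerate(xs):
        for hi in xs[i:]:
            pure, cap = rates(points, lo, hi)
            if pure >= p:
                best = max(best, cap)
    return best


def tradeoff_curve(points, purities):
    """Pair each required pureness rate with its maximum capture rate."""
    return [(p, max_capture(points, p)) for p in purities]
```

For example, `tradeoff_curve(points, [0.5, 0.75, 1.0])` returns one `(p, c)` pair per requested pureness rate. Raising p can only shrink the set of admissible boxes, so c never increases along the curve.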
Bump Hunting and Its Application to Customer Data
In difficult problems of classifying z-dimensional points into two groups with 0-1 responses, where the data structure is messy, it is more favorable to search for denser regions of the points assigned response 1 than to find boundaries that separate the two groups. For such problems, which are often seen in customer databases, we have developed a bump hunting method using probabilistic and statistical methods. By specifying a pureness rate in advance, a maximum capture rate is obtained. Then, a trade-off curve between the pureness rate and the capture rate can be constructed. To find the maximum capture rate, we have used the decision tree method combined with the genetic algorithm. We first give a brief introduction to our research: what bump hunting is, the trade-off curve between the pureness rate and the capture rate, bump hunting using the tree genetic algorithm, and the upper bounds for the trade-off curve obtained using extreme-value statistics. Then, the accuracy of the trade-off curve is assessed from the viewpoint of the genetic algorithm procedure. Using the newly proposed genetic algorithm procedure, we can obtain the accuracy of the upper bound for the trade-off curve, and can therefore anticipate the upper bound of the trade-off curve that is actually attainable. The bootstrapped hold-out method, as well as cross validation, is used in assessing the accuracy of the trade-off curve.
Enhancing the discovery prospects for SUSY-like decays with a forgotten kinematic variable
The lack of a new physics signal thus far at the Large Hadron Collider
motivates us to consider how to look for challenging final states, with large
Standard Model backgrounds and subtle kinematic features, such as cascade
decays with compressed spectra. Adopting a benchmark SUSY-like decay topology
with a four-body final state proceeding through a sequence of two-body decays
via intermediate resonances, we focus our attention on the kinematic variable
which previously has been used to parameterize the boundary of the
allowed four-body phase space. We highlight the advantages of using it
as a discovery variable, and present an analysis suggesting that
pairing it with another invariant mass variable leads to a
significant improvement over more conventional variable choices and techniques. Comment: 20 pages, 13 figures. v2: matches published version