3,472 research outputs found

Optimization of Signal Significance by Bagging Decision Trees

Author: Narsky I.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 20/07/2005

An algorithm for optimization of signal significance or any other classification figure of merit suited for analysis of high energy physics (HEP) data is described. This algorithm trains decision trees on many bootstrap replicas of the training data, with each tree required to optimize the signal significance or any other chosen figure of merit. New data are then classified by a simple majority vote of the built trees. The performance of this algorithm has been studied using a search for the radiative leptonic decay B->gamma l nu at BaBar and shown to be superior to that of all other attempted classifiers, including such powerful methods as boosted decision trees. In the B->gamma e nu channel, the described algorithm increases the expected signal significance from 2.4 sigma, obtained by an original method designed for the B->gamma l nu analysis, to 3.0 sigma. Comment: 8 pages, 2 figures, 1 table
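
The procedure in this abstract (bootstrap replicas, one figure-of-merit-optimizing tree per replica, majority vote) can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the common definition of signal significance S / sqrt(S + B), replaces full decision trees with one-variable, depth-one cuts, and all function names are hypothetical.

```python
import math
import random


def significance(s, b):
    """Assumed figure of merit: S / sqrt(S + B), 0 if nothing is selected."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def fit_stump(xs, ys):
    """Depth-one 'tree': the cut x >= t that maximizes the significance."""
    best_t, best_fom = min(xs), -1.0
    for t in sorted(set(xs)):
        s = sum(1 for x, y in zip(xs, ys) if x >= t and y == 1)
        b = sum(1 for x, y in zip(xs, ys) if x >= t and y == 0)
        fom = significance(s, b)
        if fom > best_fom:
            best_t, best_fom = t, fom
    return best_t


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train one stump per bootstrap replica of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    cuts = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        cuts.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return cuts


def predict(cuts, x):
    """Classify by simple majority vote: 1 = signal, 0 = background."""
    votes = sum(1 for t in cuts if x >= t)
    return 1 if 2 * votes > len(cuts) else 0


# Toy data (invented for illustration): background at low x, signal at high x.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
cuts = fit_bagged_stumps(xs, ys)
```

Each stump is trained to maximize the figure of merit directly rather than a generic impurity measure, which is the feature the abstract emphasizes; swapping in another figure of merit only requires replacing `significance`.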

arXiv.org e-Print Archive

Crossref

Caltech Authors

Responder Identification in Clinical Trials with Censored Data

Author: Kehl V.
Ulm Kurt
Publication date: 01/01/2003

We present a newly developed technique for identifying positive and negative responders to a new treatment that was compared to a classical treatment (or placebo) in a randomized clinical trial. This bump-hunting-based method was developed for trials in which the two treatment arms do not differ in overall survival. It checks in a systematic manner whether certain subgroups, described by predictive factors, show a difference in survival due to the new treatment. Several versions of the method were discussed and compared in a simulation study. The best version of the responder identification method uses martingale residuals from a prognostic model as the response in a bump hunting procedure stabilized through bootstrapping. On average, it identifies the correct positive responder group 90% of the time and the correct negative responder group 99% of the time.

Open Access LMU

Finding Groups in Gene Expression Data

Author: Hand David J.
Heard Nicholas A.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2005

The vast potential of the genomic insight offered by microarray technologies has led to their widespread use since they were introduced a decade ago. Application areas include gene function discovery, disease diagnosis, and inferring regulatory networks. Microarray experiments enable large-scale, high-throughput investigations of gene activity and have thus provided the data analyst with a distinctive, high-dimensional field of study. Many questions in this field relate to finding subgroups of data profiles which are very similar. A popular type of exploratory tool for finding subgroups is cluster analysis, and many different flavors of algorithms have been used and indeed tailored for microarray data. Cluster analysis, however, implies a partitioning of the entire data set, and this does not always match the objective. Sometimes pattern discovery or bump hunting tools are more appropriate. This paper reviews these various tools for finding interesting subgroups

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

Testing Invisible Momentum Ansatze in Missing Energy Events at the LHC

Author: Kim Doojin
Matchev Konstantin T.
Moortgat Filip
Pape Luc
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/03/2017

We consider SUSY-like events with two decay chains, each terminating in an invisible particle, whose true energy and momentum are not measured in the detector. Nevertheless, a useful educated guess about the invisible momenta can still be obtained by optimizing a suitable invariant mass function. We review and contrast several proposals in the literature for such ansatze: four versions of the M_T2-assisted on-shell reconstruction (MAOS), as well as several variants of the on-shell constrained M_2 variables. We compare the performance of these methods with regard to the mass determination of a new particle resonance along the decay chain from the peak of the reconstructed invariant mass distribution. For concreteness, we consider the event topology of dilepton ttbar events and study each of the three possible subsystems, in both a ttbar and a SUSY example. We find that the M_2 variables generally provide sharper peaks and therefore better ansatze for the invisible momenta. We show that the performance can be further improved by preselecting events near the kinematic endpoint of the corresponding variable from which the momentum ansatz originates. Comment: 38 pages, 15 figures

arXiv.org e-Print Archive

CERN Document Server

Bump Hunting using the Tree-GA

Author: Hirose Hideo
Publication venue: International Information Institute
Publication date: 01/10/2011

Bump hunting seeks regions where the points of interest are located more densely than elsewhere but are hard to separate from other points. For a specified pureness rate p of the points, a maximum capture rate c can be obtained, so a trade-off curve between p and c can be constructed. Finding the bump regions is thus equivalent to constructing the trade-off curve. For convenience, we adopt simple boundary shapes for the bumps, namely box-shaped regions parallel to the variable axes. To obtain the maximum capture rates, we use a genetic algorithm specialized to the tree structure, called the tree-GA, because a conventional binary decision tree does not provide the maximum capture rates. Exploiting the tendency of the tree-GA to produce many local maxima for the capture rates, we can estimate the return period for the trade-off curve using extreme-value statistics. We have assessed the accuracy of the trade-off curve in typical fundamental cases that may be observed in real customer data, and found that the proposed tree-GA can construct an effective trade-off curve that is close to the optimal one.
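
The pureness rate, capture rate, and trade-off curve described in this abstract can be made concrete with a small sketch. It is an illustration under stated assumptions, not the tree-GA: pureness is taken to be the fraction of response-1 points among the points inside a box, capture the fraction of all response-1 points that fall inside it, and the maximum capture rate is found by exhaustive search over one-dimensional intervals. All names and the toy data are hypothetical.

```python
def box_rates(points, labels, lower, upper):
    """Return (pureness, capture) for the axis-parallel box [lower, upper]."""
    inside = [y for x, y in zip(points, labels)
              if all(lo <= v <= hi for v, lo, hi in zip(x, lower, upper))]
    total_pos = sum(labels)          # all response-1 points
    hits = sum(inside)               # response-1 points inside the box
    pureness = hits / len(inside) if inside else 0.0
    capture = hits / total_pos if total_pos else 0.0
    return pureness, capture


def max_capture_1d(values, labels, min_pureness):
    """Exhaustive search over intervals [a, b] with endpoints at data values.

    Stands in for the tree-GA in this sketch; feasible only for tiny data.
    """
    points = [(v,) for v in values]
    cuts = sorted(set(values))
    best = 0.0
    for i, a in enumerate(cuts):
        for b in cuts[i:]:
            p, c = box_rates(points, labels, (a,), (b,))
            if p >= min_pureness and c > best:
                best = c
    return best


# Toy data (invented for illustration): response-1 points at 2, 3 and 5.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 1, 1, 0, 1, 0, 0, 0]

# Trade-off curve: maximum capture rate for each required pureness rate.
curve = [(p, max_capture_1d(values, labels, p)) for p in (0.5, 0.75, 1.0)]
```

On this toy data, demanding perfect pureness limits the box to [2, 3] and lowers the capture rate to 2/3, while relaxing the pureness to 0.75 lets the box [2, 5] capture every response-1 point, which is the trade-off the curve records.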

Kyushu Institute of Technology of Academic Repository

Kyutacar : Kyushu Institute of Technology Academic Repository

Bump Hunting and Its Application to Customer Data

Author: Hirose Hideo
Publication venue: Department of Systems Design and Informatics, Kyushu Institute of Technology
Publication date: 01/01/2009

In difficult classification problems in which z-dimensional points with 0-1 responses must be divided into two groups and the data structure is messy, it is preferable to search for the denser regions of points with response 1 than to find boundaries that separate the two groups. For such problems, which are often seen in customer databases, we have developed a bump hunting method using probabilistic and statistical methods. By specifying a pureness rate in advance, a maximum capture rate is obtained. Then, a trade-off curve between the pureness rate and the capture rate can be constructed. To find the maximum capture rate, we use the decision tree method combined with the genetic algorithm. We first give a brief introduction to our research: what bump hunting is, the trade-off curve between the pureness rate and the capture rate, bump hunting using the tree genetic algorithm, and the upper bounds for the trade-off curve obtained using extreme-value statistics. Then, the assessment of the accuracy of the trade-off curve is tackled from the viewpoint of the genetic algorithm procedure. Using the newly proposed genetic algorithm procedure, we can obtain the accuracy of the upper bound of the trade-off curve. We may then expect the actually attainable upper bound of the trade-off curve. The bootstrapped hold-out method, as well as the cross-validation method, is used in assessing the accuracy of the trade-off curve.

Kyutacar : Kyushu Institute of Technology Academic Repository

Kyushu Institute of Technology of Academic Repository

Enhancing the discovery prospects for SUSY-like decays with a forgotten kinematic variable

Author: Debnath Dipsikha
Gainer James S.
Kilic Can
Kim Doojin
Matchev Konstantin T.
Yang Yuan-Pao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/09/2018

The lack of a new physics signal thus far at the Large Hadron Collider motivates us to consider how to look for challenging final states, with large Standard Model backgrounds and subtle kinematic features, such as cascade decays with compressed spectra. Adopting a benchmark SUSY-like decay topology with a four-body final state proceeding through a sequence of two-body decays via intermediate resonances, we focus our attention on the kinematic variable $\Delta_{4}$, which previously has been used to parameterize the boundary of the allowed four-body phase space. We highlight the advantages of using $\Delta_{4}$ as a discovery variable, and present an analysis suggesting that the pairing of $\Delta_{4}$ with another invariant mass variable leads to a significant improvement over more conventional variable choices and techniques. Comment: 20 pages, 13 figures. v2: matches published version

arXiv.org e-Print Archive

Directory of Open Access Journals

The University of Arizona

CERN Document Server