Search CORE

3,541 research outputs found

Decision Stream: Cultivating Deep Decision Trees

Author: Ignatov Andrey
Ignatov Dmitry
Publication date: 03/09/2017

Various modifications of decision trees have been used extensively in recent years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, but it is also the method's major shortcoming: recursive node partitioning geometrically reduces the amount of data in the leaf nodes, which causes excessive model complexity and overfitting. In this paper, we present a novel architecture, the Decision Stream, aimed at overcoming this problem. Instead of building a tree structure during the learning process, we propose merging nodes from different branches based on their similarity, estimated with two-sample test statistics. This generates a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems: credit scoring, Twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, and synthetic data classification and regression. Our experimental results reveal that the proposed approach significantly outperforms standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease of up to 35%.
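The node-merging idea can be illustrated with a short Python sketch. This is not the authors' implementation. The abstract only says similarity is "estimated with two-sample test statistics", so the sketch uses the two-sample Kolmogorov-Smirnov statistic as a stand-in. The helper names `ks_statistic` and `merge_similar_leaves`, and the `threshold` value, are hypothetical. Each leaf is represented by the target values of the training samples that reach it. The most similar pair of leaves is merged repeatedly until no pair falls below the threshold.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples in step, comparing their empirical CDFs.
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def merge_similar_leaves(leaves, threshold=0.5):
    """Greedily merge leaves whose target samples look alike.

    `leaves` is a list of lists of target values; `threshold` is a
    hypothetical cut-off on the KS statistic, not a value from the paper.
    """
    groups = [list(leaf) for leaf in leaves]
    while len(groups) > 1:
        # Find the most similar pair of groups.
        d, p, q = min(
            (ks_statistic(groups[p], groups[q]), p, q)
            for p in range(len(groups))
            for q in range(p + 1, len(groups))
        )
        if d >= threshold:
            break  # no remaining pair is similar enough
        groups[p].extend(groups.pop(q))  # q > p, so index p stays valid
    return groups


leaves = [[1.0, 1.1, 0.9], [1.05, 0.95, 1.0], [5.0, 5.2, 4.9]]
print(merge_similar_leaves(leaves))
# → [[1.0, 1.1, 0.9, 1.05, 0.95, 1.0], [5.0, 5.2, 4.9]]
```

Merging pools the samples of statistically similar leaves, so any further splitting works on larger groups. This is the mechanism the abstract credits with counteracting the geometric shrinkage of leaf data.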

arXiv.org e-Print Archive

Crossref

Statistical Classification Techniques for Photometric Supernova Typing

Author: Aldering
Ascasibar
Astier
B. Bassett
B. Martin
Balogh
Bissantz
Carstairs
Clocchiatti
D. Parkinson
de Jager
Eisenstein
Fadda
Filippenko
Folatelli
Freund
Friedman
Frieman
Fu
Fukugita
Gerdes
Giannantonio
Guy
H. Campbell
H. Lampeitl
Hastie
Hicken
J. Newling
Kaiser
Kessler
Komatsu
Kunz
Lampeitl
M. Kunz
M. Smith
M. Varughese
Mantz
Oyaizu
Percival
Perlmutter
R. Hlozek
R. Nichol
Riess
Roe
Schmidt
Tyson
Valtchanov
Wester
Publication venue: Wiley
Publication date: 01/01/2010

Future photometric supernova surveys will produce vastly more candidates than can be followed up spectroscopically, highlighting the need for effective classification methods based on lightcurves alone. Here we introduce boosting and kernel density estimation techniques, which require minimal astrophysical input, and compare their performance on 20,000 simulated Dark Energy Survey lightcurves. We demonstrate that these methods are comparable to the best template-fitting methods currently in use and, in particular, do not require the redshift of the host galaxy or candidate. However, both methods require a training sample that is representative of the full population, so typical spectroscopic supernova subsamples will lead to poor performance. To realize the full potential of such blind methods, we recommend using representative training samples and giving specific attention to their creation in the design phase of future photometric surveys.
Comment: 19 pages, 41 figures. No changes. Additional material and summary video available at http://cosmoaims.wordpress.com/2010/09/30/boosting-for-supernova-classification
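The kernel density estimation approach can be sketched as a Bayes classifier over class-conditional densities. This minimal Python version is an assumption-laden illustration, not the paper's method. It works on a single made-up lightcurve feature with a fixed Gaussian bandwidth. The names `gaussian_kde` and `kde_classify` and the two-class training set are hypothetical. A candidate is assigned to the class whose prior times estimated density is largest.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def kde_classify(x, training, bandwidth=0.5):
    """Assign x to the class with the largest prior * estimated density.

    `training` maps a class label to its training feature values; the
    class prior is taken from the training-set class fractions.
    """
    total = sum(len(values) for values in training.values())
    scores = {
        label: (len(values) / total) * gaussian_kde(values, bandwidth)(x)
        for label, values in training.items()
    }
    return max(scores, key=scores.get)


# Hypothetical one-feature training set (e.g. a lightcurve colour).
training = {"Ia": [0.1, 0.2, 0.15, 0.3], "non-Ia": [1.0, 1.2, 0.9]}
print(kde_classify(0.2, training), kde_classify(1.1, training))  # → Ia non-Ia
```

The sketch also shows why the abstract stresses representative training samples: both the class priors and the densities are read directly off the training set. A biased spectroscopic subsample would therefore bias every prediction.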

arXiv.org e-Print Archive

CiteSeerX

Crossref

Portsmouth University Research Portal (Pure)

University of Queensland eSpace

How to Find More Supernovae with Less Work: Object Classification Techniques for Difference Imaging

Author: B. A. Weaver
Becker A. C.
C. Aragon
D. Wong
Fisher R. A.
Freund Y.
R. C. Thomas
R. Romano
S. Bailey
Zahn C. T.
Publication venue: University of Chicago Press
Publication date: 02/05/2007

We present the results of applying new object classification techniques to difference images in the context of the Nearby Supernova Factory supernova search. Most current supernova searches subtract reference images from new images, identify objects in these difference images, and apply simple threshold cuts on parameters such as statistical significance, shape, and motion to reject objects such as cosmic rays, asteroids, and subtraction artifacts. Although most static objects subtract cleanly, even a very low false positive detection rate can lead to hundreds of non-supernova candidates, which must be vetted by human inspection before additional follow-up is triggered. Compared with simple threshold cuts, more sophisticated methods such as Boosted Decision Trees, Random Forests, and Support Vector Machines provide dramatically better object discrimination. At the Nearby Supernova Factory, we reduced the number of non-supernova candidates by a factor of 10 while increasing our supernova identification efficiency. Methods such as these will be crucial for maintaining a reasonable false positive rate in the automated transient alert pipelines of upcoming projects such as PanSTARRS and LSST.
Comment: 25 pages; 6 figures; submitted to Ap
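The contrast with simple threshold cuts can be made concrete with discrete AdaBoost over decision stumps, one standard form of boosted decision trees. This Python sketch is a generic illustration, not the Nearby Supernova Factory pipeline. The two features (a detection significance and a shape score), the six candidates, and all function names are hypothetical. On this toy data no single threshold cut separates real transients (+1) from artifacts (-1), but three boosted stumps classify every training candidate correctly.

```python
import math


def stump_predict(row, feature, threshold, sign):
    """Decision stump: predict `sign` if row[feature] >= threshold, else -sign."""
    return sign if row[feature] >= threshold else -sign


def train_stump(X, y, w):
    """Exhaustively find the stump with the lowest weighted training error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps; labels must be +1 or -1."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, f, t, sign = train_stump(X, y, w)
        if err >= 0.5:
            break  # the stump is no better than chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, sign))
        # Up-weight misclassified candidates, down-weight the rest, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, t, s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical candidates: [detection significance, shape score]; +1 = real.
X = [[5, 0.9], [6, 0.8], [7, 0.2], [2, 0.9], [3, 0.1], [8, 0.85]]
y = [1, 1, -1, -1, -1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])  # → [1, 1, -1, -1, -1, 1]
```

Each stump on its own is exactly a one-parameter threshold cut. Boosting gains its discrimination by reweighting the candidates the previous cuts got wrong and combining several cuts in a weighted vote.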

arXiv.org e-Print Archive

Crossref

UNT Digital Library

A Multivariate Training Technique with Event Reweighting

Author: A Hocker
A Wilson
B Zhou
H -J Yang
J Bastos
S Frixione
T Dai
The BABAR collaboration
Y Freund
Z Zhao
Publication venue: IOP Publishing
Publication date: 28/08/2007

An event reweighting technique incorporated into a multivariate training algorithm has been developed and tested using Artificial Neural Networks (ANN) and Boosted Decision Trees (BDT). The ANN and BDT performance obtained with event reweighting is compared to that obtained with conventional equal event weighting. The comparison is performed in the context of the physics analysis of the ATLAS experiment at the Large Hadron Collider (LHC), which will explore the fundamental nature of matter and the basic forces that shape our universe. We demonstrate that the event reweighting technique provides an unbiased method of multivariate training for event pattern recognition.
Comment: 20 pages, 8 figures
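The abstract does not spell out how the event weights are derived. This Python sketch therefore shows only the general mechanism: per-event weights enter the training objective and can change what training selects. The single discriminating variable, the six events, the weight values, and the name `best_cut` are all hypothetical. The "classifier" is a one-cut rule chosen to minimise the weighted misclassification error. Up-weighting one background event moves the optimal cut.

```python
def best_cut(values, labels, weights):
    """Return (cut, weighted error) for the rule 'signal if value >= cut'.

    labels: 1 = signal, 0 = background; weights: one weight per event.
    """
    best_c, best_err = None, float("inf")
    for c in sorted(set(values)):
        # Sum the weights of events the rule misclassifies.
        err = sum(w for v, lab, w in zip(values, labels, weights)
                  if (v >= c) != (lab == 1))
        if err < best_err:
            best_c, best_err = c, err
    return best_c, best_err


values = [1, 2, 3, 4, 5, 6]          # a single discriminating variable
labels = [0, 0, 1, 0, 1, 1]
equal = [1, 1, 1, 1, 1, 1]           # conventional equal event weighting
reweighted = [1, 1, 1, 5, 1, 1]      # hypothetical per-event weights
print(best_cut(values, labels, equal), best_cut(values, labels, reweighted))
# → (3, 1) (5, 1)
```

In an ANN or BDT the same idea applies: each event's contribution to the loss or to the split criterion is multiplied by its weight.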

arXiv.org e-Print Archive

Crossref

CERN Document Server

Deep Blue Documents at the University of Michigan

Multi-test Decision Tree and its Application to Microarray Data Classification

Author: Armstrong
Berzal
Breiman
Brodley
Brown
Che
Chen
Cohen
Cordell
Cowell
Czajkowski
Demsar
Dettling
Diaz-Uriarte
Dramiński
Fayyad
Freund
Ge
Golub
Grześ
Hall
Hastie
Hu
Kuo
Li
Marcin Czajkowski
Marek Grześ
Marek Kretowski
Murthy
Pagallo
Qu
Quinlan
Robnik-Siikonja
Rokach
Sebastiani
Shalev-Shwartz
Shi
Tan
Wold
Yeoh
Publication venue: Elsevier BV
Publication date: 01/05/2014

Objective: A desirable property of tools used to investigate biological data is that they yield easy-to-understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature, which resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees tend to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. Methods: We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Results: Experimental validation was performed on several real-life gene expression datasets. Comparisons with eight classifiers show that MTDT has statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution outperformed its baseline algorithm on 14 datasets by an average of 6 percent. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. Conclusion: This paper introduces a new type of decision tree that is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high-dimensional microarray data than their popular counterparts.
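A node holding several univariate tests can be sketched as follows. The abstract says only that "several univariate tests" are applied in each non-terminal node. This Python sketch assumes the tests are combined by majority vote, with a tie sending the sample right. Both choices are assumptions, not details confirmed by the abstract. The class names, feature indices, and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnivariateTest:
    feature: int        # index of the tested feature (e.g. one gene)
    threshold: float    # the test votes "left" when value <= threshold

    def votes_left(self, row):
        return row[self.feature] <= self.threshold


@dataclass
class MultiTestNode:
    tests: list         # several UnivariateTest objects evaluated together

    def goes_left(self, row):
        """Route a sample by majority vote; a tie sends it right (assumption)."""
        left_votes = sum(test.votes_left(row) for test in self.tests)
        return 2 * left_votes > len(self.tests)


node = MultiTestNode([UnivariateTest(0, 1.0), UnivariateTest(3, 0.5),
                      UnivariateTest(7, 2.0)])
sample = [0.8, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 1.5]
print(node.goes_left(sample))  # → True (2 of 3 tests vote left)
```

Because a single noisy gene can be outvoted by the other tests, a node of this kind depends less on any one feature than a one-test split does. This matches the stability aim stated in the abstract.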

Crossref

Kent Academic Repository

Postponing Branching Decisions

Author: Milano Michela
van Hoeve Willem Jan
Publication date: 01/01/2004

Solution techniques for Constraint Satisfaction and Optimisation Problems often make use of backtrack search methods, exploiting variable and value ordering heuristics. In this paper, we propose and analyse a very simple method to apply when the value ordering heuristic produces ties: postponing the branching decision. To this end, we group the tied values together, branch on this sub-domain, and defer the decision among them to lower levels of the search tree. We show theoretically and experimentally that this simple modification can dramatically improve the efficiency of the search strategy. Although similar methods may already have been applied in practice, to our knowledge no empirical or theoretical study has been proposed in the literature to identify when and to what extent this strategy should be used.
Comment: 11 pages, 3 figures
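The branching step described above can be sketched as a Python function. For one variable, it returns the sub-domains a backtracking search would try, in order. This is a hypothetical illustration, not the authors' code. The function name `branches`, the higher-is-better `score` convention, and breaking any remaining tie by the smallest value are all assumptions. Constraint propagation and the rest of the search are omitted.

```python
def branches(domain, score):
    """Ordered sub-domains to branch on for one variable.

    `score` is the value ordering heuristic (higher is better). If several
    values tie for the best score, they are grouped into one sub-domain so
    the choice among them is postponed; otherwise the search commits to a
    single value first.
    """
    domain = set(domain)
    best = max(score(v) for v in domain)
    ties = {v for v in domain if score(v) == best}
    if 1 < len(ties) < len(domain):
        first = ties            # postpone: branch on the whole tie group
    else:
        first = {min(ties)}     # ordinary branching on one value (min breaks ties)
    rest = domain - first
    return [first, rest] if rest else [first]


scores = {1: 0, 2: 2, 3: 1, 4: 2, 5: 0}
print(branches({1, 2, 3, 4, 5}, scores.get))        # → [{2, 4}, {1, 3, 5}]
print(branches({1, 2, 3}, {1: 0, 2: 3, 3: 1}.get))  # → [{2}, {1, 3}]
```

When every value ties, grouping them all would not split the domain at all, so the sketch falls back to committing to one value.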

arXiv.org e-Print Archive

CiteSeerX

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

International Migration, Integration and Social Cohesion online publications