
    Optimization of Signal Significance by Bagging Decision Trees

    An algorithm for optimization of signal significance, or any other classification figure of merit suited for analysis of high energy physics (HEP) data, is described. The algorithm trains decision trees on many bootstrap replicas of the training data, with each tree required to optimize the signal significance or any other chosen figure of merit. New data are then classified by a simple majority vote of the built trees. The performance of this algorithm has been studied using a search for the radiative leptonic decay B->gamma l nu at BaBar and shown to be superior to that of all other attempted classifiers, including such powerful methods as boosted decision trees. In the B->gamma e nu channel, the described algorithm increases the expected signal significance from the 2.4 sigma obtained by an original method designed for the B->gamma l nu analysis to 3.0 sigma.
    Comment: 8 pages, 2 figures, 1 table
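    A minimal sketch of the bagging-and-majority-vote scheme just described, assuming NumPy feature arrays with binary signal/background labels; scikit-learn's standard DecisionTreeClassifier is used as a stand-in, so the significance-optimizing split criterion of the paper is not reproduced here.

```python
# Minimal sketch: train decision trees on bootstrap replicas of the training
# data and classify new events by a simple majority vote. Standard Gini-split
# trees stand in for trees that optimize the signal significance directly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagged_trees(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    trees, n = [], len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)               # bootstrap replica
        trees.append(DecisionTreeClassifier(min_samples_leaf=50).fit(X[idx], y[idx]))
    return trees

def predict_majority(trees, X):
    votes = np.stack([t.predict(X) for t in trees])    # shape (n_trees, n_events)
    return (votes.mean(axis=0) >= 0.5).astype(int)     # majority vote, labels in {0, 1}

def signal_significance(y_true, y_pred):
    # s / sqrt(s + b) over events classified as signal (the quoted figure of merit)
    s = np.sum((y_pred == 1) & (y_true == 1))
    b = np.sum((y_pred == 1) & (y_true == 0))
    return s / np.sqrt(s + b) if (s + b) > 0 else 0.0
```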

    Colour consistency in computer vision : a multiple image dynamic exposure colour classification system : a thesis presented to the Institute of Natural and Mathematical Sciences in fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Albany, Auckland, New Zealand

    Colour classification vision systems face difficulty when a scene contains both very bright and dark regions. An indistinguishable colour at one exposure may be distinguishable at another. The use of multiple cameras with varying levels of sensitivity is explored in this thesis, aiding the classification of colours in scenes with high illumination ranges. In the Multiple Image Dynamic Exposure Colour Classification (MIDECC) System, pie-slice classifiers are optimised for normalised red/green and cyan/magenta colour spaces. The MIDECC system finds a limited section of hyperspace for each classifier, resulting in a process which requires minimal manual input and can filter background samples without specialised training. In an experimental implementation, automatic multiple-camera exposure, data sampling, training and colour space evaluation to recognise 8 target colours across 14 different lighting scenarios is processed in approximately 30 seconds. The system provides computationally effective training and classification, outputting an overall true positive score of 92.4% with an illumination range between bright and dim regions of 880 lux. False positive classifications are minimised to 4.24%, assisted by heuristic background filtering. The limited-search-space classifiers and the layout of the colour spaces ensure the MIDECC system is less likely to classify dissimilar colours, requiring a certain 'confidence' level before a match is output. Unfortunately, the system struggles to classify colours under extremely bright illumination due to the simplistic classifier-building technique. Results are compared to the common machine learning algorithms Naïve Bayes, Neural Networks, Random Tree and C4.5 Tree Classifiers. These algorithms return greater than 98.5% true positives and less than 1.53% false positives, with Random Tree and Naïve Bayes providing the best and worst comparable algorithms, respectively. Although it achieves a lower classification rate, the MIDECC system trains with minimal user input, ignores background and untrained samples when classifying, and trains faster than most of the studied machine learning algorithms.
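    A rough sketch of one way to implement a pie-slice classifier in the normalised red/green chromaticity plane, assuming each trained colour occupies a bounded angular sector about the white point; the actual MIDECC training procedure, the cyan/magenta space and the multi-exposure camera handling are not reproduced.

```python
# Illustrative pie-slice colour classifier in normalised (r, g) chromaticity
# space. Angle/radius bounds are learned from training samples of one colour;
# a pixel matches only if it falls inside that sector ("pie slice"), so
# dissimilar colours are rejected rather than forced into the nearest class.
# Assumes the sector does not straddle the +/- pi angular discontinuity.
import numpy as np

def to_rg_chromaticity(rgb):
    """Map RGB values to normalised (r, g) chromaticity coordinates."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True) + 1e-9
    return rgb[..., :2] / total

class PieSliceClassifier:
    def __init__(self, centre=(1.0 / 3.0, 1.0 / 3.0)):   # white point in rg space
        self.centre = np.asarray(centre)

    def fit(self, samples):
        d = np.asarray(samples) - self.centre
        theta, r = np.arctan2(d[:, 1], d[:, 0]), np.hypot(d[:, 0], d[:, 1])
        self.theta_min, self.theta_max = theta.min(), theta.max()
        self.r_min, self.r_max = r.min(), r.max()
        return self

    def predict(self, points):
        d = np.asarray(points) - self.centre
        theta, r = np.arctan2(d[:, 1], d[:, 0]), np.hypot(d[:, 0], d[:, 1])
        return ((theta >= self.theta_min) & (theta <= self.theta_max) &
                (r >= self.r_min) & (r <= self.r_max))
```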

    Imbalanced Ensemble Classifier for learning from imbalanced business school data set

    Private business schools in India face a common problem: selecting quality students for their MBA programs so as to achieve the desired placement percentage. Such data sets are generally biased towards one class, i.e., imbalanced in nature, and learning from an imbalanced dataset is a difficult proposition. This paper proposes an imbalanced ensemble classifier which can handle the imbalanced nature of the dataset and achieves higher accuracy on the combined feature selection (selection of important student characteristics) and classification (prediction of placements based on those characteristics) problem for an Indian business school dataset. The optimal value of an important model parameter is found. Numerical evidence based on the Indian business school dataset is also provided to demonstrate the outstanding performance of the proposed classifier.
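    As a generic illustration only (the paper's specific ensemble is not reproduced), a common way to combine feature selection with imbalance-aware classification is to rank features with a class-weighted ensemble and refit on the top-ranked subset:

```python
# Generic stand-in, not the classifier proposed in the paper: class weights
# counteract the imbalance, feature importances pick the "important student
# characteristics", and the refitted forest predicts placement outcomes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_and_classify(X, y, k=5, seed=0):
    ranker = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                    random_state=seed).fit(X, y)
    top_k = np.argsort(ranker.feature_importances_)[::-1][:k]    # feature selection
    model = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                   random_state=seed).fit(X[:, top_k], y)
    return top_k, model                                          # classify on selected features
```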

    Automated construction of a hierarchy of self-organized neural network classifiers

    Full text link
    This paper documents an effort to design and implement a neural network-based automatic classification system which dynamically constructs and trains a decision tree. The system is a combination of neural network and decision tree technology: the decision tree is constructed to partition a large classification problem into smaller problems, and the neural network modules then solve these smaller problems. We used a variant of the Fuzzy ARTMAP neural network which can be trained much more quickly than traditional neural networks. The research extends the concept of self-organization from within the neural network to the overall structure of the dynamically constructed decision hierarchy. The primary advantage is the avoidance of manual tedium and subjective bias in constructing decision hierarchies. Additionally, removing the need for manual construction of the hierarchy opens up a large class of potential classification applications. When tested on data from real-world images, the automatically generated hierarchies performed slightly better than an intuitive (hand-built) hierarchy. Because the neural networks at the nodes of the decision hierarchy are solving smaller problems, generalization performance can be substantially improved if the number of features used to solve these problems is reduced. Algorithms for automatically selecting which features to use for each individual classification module were also implemented. We were able to achieve the same level of performance as in previous manual efforts, but in an efficient, automatic manner. The technology developed has great potential in a number of commercial areas, including data mining, pattern recognition, and intelligent interfaces for personal computer applications. Sample applications include fraud detection, bankruptcy prediction, a data mining agent, a scalable object recognition system, an email agent, a resource librarian agent, and a decision aid agent.
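    A hedged sketch of the hierarchy idea under simplifying assumptions: a shallow decision tree partitions the problem, and a small neural network is trained on the samples reaching each leaf. scikit-learn's MLPClassifier stands in for the Fuzzy ARTMAP modules used in the paper, and the per-module feature selection step is omitted.

```python
# Two-level hierarchy: a decision tree routes each sample to a leaf, and a
# separate neural network ("module") solves the smaller problem at that leaf.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

class TreeOfNetworks:
    def __init__(self, max_depth=2):
        self.router = DecisionTreeClassifier(max_depth=max_depth)
        self.experts = {}

    def fit(self, X, y):
        self.router.fit(X, y)                        # coarse partition of the problem
        leaves = self.router.apply(X)                # leaf id of every training sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            if len(np.unique(y[mask])) == 1:         # pure leaf: store the constant label
                self.experts[leaf] = int(y[mask][0])
            else:                                    # mixed leaf: train a network module
                self.experts[leaf] = MLPClassifier(hidden_layer_sizes=(32,),
                                                   max_iter=500).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.router.apply(X)
        out = np.empty(len(X), dtype=int)
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            expert = self.experts[leaf]
            out[mask] = expert if isinstance(expert, int) else expert.predict(X[mask])
        return out
```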