
    Interpretable Categorization of Heterogeneous Time Series Data

    Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X). Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 201

    uBoost: A boosting method for producing uniform selection efficiencies from multivariate classifiers

    The use of multivariate classifiers, especially neural networks and decision trees, has become commonplace in particle physics. Typically, a series of classifiers is trained, rather than just one, to enhance performance; this is known as boosting. This paper presents a novel method of boosting that produces a uniform selection efficiency in a user-defined multivariate space. Such a technique is ideally suited for amplitude analyses or other situations where optimizing a single integrated figure of merit is not what is desired.
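
The idea of training a series of classifiers can be illustrated with one round of a generic AdaBoost-style weight update. This is only a sketch of conventional boosting, not the uBoost method itself, whose uniformity-enforcing weights are not specified in the abstract; the function name `adaboost_reweight` is hypothetical.

```python
import math

def adaboost_reweight(weights, misclassified):
    """One generic AdaBoost round: return (alpha, new normalized weights).

    weights       -- current per-event weights
    misclassified -- booleans, True where the current classifier was wrong
    """
    total = sum(weights)
    # Weighted error rate of the current classifier.
    err = sum(w for w, m in zip(weights, misclassified) if m) / total
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    # Classifier weight: larger when the error is smaller.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Upweight misclassified events, downweight correctly classified ones.
    updated = [w * math.exp(alpha if m else -alpha)
               for w, m in zip(weights, misclassified)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]
```

After such an update the misclassified events carry half of the total weight, so the next classifier in the series concentrates on them.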

    A Multivariate Training Technique with Event Reweighting

    An event reweighting technique incorporated into a multivariate training algorithm has been developed and tested using Artificial Neural Networks (ANN) and Boosted Decision Trees (BDT). Training with event reweighting is compared to conventional equal event weighting based on the ANN and BDT performance. The comparison is performed in the context of the physics analysis of the ATLAS experiment at the Large Hadron Collider (LHC), which will explore the fundamental nature of matter and the basic forces that shape our universe. We demonstrate that the event reweighting technique provides an unbiased method of multivariate training for event pattern recognition. Comment: 20 pages, 8 figures

    Application of decision trees and multivariate regression trees in design and optimization

    Induction of decision trees and regression trees is a powerful technique not only for performing ordinary classification and regression analysis but also for discovering the often complex knowledge which describes the input-output behavior of a learning system in qualitative forms. In the area of classification (discrimination analysis), a new technique called IDea is presented for performing incremental learning with decision trees. It is demonstrated that IDea's incremental learning can greatly reduce the spatial complexity of a given set of training examples. Furthermore, it is shown that this reduction in complexity can also be used as an effective tool for improving the learning efficiency of other types of inductive learners such as standard backpropagation neural networks. In the area of regression analysis, a new methodology for performing multiobjective optimization has been developed. Specifically, we demonstrate that multiple-objective optimization through induction of multivariate regression trees is a powerful alternative to the conventional vector optimization techniques. Furthermore, in an attempt to investigate the effect of various types of splitting rules on the overall performance of the optimizing system, we present a tree partitioning algorithm which utilizes a number of techniques derived from diverse fields of statistics and fuzzy logic. These include: two multivariate statistical approaches based on dispersion matrices, an information-theoretic measure of covariance complexity which is typically used for obtaining multivariate linear models, two newly-formulated fuzzy splitting rules based on Pearson's parametric and Kendall's nonparametric measures of association, Bellman and Zadeh's fuzzy decision-maximizing approach within an inductive framework, and finally, the multidimensional extension of a widely-used fuzzy entropy measure. The advantages of this new approach to optimization are highlighted by presenting three examples which respectively deal with design of a three-bar truss, a beam, and an electric discharge machining (EDM) process.

    Multivariate Analysis of Flow Cytometric Data Using Decision Trees

    Characterization of the response of the host immune system is important in understanding the bidirectional interactions between the host and microbial pathogens. For research on the host site, flow cytometry has become one of the major tools in immunology. Advances in technology and reagents now allow the simultaneous assessment of multiple markers at the single-cell level, generating multidimensional data sets that require multivariate statistical analysis. We explored the explanatory power of the supervised machine learning method called “induction of decision trees” in flow cytometric data. In order to examine whether the production of a certain cytokine depends on other cytokines, datasets from intracellular staining for six cytokines with complex patterns of co-expression were analyzed by induction of decision trees. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. For a more realistic estimation of the decision trees’ quality, we used stratified fivefold cross validation and chose the “best” tree according to a combination of different quality criteria. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines was not only dependent on the co-expression of others per se, but was also dependent on the intensity of expression. Thus, for the first time we successfully used induction of decision trees for the analysis of high dimensional flow cytometric data and demonstrated the feasibility of this method to reveal structural patterns in such data sets.
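
Stratified fivefold cross validation, as mentioned above, splits the samples into five folds that each keep roughly the overall class proportions. The following is a minimal sketch of such a split; the function name `stratified_folds` and the round-robin assignment are illustrative assumptions, not the authors' procedure, and no shuffling is performed.

```python
from collections import defaultdict

def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds
```

Each fold then serves once as the held-out test set while the remaining four folds are used for training.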

    Neural-Symbolic Temporal Decision Trees for Multivariate Time Series Classification

    Multivariate time series classification is a widely known problem, and its applications are ubiquitous. Due to their strong generalization capability, neural networks have been proven to be very powerful for the task, but their applicability is often limited by their intrinsic black-box nature. Recently, temporal decision trees have been shown to be a serious alternative to neural networks for the same task in terms of classification performance, while attaining higher levels of transparency and interpretability. In this work, we propose an initial approach to neural-symbolic temporal decision trees, that is, a hybrid method that leverages both the ability of neural networks to capture temporal patterns and the flexibility of temporal decision trees to make decisions on intervals based on (possibly externally computed) temporal features. Although based on a proof-of-concept implementation, neural-symbolic temporal decision trees show promising results in our experiments on public datasets.

    Search for $W'\rightarrow t\bar{b}$ in the lepton plus jets final states with the ATLAS detector at the LHC

    This document presents a search for a $W'$ boson decaying to a top quark and a $b$ quark in an effective coupling approach, using a multivariate method based on boosted decision trees. It reports exclusion limits on the $W'\rightarrow tb$ cross-section times branching ratio and effective couplings as a function of the $W'$-boson mass. The search covers $W'$-boson masses between 0.5 and 3.0 TeV, for a right-handed or left-handed $W'$ boson, with 20.3 fb$^{-1}$ of proton-proton collision data produced by the LHC in 2012 at a center-of-mass energy of 8 TeV and collected by the ATLAS detector. Comment: TOP2014, 4 pages, 4 figures