12 research outputs found

    Fast Terrain Classification Using Variable-Length Representation for Autonomous Navigation

    Get PDF
    We propose a method for learning using a set of feature representations which retrieve different amounts of information at different costs. The goal is to create a more efficient terrain classification algorithm which can be used in real-time, onboard an autonomous vehicle. Instead of building a monolithic classifier with uniformly complex representation for each class, the main idea here is to actively consider the labels or misclassification cost while constructing the classifier. For example, some terrain classes might be easily separable from the rest, so very simple representation will be sufficient to learn and detect these classes. This is taken advantage of during learning, so the algorithm automatically builds a variable-length visual representation which varies according to the complexity of the classification task. This enables fast recognition of different terrain types during testing. We also show how to select a set of feature representations so that the desired terrain classification task is accomplished with high accuracy and is at the same time efficient. The proposed approach achieves a good trade-off between recognition performance and speedup on data collected by an autonomous robot

    Comparative analysis of methods for microbiome study

    Get PDF
    Microbiome analysis is garnering much interest with benefits including improved treatment options, enhanced capabilities for personalized medicine, greater understanding of the human body, and contributions to ecological study. Data from these communities of bacteria, viruses, and fungi are feature rich, sparse, and have sample sizes not appreciably larger than the feature space, making analysis challenging and necessitating a coordinated approach utilizing multiple techniques alongside domain expertise. This thesis provides an overview and comparative analysis of these methods, with a case study on cirrhosis and hepatic encephalopathy demonstrating a selection of methods. Approaches are considered in a medically motivated context where relationships between microbes in the human body and diseases or conditions are of primary interest, with additional objectives being the identification of how microbes influence each other and how these influences relate to the diseases and conditions being studied. These analysis methods are partitioned into three categories: univariate statistical methods, classifier-based methods, and joint analysis methods. Univariate statistical methods provide results corresponding to how much a single variable or feature differs between groups in the data. Classifier-based approaches can be generalized as those where a classification model with microbe abundance as inputs and disease states as outputs is used, resulting in a predictive model which is then analyzed to learn about the data. The joint analysis category corresponds to techniques which specifically target relationships between microbes and compare those relationships among subpopulations within the data. Despite significant differences between these categories and the individual methods, each has strengths and weaknesses and plays an important role in microbiome analysis

    Joints in Random Forests

    Get PDF
    Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation. Under certain assumptions, frequently made for Bayes consistency results, we show that consistency in GeDTs and GeFs extend to any pattern of missing input features, if missing at random. Empirically, we show that our models often outperform common routines to treat missing data, such as K-nearest neighbour imputation, and moreover, that our models can naturally detect outliers by monitoring the marginal probability of input features

    Using AUC and accuracy in evaluating learning algorithms

    Full text link

    A new approach of top-down induction of decision trees for knowledge discovery

    Get PDF
    Top-down induction of decision trees is the most popular technique for classification in the field of data mining and knowledge discovery. Quinlan developed the basic induction algorithm of decision trees, ID3 (1984), and extended to C4.5 (1993). There is a lot of research work for dealing with a single attribute decision-making node (so-called the first-order decision) of decision trees. Murphy and Pazzani (1991) addressed about multiple-attribute conditions at decision-making nodes. They show that higher order decision-making generates smaller decision trees and better accuracy. However, there always exist NP-complete combinations of multiple-attribute decision-makings.;We develop a new algorithm of second-order decision-tree inductions (SODI) for nominal attributes. The induction rules of first-order decision trees are combined by \u27AND\u27 logic only, but those of SODI consist of \u27AND\u27, \u27OR\u27, and \u27OTHERWISE\u27 logics. It generates more accurate results and smaller decision trees than any first-order decision tree inductions.;Quinlan used information gains via VC-dimension (Vapnik-Chevonenkis; Vapnik, 1995) for clustering the experimental values for each numerical attribute. However, many researchers have discovered the weakness of the use of VC-dim analysis. Bennett (1997) sophistically applies support vector machines (SVM) to decision tree induction. We suggest a heuristic algorithm (SVMM; SVM for Multi-category) that combines a TDIDT scheme with SVM. In this thesis it will be also addressed how to solve multiclass classification problems.;Our final goal for this thesis is IDSS (Induction of Decision Trees using SODI and SVMM). We will address how to combine SODI and SVMM for the construction of top-down induction of decision trees in order to minimize the generalized penalty cost

    Inductive learning of tree-based regression models

    Get PDF
    Dissertação de Doutoramento em Ciência de Computadores apresentada à Faculdade de Ciências da Universidade do PortoEsta tese explora diferentes aspectos da metodologia de indução de árvores de regressão a partir de amostras de dados. O objectivo principal deste estudo é o de melhorar a capacidade predictiva das árvores de regressão tentando manter, tanto quanto possível, a sua compreensibilidade e eficiência computacional. O nosso estudo sobre este tipo de modelos de regressão é dividido em três partes principais.Na primeira parte do estudo são descritas em detalhe duas metodologias para crescer árvores de regressão: uma que minimiza o erro quadrado médio; e outra que minimiza o desvio absoluto médio. A análise que é apresentada concentra-se primordialmente na questão da eficiência computacional do processo de crescimento das árvores. São apresentados diversos algoritmos novos que originam ganhos de eficiência computacional significativos. Por fim, é apresentada uma comparação experimental das duas metodologias alternativas, mostrando claramente os diferentes objectivos práticos de cada uma. A poda das árvores de regressão é um procedimento "standard" neste tipo de metodologias cujo objectivo principal é o de proporcionar um melhor compromisso entre a simplicidade e compreensibilidade das árvores e a sua capacidade predictiva. Na segunda parte desta dissertação são descritas uma série de técnicas novas de poda baseadas num processo de selecção a partir de um conjunto de árvores podadas alternativas. Apresentamos também um conjunto extenso de experiências comparando diferentes métodos de podar árvores de regressão. Os resultados desta comparação, levada a cabo num largo conjunto de problemas, mostram que as nossas técnicas de poda obtêm resultados, em termos de capacidade predictiva, significativamente superiores aos obtidos pelos métodos do actual "estado da arte". Na parte final desta dissertação é apresentado um novo tipo de árvores, que denominamos árvores de regressão locais. Estes modelos híbridos resultam da integração das árvores de regressão com técnicas de modelação ..
    corecore