Fast Terrain Classification Using Variable-Length Representation for Autonomous Navigation
We propose a method for learning with a set of feature representations that retrieve different amounts of information at different costs. The goal is a more efficient terrain classification algorithm that can run in real time onboard an autonomous vehicle. Instead of building a monolithic classifier with a uniformly complex representation for each class, the main idea is to actively consider the labels, or the misclassification cost, while constructing the classifier. For example, some terrain classes might be easily separable from the rest, so a very simple representation will be sufficient to learn and detect these classes. The learning algorithm takes advantage of this and automatically builds a variable-length visual representation that varies with the complexity of the classification task, enabling fast recognition of different terrain types at test time. We also show how to select a set of feature representations so that the desired terrain classification task is accomplished with high accuracy while remaining efficient. The proposed approach achieves a good trade-off between recognition performance and speedup on data collected by an autonomous robot.
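The early-exit idea above can be sketched as a cascade over increasingly expensive feature representations. This is a minimal illustration, not the paper's algorithm: the stage interface (each stage returns a label and a confidence), the confidence threshold, and the toy colour/texture cues are all assumptions.

```python
def cascade_predict(x, stages, threshold=0.9):
    """Return a label using the cheapest stage that is confident enough.

    stages: callables x -> (label, confidence), ordered from the cheapest
    to the most expensive feature representation (illustrative interface).
    """
    label = None
    for stage in stages:
        label, conf = stage(x)
        if conf >= threshold:   # easy case: stop before the costly stages
            break
    return label

# Toy stages: a cheap colour cue separates "sky" easily; a costlier
# texture stage is only consulted for the ambiguous inputs.
cheap = lambda x: ("sky", 0.95) if x["blue"] > 0.8 else ("unknown", 0.4)
costly = lambda x: ("grass", 0.9) if x["texture"] > 0.5 else ("road", 0.9)

print(cascade_predict({"blue": 0.9, "texture": 0.1}, [cheap, costly]))  # sky
print(cascade_predict({"blue": 0.2, "texture": 0.7}, [cheap, costly]))  # grass
```

Easy inputs never pay for the expensive representation, which is the source of the test-time speedup the abstract describes.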
Comparative analysis of methods for microbiome study
Microbiome analysis is garnering much interest with benefits including improved treatment options, enhanced capabilities for personalized medicine, greater understanding of the human body, and contributions to ecological study. Data from these communities of bacteria, viruses, and fungi are feature rich, sparse, and have sample sizes not appreciably larger than the number of features, making analysis challenging and necessitating a coordinated approach utilizing multiple techniques alongside domain expertise. This thesis provides an overview and comparative analysis of these methods, with a case study on cirrhosis and hepatic encephalopathy demonstrating a selection of methods. Approaches are considered in a medically motivated context where relationships between microbes in the human body and diseases or conditions are of primary interest, with additional objectives being the identification of how microbes influence each other and how these influences relate to the diseases and conditions being studied. These analysis methods are partitioned into three categories: univariate statistical methods, classifier-based methods, and joint analysis methods. Univariate statistical methods provide results corresponding to how much a single variable or feature differs between groups in the data. Classifier-based approaches can be generalized as those where a classification model with microbe abundance as inputs and disease states as outputs is used, resulting in a predictive model which is then analyzed to learn about the data. The joint analysis category corresponds to techniques which specifically target relationships between microbes and compare those relationships among subpopulations within the data. Despite significant differences between these categories and the individual methods, each has strengths and weaknesses and plays an important role in microbiome analysis.
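As a minimal illustration of the univariate category, the sketch below compares one microbe's abundances between two groups with a permutation test on the difference in means. The choice of test statistic and the toy abundance values are assumptions for illustration, not taken from the thesis.

```python
import random

def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Univariate two-group test for a single microbe's abundances.

    Permutation test on the difference in means: how often does a random
    relabelling of the samples produce a gap at least as large as observed?
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # +1: small-sample correction

# Hypothetical abundances for one microbe in patients vs controls.
print(permutation_pvalue([5, 6, 7, 8], [0, 1, 1, 2]))   # small p: groups differ
```

Running one such test per feature, followed by multiple-testing correction, is the usual shape of the univariate approach when the feature space is large and sparse.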
Joints in Random Forests
Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation. Under certain assumptions, frequently made for Bayes consistency results, we show that consistency in GeDTs and GeFs extends to any pattern of input features missing at random. Empirically, we show that our models often outperform common routines for treating missing data, such as K-nearest-neighbour imputation, and moreover that our models can naturally detect outliers by monitoring the marginal probability of the input features.
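The marginalisation idea can be sketched on a toy tree: when the feature tested at a node is missing, average the children's predictions weighted by the fraction of training data that reached each child. This is a simplified stand-in for the GeDT construction, not the authors' implementation; the dict-based tree encoding and the stored `left_weight` are assumptions.

```python
def predict(node, x):
    """Class probabilities for x, marginalising over missing features."""
    if "label_probs" in node:                 # leaf node
        return node["label_probs"]
    v = x.get(node["feature"])                # None means the feature is missing
    if v is None:
        left = predict(node["left"], x)
        right = predict(node["right"], x)
        w = node["left_weight"]               # P(go left), from training data
        return [w * l + (1 - w) * r for l, r in zip(left, right)]
    child = node["left"] if v <= node["threshold"] else node["right"]
    return predict(child, x)

tree = {"feature": "x1", "threshold": 0.5, "left_weight": 0.6,
        "left":  {"label_probs": [0.9, 0.1]},
        "right": {"label_probs": [0.2, 0.8]}}

print(predict(tree, {"x1": 0.3}))   # split is usable: [0.9, 0.1]
print(predict(tree, {}))            # x1 missing, marginalised: ~[0.62, 0.38]
```

No imputation step is needed: the missing feature is summed out exactly, which is the behaviour the paper obtains from the Probabilistic Circuit view.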
A new approach of top-down induction of decision trees for knowledge discovery
Top-down induction of decision trees is the most popular technique for classification in the field of data mining and knowledge discovery. Quinlan developed the basic decision-tree induction algorithm, ID3 (1984), and extended it to C4.5 (1993). Much research has dealt with single-attribute decision-making nodes (so-called first-order decisions). Murphy and Pazzani (1991) addressed multiple-attribute conditions at decision-making nodes and showed that higher-order decision-making generates smaller decision trees and better accuracy. However, searching the combinations of multiple-attribute decisions is NP-complete.

We develop a new algorithm for second-order decision-tree induction (SODI) for nominal attributes. The induction rules of first-order decision trees are combined by 'AND' logic only, but those of SODI consist of 'AND', 'OR', and 'OTHERWISE' logics. It generates more accurate results and smaller decision trees than first-order decision-tree induction.

Quinlan used information gain, via VC-dimension (Vapnik-Chervonenkis; Vapnik, 1995) analysis, for clustering the experimental values of each numerical attribute. However, many researchers have noted the weaknesses of VC-dimension analysis. Bennett (1997) applied support vector machines (SVMs) to decision-tree induction. We suggest a heuristic algorithm (SVMM; SVM for Multi-category) that combines a TDIDT scheme with SVMs; this thesis also addresses how to solve multiclass classification problems.

Our final goal for this thesis is IDSS (Induction of Decision Trees using SODI and SVMM). We address how to combine SODI and SVMM in the construction of top-down induction of decision trees in order to minimize the generalized penalty cost.
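A second-order rule of the kind SODI produces can be illustrated as an AND over OR-groups with a final OTHERWISE default. The rule encoding below is a hypothetical sketch of that rule shape, not SODI itself.

```python
def match(rule, x):
    """rule: a list of OR-groups, each group a list of (attribute, value)
    pairs. The rule fires if every group has at least one matching pair:
    AND across groups, OR within a group."""
    return all(any(x.get(a) == v for a, v in group) for group in rule)

def classify(rules, x, otherwise="default"):
    """First-match rule list with an OTHERWISE fallback."""
    for rule, label in rules:
        if match(rule, x):
            return label
    return otherwise                        # the 'OTHERWISE' branch

# (soil = clay OR soil = silt) AND wet = yes  ->  poor_drainage
rules = [([[("soil", "clay"), ("soil", "silt")],
           [("wet", "yes")]],
          "poor_drainage")]

print(classify(rules, {"soil": "silt", "wet": "yes"}))  # poor_drainage
print(classify(rules, {"soil": "sand", "wet": "yes"}))  # default
```

A first-order tree would need separate branches for `clay` and `silt` to express the same concept, which is why allowing OR at a node can shrink the tree.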
Methods for cost-sensitive learning
Many approaches for achieving intelligent behavior of automated (computer) systems involve components that learn from past experience. This dissertation studies computational methods for learning from examples, for classification and for decision making, when the decisions have different non-zero costs associated with them. Many practical applications of learning algorithms, including transaction monitoring, fraud detection, intrusion detection, and medical diagnosis, have such non-uniform costs, and there is a great need for new methods that can handle them. This dissertation discusses two approaches to cost-sensitive classification: input data weighting and conditional density estimation. The first method assigns a weight to each training example in order to force the learning algorithm (which is otherwise unchanged) to pay more attention to examples with higher misclassification costs. The dissertation discusses several different weighting methods and concludes that a method that gives higher weight to examples from rarer classes works quite well. Another algorithm that gave good results was a wrapper method that applies Powell's gradient-free algorithm to optimize the input weights. The second approach to cost-sensitive classification is conditional density estimation. In this approach, the output of the learning algorithm is a classifier that estimates, for a new data point, the probability that it belongs to each of the classes. These probability estimates can be combined with a cost matrix to make decisions that minimize the expected cost. The dissertation presents a new algorithm, bagged lazy option trees (B-LOTs), that gives better probability estimates than any previous method based on decision trees. In order to evaluate cost-sensitive classification methods, appropriate statistical methods are needed. The dissertation presents two new statistical procedures: BCOST provides a confidence interval on the expected cost of a classifier, and BDELTACOST provides a confidence interval on the difference in expected costs of two classifiers. These methods are applied to a large set of experimental studies to evaluate and compare the cost-sensitive methods presented in this dissertation. Finally, the dissertation describes the application of B-LOTs to the problem of predicting the stability of river channels. In this study, B-LOTs were shown to be superior to other methods in cases where the classes have very different frequencies, a situation that arises frequently in cost-sensitive classification problems.
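The decision rule behind the conditional-density-estimation approach is the standard minimum-expected-cost rule and can be sketched directly: combine the class-probability estimates with a cost matrix and predict the class whose expected cost is lowest. The fraud-detection cost matrix below is an assumed toy example.

```python
def min_expected_cost(probs, cost_matrix):
    """Pick the prediction with minimum expected cost.

    probs: estimated P(true class = i) for the new data point.
    cost_matrix[i][j]: cost of predicting class j when the truth is i.
    Returns (best class index, its expected cost).
    """
    n = len(cost_matrix[0])
    costs = [sum(probs[i] * cost_matrix[i][j] for i in range(len(probs)))
             for j in range(n)]
    best = min(range(n), key=lambda j: costs[j])
    return best, costs[best]

# Toy fraud setting: missing fraud (truth 1, predict 0) costs 10x a
# false alarm (truth 0, predict 1).
C = [[0, 1],
     [10, 0]]
print(min_expected_cost([0.8, 0.2], C))  # → (1, 0.8)
```

Note that class 1 is predicted even though its probability is only 0.2: the asymmetric costs, not the raw probabilities, drive the decision, which is why good probability estimates (as from B-LOTs) matter.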
Inductive learning of tree-based regression models
Doctoral dissertation in Computer Science presented to the Faculdade de Ciências da Universidade do Porto. This thesis explores different aspects of the methodology for inducing regression trees from data samples. The main goal of this study is to improve the predictive accuracy of regression trees while retaining, as far as possible, their comprehensibility and computational efficiency. Our study of this type of regression model is divided into three main parts. The first part describes in detail two methodologies for growing regression trees: one that minimises the mean squared error, and another that minimises the mean absolute deviation. The analysis presented focuses primarily on the computational efficiency of the tree-growing process. Several new algorithms are presented that yield significant gains in computational efficiency. Finally, an experimental comparison of the two alternative methodologies is presented, clearly showing the different practical goals of each. Pruning regression trees is a standard procedure in this type of methodology, whose main goal is to provide a better compromise between the simplicity and comprehensibility of the trees and their predictive accuracy. The second part of this dissertation describes a series of new pruning techniques based on a process of selection from a set of alternative pruned trees. We also present an extensive set of experiments comparing different methods of pruning regression trees. The results of this comparison, carried out on a large set of problems, show that our pruning techniques achieve significantly better predictive accuracy than the current state-of-the-art methods.
The final part of this dissertation presents a new kind of tree, which we call local regression trees. These hybrid models result from integrating regression trees with modelling techniques ..
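The least-squares growing criterion described in the first part can be sketched as an exhaustive split search on one numeric attribute. This is a naive quadratic-time illustration; the thesis's efficiency gains come precisely from avoiding this kind of recomputation.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Threshold on one numeric attribute minimising total squared error."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)
    for k in range(1, len(pairs)):
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        err = sse(left) + sse(right)
        thr = (pairs[k - 1][0] + pairs[k][0]) / 2   # midpoint threshold
        if err < best[0]:
            best = (err, thr)
    return best   # (total squared error, threshold)

print(best_split([1, 2, 3, 10, 11, 12], [5, 5, 5, 20, 20, 20]))
# → (0.0, 6.5)
```

Replacing `sse` with the sum of absolute deviations around the median gives the least-absolute-deviation variant the thesis compares against; incremental update of the left/right statistics reduces the search from quadratic to near-linear time per attribute.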