Node harvest

Author: Meinshausen Nicolai
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009

When choosing a suitable technique for regression and classification with multivariate predictor variables, one often faces a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret. Tree ensembles like Random Forests usually provide more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as 'black box' predictors. Node harvest tries to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. Results are very sparse and interpretable, and predictive accuracy is extremely competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a new observation falls into just a single node, its prediction is the mean response of all training observations within this node, identical to a tree-like prediction. A new observation typically falls into several nodes, and its prediction is then the weighted average of the mean responses across all these nodes. The only role of node harvest is to 'pick' the right nodes from the initial large ensemble of nodes by choosing node weights, which in the proposed algorithm amounts to a quadratic programming problem with linear inequality constraints. The solution is sparse in the sense that only very few nodes are selected with a nonzero weight. This sparsity is not explicitly enforced. Perhaps surprisingly, it is not necessary to select a tuning parameter for optimal predictive accuracy. Node harvest can handle mixed data and missing values and is shown to be simple to interpret and competitive in predictive accuracy on a variety of data sets.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOAS367
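
The prediction rule described in the abstract (a weighted average of node mean responses over the nodes an observation falls into) can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the `Node` class, the `predict` function, the example nodes, and all weights and means are hypothetical, and the quadratic program that selects the weights is not shown. The weights are simply assumed to be given.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Node:
    """Hypothetical node: a region of predictor space with a fitted mean."""
    contains: Callable[[Sequence[float]], bool]  # membership test for x
    mean_response: float                         # mean training response in the node
    weight: float                                # nonnegative weight (assumed given)


def predict(nodes: List[Node], x: Sequence[float]) -> float:
    """Weighted average of mean responses over selected nodes containing x."""
    active = [n for n in nodes if n.weight > 0 and n.contains(x)]
    total_weight = sum(n.weight for n in active)
    if total_weight == 0:
        raise ValueError("observation falls into no node with nonzero weight")
    return sum(n.weight * n.mean_response for n in active) / total_weight


# Made-up nodes for illustration; a root node covers every observation.
nodes = [
    Node(lambda x: True, mean_response=10.0, weight=0.2),
    Node(lambda x: x[0] > 5, mean_response=20.0, weight=0.8),
    Node(lambda x: x[1] <= 1, mean_response=0.0, weight=0.0),  # not selected
]

in_two_nodes = predict(nodes, [6.0, 0.0])  # averages the root and the second node
in_one_node = predict(nodes, [1.0, 0.0])   # only the root applies: a tree-like prediction
```

When an observation falls into a single selected node, the weighted average reduces to that node's mean response, which matches the tree-like case mentioned in the abstract.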

arXiv.org e-Print Archive

CiteSeerX

Crossref

Oxford University Research Archive

Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring

Author: Aasmets, Alneberg, Bathaee, Belk, Belkina, Biau, Bishop, Bogart, Bolyen, Bradley, Breiman, Buttigieg, Callahan, Caporaso, Caporaso, Carvalho, Chang, Chong, Cordier, Cordier, Delgado-Baquerizo, Demergasso, Deng, Durack, Edgar, Fellous, Fiannaca, Fisher, Forgy, Frühe, Gerhard, Ghannam, Gloor, Goldstein, Gulli, Hampton-Marcell, Hastie, Hazen, Hoerl, Hothorn, Janßen, Johnson, Khodakova, Knights, Kobak, Kuhn, Lane, Larsen, LeCun, Ley, Liu, Love, Lozupone, Lozupone, Lvd, Metcalf, Mittelstadt, Molnar, Montavon, Mukaka, Netzer, O'Brien, Oudah, Pasolli, Pasolli, Paulson, Pedregosa, Preheim, Probst, Qu, Ramette, Reese, Richardson, Rudin, Sathya, Schloss, See, Shamsaddini, Silva, Smith, Sogin, Soman, Stahl, Sunagawa, Suykens, Sze, Techtmann, Thompson, Thompson, Topçuoğlu, Turnbaugh, Turnbaugh, Ulrich, Vamathevan, Van Rossum, Vangay, Vrolix, Wang, Wirbel, Wu, Wu, Xu, Zeevi, Zerilli, Zhao, Zhou, Zitnik, Økland
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2021

Advances in nucleic acid sequencing technology have expanded our ability to profile microbial diversity. These large datasets of taxonomic and functional diversity are key to better understanding microbial ecology. Machine learning has proven to be a useful approach for analyzing microbial community data and making predictions about outcomes including human and environmental health. Machine learning applied to microbial community profiles has been used to predict disease states in human health, to assess environmental quality and the presence of contamination in the environment, and as trace evidence in forensics. Machine learning has appeal as a powerful tool that can provide deep insights into microbial communities and identify patterns in microbial community data. However, machine learning models are often used as black boxes to predict a specific outcome, with little understanding of how the models arrived at their predictions. Complex machine learning algorithms often favor higher accuracy and performance at the expense of interpretability. To bring machine learning into more translational research related to the microbiome and to strengthen our ability to extract meaningful biological information, it is important for models to be interpretable. Here we review current trends in machine learning applications in microbial ecology, as well as some of the important challenges and opportunities for broader application of machine learning to understanding microbial communities.

Michigan Technological University

Crossref

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

Author: Letham Benjamin, Madigan David, McCormick Tyler H., Rudin Cynthia
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 10/05/2018

We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if ... then ... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS₂ score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS₂, but more accurate.
Funding: National Science Foundation (U.S.) (Grant IIS-1053407)
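
To make the decision-list structure from the abstract concrete, here is a minimal sketch of how an ordered series of if ... then ... statements is evaluated: the first rule whose condition holds determines the prediction, and a default applies otherwise. Everything here is assumed for illustration. The function name, the feature names, the rules, and the outcomes are hypothetical; they are not the rules learned in the paper, and the Bayesian procedure that learns such lists is not shown.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, bool]
Rule = Tuple[Callable[[Patient], bool], str]  # (condition, predicted outcome)


def predict_with_decision_list(rules: List[Rule], default: str, patient: Patient) -> str:
    """Return the outcome of the first rule whose condition matches the patient."""
    for condition, outcome in rules:
        if condition(patient):
            return outcome
    return default  # the final "else" branch of the list


# Hypothetical rules, echoing the abstract's "if high blood pressure, then stroke".
RULES: List[Rule] = [
    (lambda p: p.get("high_blood_pressure", False), "stroke"),
    (lambda p: p.get("prior_stroke", False), "stroke"),
]

first_rule_hit = predict_with_decision_list(RULES, "no stroke", {"high_blood_pressure": True})
second_rule_hit = predict_with_decision_list(RULES, "no stroke", {"prior_stroke": True})
no_rule_hit = predict_with_decision_list(RULES, "no stroke", {})
```

Because rules are checked in order, the position of a rule in the list matters: a patient matching several rules receives the outcome of the earliest one.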

DSpace@MIT