Building more accurate decision trees with the additive tree.
The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it can also produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
Expert-augmented machine learning.
Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.
Processing Oscillatory Data with PDV
Author Institution: Lawrence Livermore National Laboratory. Slides presented at the 2nd Annual Photonic Doppler Velocimetry (PDV) Workshop held at Lawrence Livermore National Laboratory, Livermore, California, August 16-17, 2007.
Long-term variability of proglacial groundwater-fed hydrological systems in an area of glacier retreat, Skeiðarársandur, Iceland
Proglacial groundwater‐fed features, such as seeps, substantially impact proglacial geomorphology, hydrology, and ecology. However, there is a paucity of research on the impacts of climate change and glacier retreat on the extent of these important features. This paper aims to investigate the impact of glacier retreat on proglacial groundwater levels and on the extent of groundwater‐fed seeps. Research has taken place in western Skeiðarársandur, the large proglacial outwash plain of Skeiðarárjökull, a retreating temperate glacier in southeast Iceland. Changes in the extent of proglacial groundwater seeps were mapped using historical aerial photographs from 1986, 1997, and 2012. Proglacial groundwater levels were monitored in shallow boreholes between 2000 and 2012. The western margin of Skeiðarárjökull has retreated approximately 1 km beyond its position in 1986. However, this retreat was punctuated by short periods of readvance. The geomorphology and groundwater systems at the site were substantially impacted by the November 1996 jökulhlaup, whose deposits altered approximately 18% of the area of groundwater seeps. The surface areas of groundwater seeps and lakes in the study area declined by ~97% between 1986 and 2012. Most of the decline took place after 1997, when the mean annual rate of retreat increased three‐fold. Groundwater levels also declined substantially between 2000 and 2012, although this trend varies spatially. The paper provides a conceptual model of the controls on proglacial shallow groundwater systems. Direct impacts of glacier retreat are suggested as the main cause for the declines in proglacial groundwater levels and in the extent of groundwater seeps. These declines are expected to adversely impact sandur ecology.
High-Throughput High-Resolution Class I HLA Genotyping in East Africa
HLA, the most genetically diverse loci in the human genome, play a crucial role in host-pathogen interaction by mediating innate and adaptive cellular immune responses. A vast number of infectious diseases affect East Africa, including HIV/AIDS, malaria, and tuberculosis, but the HLA genetic diversity in this region remains incompletely described. This is a major obstacle for the design and evaluation of preventive vaccines. Available HLA typing techniques that provide the 4-digit level resolution needed to interpret immune responses lack sufficient throughput for large immunoepidemiological studies. Here we present a novel HLA typing assay bridging the gap between high resolution and high throughput. The assay is based on real-time PCR using sequence-specific primers (SSP) and can genotype carriers of the 49 most common East African class I HLA-A, -B, and -C alleles at the 4-digit level. Using a validation panel of 175 samples from Kampala, Uganda, previously defined by sequence-based typing, the new assay performed with 100% sensitivity and specificity. The assay was also implemented to define the HLA genetic complexity of a previously uncharacterized Tanzanian population, demonstrating its inclusion in the major East African genetic cluster. The availability of genotyping tools with this capacity will be extremely useful in the identification of correlates of immune protection and the evaluation of candidate vaccine efficacy.
Reply to Nock and Nielsen: On the work of Nock and Nielsen and its relationship to the additive tree.