Subgroup Discovery with Proper Scoring Rules

Author: Flach Peter
Kalogridis Georgios
Kull Meelis
Song Hao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Crossref

Explore Bristol Research

Expert-augmented machine learning.

Author: Auerbach Andrew
Delgado Elier
Eaton Eric
Friedman Jerome H
Gennatas Efstathios D
Interian Yannet
Luna José Marcio
Pirracchio Romain
Reichmann Lara G
Simone Charles B
Solberg Timothy D
Ungar Lyle H
Valdes Gilmer
van der Laan Mark J
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.
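
The rule-filtering step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how disagreement is measured, so the rank-gap measure below is an assumption, and the names `Rule`, `ranks`, `filter_rules`, and `max_rank_gap` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    empirical_risk: float   # relative risk of the rule's subpopulation, estimated from data
    clinician_risk: float   # relative risk as assessed by clinicians (hypothetical scale)


def ranks(values):
    """Return the rank (0 = smallest) of each value; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def filter_rules(rules, max_rank_gap):
    """Keep rules whose empirical and clinician risk ranks differ by at most max_rank_gap.

    Assumption: disagreement is the absolute difference between the two ranks.
    """
    emp = ranks([r.empirical_risk for r in rules])
    cli = ranks([r.clinician_risk for r in rules])
    return [r for r, e, c in zip(rules, emp, cli) if abs(e - c) <= max_rank_gap]


# Made-up example values, for illustration only.
rules = [
    Rule("A", 0.5, 0.6),
    Rule("B", 1.0, 1.1),
    Rule("C", 2.0, 0.4),   # data says high risk, clinicians say low risk
    Rule("D", 1.5, 1.6),
]
kept = filter_rules(rules, max_rank_gap=1)
print([r.name for r in kept])
```

In this toy input, rule C is the only one dropped, because its empirical and clinician ranks are three positions apart.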

eScholarship - University of California

Modeling crowdsourcing as collective problem solving

Author: Donati Camillo
Guazzini Andrea
Levnajic Zoran
Nardi Annalisa
Vilone Daniele
Publication date: 01/01/2015

Crowdsourcing is a process of accumulating ideas, thoughts, or information from many independent participants, with the aim of finding the best solution for a given challenge. Modern information technologies allow a massive number of subjects to be involved in a more or less spontaneous way. Still, the full potential of crowdsourcing is yet to be reached. We introduce a modeling framework through which we study the effectiveness of crowdsourcing in relation to the level of collectivism in facing the problem. Our findings reveal an intricate relationship between the number of participants and the difficulty of the problem, indicating the optimal size of the crowdsourced group. We discuss our results in the context of modern utilization of crowdsourcing. Comment: 19 pages, 3 figures

arXiv.org e-Print Archive

Florence Research

PubMed Central

Simulating Three-Dimensional Hydrodynamics on a Cellular-Automata Machine

Author: A. Clouqueur
A. J. C. Ladd
A. S. Sangani
B. Dubrulle
Bruce Boghosian
C. Appert
Christopher Adler
D. d'Humières
D. H. Rothman
Daniel H. Rothman
Eirik G. Flekkøy
G. Weisbuch
H. Hasimoto
I. Ginzbourg
J. A. Somers
J. Hardy
M. Hoef Van der
M. Hénon
N. Margolus
Norman Margolus
R. Cornubert
T. Toffoli
U. Frisch
V. Coevorden
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/08/1995

We demonstrate how three-dimensional fluid flow simulations can be carried out on the Cellular Automata Machine 8 (CAM-8), a special-purpose computer for cellular-automata computations. The principal algorithmic innovation is the use of a lattice-gas model with a 16-bit collision operator that is specially adapted to the machine architecture. It is shown how the collision rules can be optimized to obtain a low viscosity of the fluid. Predictions of the viscosity based on a Boltzmann approximation agree well with measurements of the viscosity made on CAM-8. Several test simulations of flows in simple geometries -- channels, pipes, and a cubic array of spheres -- are carried out. Measurements of average flux in these geometries compare well with theoretical predictions. Comment: 19 pages, REVTeX and epsf macros required

arXiv.org e-Print Archive

Crossref

A model-based multithreshold method for subgroup identification

Author: Anderson TW
Breiman L
Giri NC
Golub GH
Jolliffe IT
Loh WY
Messenger R
Paul D
Rao CR
Su X
Thomson GH
Publication venue: eScholarship, University of California
Publication date: 11/02/2019

The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on one predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as the latent factors, principal components, and weighted sum of predictors. Such a subgrouping method may lead to more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change point regression model and thus yields straightforward model-based prediction results. After choosing a particular thresholding variable form, we apply a two-stage multiple change point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimation results enjoy oracle properties. We design a simulation study to compare performances of our proposed and existing methods and apply them to analyze data sets from a Scleroderma trial and a breast cancer study.

Crossref

eScholarship - University of California

The use of data-mining for the automatic formation of tactics

Author: Bundy A.
Duncan H.
Levine J.
Pollet M.
Storkey A.
Publication date: 01/07/2004

This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.

University of Strathclyde Institutional Repository

Causal Rule Learning: Enhancing the Understanding of Heterogeneous Treatment Effect via Weighted Causal Rules

Author: Chang Xiangyu
Liu Hanzhong
Ren Kai
Wu Ying
Publication date: 10/10/2023

Interpretability is a key concern in estimating heterogeneous treatment effects using machine learning methods, especially for healthcare applications where high-stakes decisions are often made. Inspired by the Predictive, Descriptive, Relevant framework of interpretability, we propose causal rule learning, which finds a refined set of causal rules characterizing potential subgroups to estimate and enhance our understanding of heterogeneous treatment effects. Causal rule learning involves three phases: rule discovery, rule selection, and rule analysis. In the rule discovery phase, we utilize a causal forest to generate a pool of causal rules with corresponding subgroup average treatment effects. The selection phase then employs a D-learning method to select a subset of these rules to deconstruct individual-level treatment effects as a linear combination of the subgroup-level effects. This helps to answer a question ignored by previous literature: what if an individual simultaneously belongs to multiple groups with different average treatment effects? The rule analysis phase outlines a detailed procedure to further analyze each rule in the subset from multiple perspectives, revealing the most promising rules for further validation. The rules themselves, their corresponding subgroup treatment effects, and their weights in the linear combination give us more insights into heterogeneous treatment effects. Simulation and real-world data analysis demonstrate the superior performance of causal rule learning on the interpretable estimation of heterogeneous treatment effect when the ground truth is complex and the sample size is sufficient.
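
The idea of expressing an individual-level effect as a linear combination of subgroup-level effects can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure. The weights are fixed made-up numbers rather than the output of D-learning, the rule conditions and effect values are invented, and the name `individual_effect` is hypothetical.

```python
def individual_effect(x, rules):
    """Sum weight * subgroup_effect over every rule that the individual x satisfies.

    Each rule is a tuple (predicate, subgroup_effect, weight).
    Assumption: rules the individual does not satisfy contribute nothing.
    """
    return sum(weight * effect for predicate, effect, weight in rules if predicate(x))


# Hypothetical rules: (membership test, subgroup average treatment effect, weight).
rules = [
    (lambda x: x["age"] > 60, 2.0, 0.5),
    (lambda x: x["bmi"] > 30, -1.0, 0.25),
]

# This individual belongs to both subgroups, so both effects contribute.
patient = {"age": 70, "bmi": 32}
print(individual_effect(patient, rules))
```

An individual who falls into several subgroups with different average effects receives a single combined estimate, which is the situation the abstract raises.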

arXiv.org e-Print Archive