Expert-augmented machine learning.
Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.
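The filtering step described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Rule` fields, the rank-gap measure of disagreement, the `max_rank_gap` cutoff, and every number are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    empirical_risk: float  # relative risk observed in the data (made-up values)
    expert_risk: float     # mean clinician-assessed relative risk (made-up values)


def rank(values):
    """Return the rank (0 = smallest) of each value; ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def filter_rules(rules, max_rank_gap):
    """Keep rules whose empirical and expert risk ranks differ by at most max_rank_gap."""
    empirical = rank([r.empirical_risk for r in rules])
    expert = rank([r.expert_risk for r in rules])
    return [r for r, a, b in zip(rules, empirical, expert) if abs(a - b) <= max_rank_gap]


# Hypothetical rules: C looks high-risk in the data but low-risk to the experts.
rules = [
    Rule("A", empirical_risk=0.5, expert_risk=0.6),
    Rule("B", empirical_risk=1.0, expert_risk=1.1),
    Rule("C", empirical_risk=2.0, expert_risk=0.4),
    Rule("D", empirical_risk=3.0, expert_risk=2.5),
]
kept = filter_rules(rules, max_rank_gap=1)
print([r.name for r in kept])
```

Rules dropped by such a filter are the ones worth inspecting by hand, since the abstract reports that the largest disagreements pointed to a miscoded variable and a hidden confounder.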
Modeling crowdsourcing as collective problem solving
Crowdsourcing is a process of accumulating the ideas, thoughts or information
from many independent participants, with the aim of finding the best solution
for a given challenge. Modern information technologies allow a massive number
of subjects to be involved in a more or less spontaneous way. Still, the full
potential of crowdsourcing is yet to be reached. We introduce a modeling
framework through which we study the effectiveness of crowdsourcing in relation
to the level of collectivism in facing the problem. Our findings reveal an
intricate relationship between the number of participants and the difficulty of
the problem, indicating the optimal size of the crowdsourced group. We discuss
our results in the context of modern utilization of crowdsourcing.
Comment: 19 pages, 3 figures
Simulating Three-Dimensional Hydrodynamics on a Cellular-Automata Machine
We demonstrate how three-dimensional fluid flow simulations can be carried
out on the Cellular Automata Machine 8 (CAM-8), a special-purpose computer for
cellular-automata computations. The principal algorithmic innovation is the use
of a lattice-gas model with a 16-bit collision operator that is specially
adapted to the machine architecture. It is shown how the collision rules can be
optimized to obtain a low viscosity of the fluid. Predictions of the viscosity
based on a Boltzmann approximation agree well with measurements of the
viscosity made on CAM-8. Several test simulations of flows in simple geometries
-- channels, pipes, and a cubic array of spheres -- are carried out.
Measurements of average flux in these geometries compare well with theoretical
predictions.
Comment: 19 pages, REVTeX and epsf macros required
A model-based multithreshold method for subgroup identification
The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on one predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as the latent factors, principal components, and weighted sum of predictors. Such a subgrouping method may lead to more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change point regression model and thus yields straightforward model-based prediction results. After choosing a particular thresholding variable form, we apply a two-stage multiple change point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimation results enjoy oracle properties. We design a simulation study to compare performances of our proposed and existing methods and apply them to analyze data sets from a Scleroderma trial and a breast cancer study.
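The idea of thresholding on a weighted sum of predictors in a change point regression can be illustrated with a simplified sketch. It is not the two-stage multiple change point method of the paper: it searches for a single change point by least squares, and the function names, weights, and noise-free toy data are all assumptions made for the example.

```python
def sse_linear(xs, ys):
    """Sum of squared errors of a least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def best_threshold(predictors, weights, xs, ys, min_size=3):
    """Grid-search one cut on z = weights . predictors that minimizes total SSE."""
    z = [sum(w * p for w, p in zip(weights, row)) for row in predictors]
    order = sorted(range(len(z)), key=lambda i: z[i])
    best = None
    for k in range(min_size, len(z) - min_size + 1):
        left, right = order[:k], order[k:]
        total = sse_linear([xs[i] for i in left], [ys[i] for i in left])
        total += sse_linear([xs[i] for i in right], [ys[i] for i in right])
        if best is None or total < best[0]:
            cut = (z[order[k - 1]] + z[order[k]]) / 2  # midpoint between neighbors
            best = (total, cut)
    return best[1]


# Toy data: the regression of y on x changes once z passes between 0.9 and 1.0.
predictors = [(i / 10, i / 10) for i in range(20)]
weights = (0.5, 0.5)
xs = [(7 * i) % 5 for i in range(20)]
ys = [1 + 2 * x if i < 10 else 5 - x for i, x in enumerate(xs)]
cut = best_threshold(predictors, weights, xs, ys)
print(cut)
```

Each side of the estimated cut gets its own fitted line, which is what makes the subgroup predictions model-based in the sense the abstract describes.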
The use of data-mining for the automatic formation of tactics
This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpuses of proofs. We data-mine information from large proof corpuses to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.
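Finding commonly occurring patterns in a proof corpus could, in the simplest case, mean counting repeated runs of proof steps. The sketch below assumes each proof is a list of step names and counts contiguous subsequences; the function name, the step names, and the corpus are hypothetical, and the genetic programming stage is not shown.

```python
from collections import Counter


def frequent_patterns(proofs, length, min_count):
    """Count contiguous step sequences of a given length across all proofs."""
    counts = Counter()
    for steps in proofs:
        for i in range(len(steps) - length + 1):
            counts[tuple(steps[i:i + length])] += 1
    # Keep only the patterns seen at least min_count times.
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


# Hypothetical corpus of three short proofs.
proofs = [
    ["intro", "simp", "auto"],
    ["intro", "simp", "rewrite", "auto"],
    ["induct", "intro", "simp"],
]
patterns = frequent_patterns(proofs, length=2, min_count=2)
print(patterns)
```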
Causal Rule Learning: Enhancing the Understanding of Heterogeneous Treatment Effect via Weighted Causal Rules
Interpretability is a key concern in estimating heterogeneous treatment
effects using machine learning methods, especially for healthcare applications
where high-stakes decisions are often made. Inspired by the Predictive,
Descriptive, Relevant framework of interpretability, we propose causal rule
learning which finds a refined set of causal rules characterizing potential
subgroups to estimate and enhance our understanding of heterogeneous treatment
effects. Causal rule learning involves three phases: rule discovery, rule
selection, and rule analysis. In the rule discovery phase, we utilize a causal
forest to generate a pool of causal rules with corresponding subgroup average
treatment effects. The selection phase then employs a D-learning method to
select a subset of these rules to deconstruct individual-level treatment
effects as a linear combination of the subgroup-level effects. This helps to
answer a question ignored by previous literature: what if an individual
simultaneously belongs to multiple groups with different average treatment
effects? The rule analysis phase outlines a detailed procedure to further
analyze each rule in the subset from multiple perspectives, revealing the most
promising rules for further validation. The rules themselves, their
corresponding subgroup treatment effects, and their weights in the linear
combination give us more insights into heterogeneous treatment effects.
Simulation and real-world data analysis demonstrate the superior performance of
causal rule learning on the interpretable estimation of heterogeneous treatment
effect when the ground truth is complex and the sample size is sufficient.
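One plausible reading of the linear combination described in this abstract is that an individual's effect is the weighted sum of the subgroup effects of every rule the individual satisfies. The sketch below illustrates that reading only: the rules, subgroup effects, and weights are made-up placeholders for what the discovery and selection phases would produce.

```python
def individual_effect(person, rules):
    """Weighted sum of subgroup effects over the rules this person satisfies."""
    return sum(weight * effect for applies, effect, weight in rules if applies(person))


# Each rule: (membership test, subgroup average treatment effect, weight).
# All values are hypothetical.
rules = [
    (lambda p: p["age"] > 60, 2.0, 0.5),
    (lambda p: p["smoker"], -1.0, 1.0),
]

older_smoker = {"age": 70, "smoker": True}
older_nonsmoker = {"age": 70, "smoker": False}
print(individual_effect(older_smoker, rules))     # belongs to both subgroups
print(individual_effect(older_nonsmoker, rules))  # belongs to one subgroup
```

The first person belongs to two subgroups with opposing effects, which is the overlapping-membership case the abstract raises.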