7,662 research outputs found

    Subgroup Discovery with Proper Scoring Rules

    Modeling crowdsourcing as collective problem solving

    Crowdsourcing is the process of gathering ideas, thoughts, or information from many independent participants, with the aim of finding the best solution to a given challenge. Modern information technologies allow a massive number of subjects to be involved in a more or less spontaneous way. Still, the full potential of crowdsourcing is yet to be reached. We introduce a modeling framework through which we study the effectiveness of crowdsourcing in relation to the level of collectivism in facing the problem. Our findings reveal an intricate relationship between the number of participants and the difficulty of the problem, indicating the optimal size of the crowdsourced group. We discuss our results in the context of the modern utilization of crowdsourcing. Comment: 19 pages, 3 figures

    Simulating Three-Dimensional Hydrodynamics on a Cellular-Automata Machine

    We demonstrate how three-dimensional fluid flow simulations can be carried out on the Cellular Automata Machine 8 (CAM-8), a special-purpose computer for cellular-automata computations. The principal algorithmic innovation is the use of a lattice-gas model with a 16-bit collision operator that is specially adapted to the machine architecture. It is shown how the collision rules can be optimized to obtain a low viscosity of the fluid. Predictions of the viscosity based on a Boltzmann approximation agree well with measurements of the viscosity made on CAM-8. Several test simulations of flows in simple geometries -- channels, pipes, and a cubic array of spheres -- are carried out. Measurements of average flux in these geometries compare well with theoretical predictions. Comment: 19 pages, REVTeX and epsf macros required

    A model-based multithreshold method for subgroup identification

    The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on one predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as the latent factors, principal components, and weighted sum of predictors. Such a subgrouping method may lead to more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change point regression model and thus yields straightforward model-based prediction results. After choosing a particular thresholding variable form, we apply a two-stage multiple change point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimation results enjoy oracle properties. We design a simulation study to compare performances of our proposed and existing methods and apply them to analyze data sets from a Scleroderma trial and a breast cancer study.
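
    As a rough illustration of thresholding on a weighted sum of predictors, the sketch below grid-searches a single change point in z = X @ w and fits a separate least-squares regression on each side. It is not the paper's two-stage multiple change point procedure: the function name `fit_single_threshold`, the fixed weight vector `w`, and the `min_size` guard are all assumptions made for this example.

```python
import numpy as np


def fit_single_threshold(X, y, w, min_size=10):
    """Grid-search one change point c in z = X @ w (hypothetical helper).

    Observations with z <= c form one subgroup, the rest form the other.
    Each subgroup gets its own ordinary least-squares fit with intercept,
    and the c with the smallest total squared error is returned.
    """
    z = X @ w                          # thresholding variable (weighted sum)
    z_sorted = np.sort(z)
    A = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
    best_sse, best_c = np.inf, None
    # Candidate thresholds are midpoints between consecutive sorted z values,
    # keeping at least `min_size` observations on each side.
    for i in range(min_size, len(y) - min_size):
        c = 0.5 * (z_sorted[i - 1] + z_sorted[i])
        left = z <= c
        sse = 0.0
        for mask in (left, ~left):
            beta, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
            resid = y[mask] - A[mask] @ beta
            sse += float(resid @ resid)
        if sse < best_sse:
            best_sse, best_c = sse, c
    return best_c, best_sse
```

    Calling `fit_single_threshold(X, y, w)` returns the estimated threshold and the total squared error; extending the search to several change points, or estimating `w` itself, would need the machinery the abstract describes.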

    The use of data-mining for the automatic formation of tactics

    This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.

    Causal Rule Learning: Enhancing the Understanding of Heterogeneous Treatment Effect via Weighted Causal Rules

    Interpretability is a key concern in estimating heterogeneous treatment effects using machine learning methods, especially for healthcare applications where high-stakes decisions are often made. Inspired by the Predictive, Descriptive, Relevant framework of interpretability, we propose causal rule learning, which finds a refined set of causal rules characterizing potential subgroups to estimate and enhance our understanding of heterogeneous treatment effects. Causal rule learning involves three phases: rule discovery, rule selection, and rule analysis. In the rule discovery phase, we utilize a causal forest to generate a pool of causal rules with corresponding subgroup average treatment effects. The selection phase then employs a D-learning method to select a subset of these rules to deconstruct individual-level treatment effects as a linear combination of the subgroup-level effects. This helps to answer a question ignored by previous literature: what if an individual simultaneously belongs to multiple groups with different average treatment effects? The rule analysis phase outlines a detailed procedure to further analyze each rule in the subset from multiple perspectives, revealing the most promising rules for further validation. The rules themselves, their corresponding subgroup treatment effects, and their weights in the linear combination give us more insights into heterogeneous treatment effects. Simulation and real-world data analysis demonstrate the superior performance of causal rule learning on the interpretable estimation of heterogeneous treatment effects when the ground truth is complex and the sample size is sufficient.
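
    The idea of writing an individual-level effect as a linear combination of subgroup-level effects can be sketched as follows. This only illustrates the final combination step: the `CausalRule` class, the example rules, and every number are hypothetical, the absence of an intercept term is an assumption, and neither the causal forest nor the D-learning selection is implemented here.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class CausalRule:
    """A hypothetical rule: a subgroup predicate, its subgroup ATE, and a weight."""
    name: str
    applies: Callable[[Mapping[str, float]], bool]
    subgroup_ate: float
    weight: float


def individual_effect(x: Mapping[str, float], rules: Sequence[CausalRule]) -> float:
    """Sum weight * subgroup ATE over all rules whose predicate holds for x."""
    return sum(r.weight * r.subgroup_ate for r in rules if r.applies(x))


# Illustrative rules with made-up effects and weights.
rules = [
    CausalRule("age > 60", lambda x: x["age"] > 60, subgroup_ate=2.0, weight=0.5),
    CausalRule("bmi > 30", lambda x: x["bmi"] > 30, subgroup_ate=-1.0, weight=0.25),
]

# This individual satisfies both rules, so both subgroup effects contribute.
patient = {"age": 65.0, "bmi": 32.0}
effect = individual_effect(patient, rules)
```

    An individual who belongs to several subgroups at once receives a contribution from each matching rule, which is the situation the abstract's question about overlapping groups refers to.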