16,724 research outputs found

    A Primer on Causality in Data Science

    Get PDF
    Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in those data. In this work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining of the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementation of statistical estimators including parametric and semi-parametric methods, and interpretation of our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpreting our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner.Comment: 26 pages (with references); 4 figure

    Causality-Based Feature Importance Quantifying Methods: PN-FI, PS-FI and PNS-FI

    Full text link
    In the current ML field models are getting larger and more complex, and data used for model training are also getting larger in quantity and higher in dimensions. Therefore, in order to train better models, and save training time and computational resources, a good Feature Selection (FS) method in the preprocessing stage is necessary. Feature importance (FI) is of great importance since it is the basis of feature selection. Therefore, this paper creatively introduces the calculation of PN (the probability of Necessity), PN (the probability of Sufficiency), and PNS (the probability of Necessity and Sufficiency) of Causality into quantifying feature importance and creates 3 new FI measuring methods, PN-FI, which means how much importance a feature has in image recognition tasks, PS-FI that means how much importance a feature has in image generating tasks, and PNS-FI which measures both. The main body of this paper is three RCTs, with whose results we show how PS-FI, PN-FI, and PNS-FI of 3 features, dog nose, dog eyes, and dog mouth are calculated. The experiments show that firstly, FI values are intervals with tight upper and lower bounds. Secondly, the feature dog eyes has the most importance while the other two have almost the same. Thirdly, the bounds of PNS and PN are tighter than the bounds of PS.Comment: 7 page

    CausalOps -- Towards an Industrial Lifecycle for Causal Probabilistic Graphical Models

    Full text link
    Causal probabilistic graph-based models have gained widespread utility, enabling the modeling of cause-and-effect relationships across diverse domains. With their rising adoption in new areas, such as automotive system safety and machine learning, the need for an integrated lifecycle framework akin to DevOps and MLOps has emerged. Currently, a process reference for organizations interested in employing causal engineering is missing. To address this gap and foster widespread industrial adoption, we propose CausalOps, a novel lifecycle framework for causal model development and application. By defining key entities, dependencies, and intermediate artifacts generated during causal engineering, we establish a consistent vocabulary and workflow model. This work contextualizes causal model usage across different stages and stakeholders, outlining a holistic view of creating and maintaining them. CausalOps' aim is to drive the adoption of causal methods in practical applications within interested organizations and the causality community

    Time to reality check the promises of machine learning-powered precision medicine

    Get PDF
    Machine learning methods, combined with large electronic health databases, could enable a personalised approach to medicine through improved diagnosis and prediction of individual responses to therapies. If successful, this strategy would represent a revolution in clinical research and practice. However, although the vision of individually tailored medicine is alluring, there is a need to distinguish genuine potential from hype. We argue that the goal of personalised medical care faces serious challenges, many of which cannot be addressed through algorithmic complexity, and call for collaboration between traditional methodologists and experts in medical machine learning to avoid extensive research waste

    Big Data Meet ML and AI for Decision Superiority at the Tactical Edge – Algorithm Design, Demonstrate and Concept Model

    Get PDF
    NPS NRP Executive SummaryBig Data Meet ML and AI for Decision Superiority at the Tactical Edge – Algorithm Design, Demonstrate and Concept ModelN2/N6 - Information WarfareThis research is supported by funding from the Naval Postgraduate School, Naval Research Program (PE 0605853N/2098). https://nps.edu/nrpChief of Naval Operations (CNO)Approved for public release. Distribution is unlimited.
    • …
    corecore