
16,724 research outputs found

A Primer on Causality in Data Science

Author: Balzer Laura B.
Saddiki Hachem
Publication date: 04/03/2019

Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in those data. In this work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementing statistical estimators, including parametric and semi-parametric methods, and interpreting our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpreting our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner. Comment: 26 pages (with references); 4 figures
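The step from causal parameter to statistical estimand can be illustrated with a small sketch. It is not the TMLE with Super Learner that the abstract highlights; it is a much simpler nonparametric standardization (g-computation) estimator of the average treatment effect for a single-time-point binary exposure, and it assumes a single discrete covariate is sufficient to control for confounding. The function name `standardized_ate` and the toy data are hypothetical.

```python
from collections import defaultdict


def standardized_ate(rows):
    """Estimate E[Y(1)] - E[Y(0)] by standardizing over a discrete covariate.

    rows: iterable of (w, a, y) tuples, where w is a discrete covariate,
    a is a binary exposure (0 or 1), and y is a numeric outcome.
    Assumption (illustrative only): w is sufficient to control confounding.
    """
    outcome_sums = defaultdict(float)   # sum of y within each (w, a) cell
    cell_counts = defaultdict(int)      # number of rows in each (w, a) cell
    stratum_counts = defaultdict(int)   # number of rows in each stratum w
    n = 0
    for w, a, y in rows:
        outcome_sums[(w, a)] += y
        cell_counts[(w, a)] += 1
        stratum_counts[w] += 1
        n += 1

    ate = 0.0
    for w, n_w in stratum_counts.items():
        # Positivity: both exposure levels must be observed in every stratum.
        if cell_counts[(w, 1)] == 0 or cell_counts[(w, 0)] == 0:
            raise ValueError(f"positivity violated in stratum w={w!r}")
        mean_exposed = outcome_sums[(w, 1)] / cell_counts[(w, 1)]
        mean_unexposed = outcome_sums[(w, 0)] / cell_counts[(w, 0)]
        # Weight the stratum-specific difference by the stratum's share of the sample.
        ate += (mean_exposed - mean_unexposed) * n_w / n
    return ate


# Hypothetical toy data: (covariate w, exposure a, outcome y).
toy_rows = (
    [(0, 1, 1)] * 3 + [(0, 1, 0)] * 1 +   # w=0, exposed
    [(0, 0, 1)] * 1 + [(0, 0, 0)] * 3 +   # w=0, unexposed
    [(1, 1, 1)] * 2 +                     # w=1, exposed
    [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1     # w=1, unexposed
)
print(standardized_ate(toy_rows))
```

The explicit positivity check mirrors the Roadmap's emphasis on assessing identification assumptions before estimating anything.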

arXiv.org e-Print Archive

Numérisation de Documents Anciens Mathématiques

Causality-Based Feature Importance Quantifying Methods: PN-FI, PS-FI and PNS-FI

Author: Du Changyi
Du Shuxian
Sun Yaxiu
Publication date: 18/09/2023

In the current ML field, models are getting larger and more complex, and the data used for model training are growing in both quantity and dimensionality. Therefore, to train better models while saving training time and computational resources, a good Feature Selection (FS) method in the preprocessing stage is necessary. Feature importance (FI) matters because it is the basis of feature selection. This paper introduces the calculation of PN (the probability of Necessity), PS (the probability of Sufficiency), and PNS (the probability of Necessity and Sufficiency) from causality into quantifying feature importance and creates three new FI measures: PN-FI, which measures how important a feature is in image recognition tasks; PS-FI, which measures how important a feature is in image generation tasks; and PNS-FI, which measures both. The main body of this paper is three RCTs, whose results show how the PS-FI, PN-FI, and PNS-FI of three features (dog nose, dog eyes, and dog mouth) are calculated. The experiments show that, first, FI values are intervals with tight upper and lower bounds; second, the feature dog eyes has the most importance, while the other two have almost the same importance; and third, the bounds of PNS and PN are tighter than the bounds of PS. Comment: 7 pages
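To make concrete why such quantities come out as intervals rather than single numbers, the sketch below computes the standard bounds on PNS that follow from experimental (RCT) data alone: max(0, P(y | do(x)) - P(y | do(x'))) <= PNS <= min(P(y | do(x)), 1 - P(y | do(x'))). This is an illustration under assumptions, not necessarily the computation used in the paper; the function name `pns_bounds` and the input probabilities are hypothetical.

```python
def pns_bounds(p_y_given_do_x, p_y_given_do_not_x):
    """Bounds on the probability of necessity and sufficiency (PNS).

    p_y_given_do_x:     P(y | do(x)), outcome rate when the feature is present
    p_y_given_do_not_x: P(y | do(x')), outcome rate when the feature is absent
    Both are assumed to be estimated from a randomized experiment.
    """
    for p in (p_y_given_do_x, p_y_given_do_not_x):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    lower = max(0.0, p_y_given_do_x - p_y_given_do_not_x)
    upper = min(p_y_given_do_x, 1.0 - p_y_given_do_not_x)
    return lower, upper


# Hypothetical numbers: the outcome occurs in 80% of trials with the
# feature present and in 30% of trials with it absent.
lower, upper = pns_bounds(0.8, 0.3)
print(f"PNS lies in [{lower:.2f}, {upper:.2f}]")
```

Bounds on PN and PS generally require observational data in addition to experimental data, so they are left out of this sketch.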

arXiv.org e-Print Archive

CausalOps -- Towards an Industrial Lifecycle for Causal Probabilistic Graphical Models

Author: Guess Thomas
Maier Robert
Mottok Jürgen
Schlattl Andreas
Publication date: 05/09/2023

Causal probabilistic graph-based models have gained widespread utility, enabling the modeling of cause-and-effect relationships across diverse domains. With their rising adoption in new areas, such as automotive system safety and machine learning, the need for an integrated lifecycle framework akin to DevOps and MLOps has emerged. Currently, a process reference for organizations interested in employing causal engineering is missing. To address this gap and foster widespread industrial adoption, we propose CausalOps, a novel lifecycle framework for causal model development and application. By defining key entities, dependencies, and intermediate artifacts generated during causal engineering, we establish a consistent vocabulary and workflow model. This work contextualizes causal model usage across different stages and stakeholders, outlining a holistic view of creating and maintaining them. The aim of CausalOps is to drive the adoption of causal methods in practical applications within interested organizations and the causality community.

arXiv.org e-Print Archive

Time to reality check the promises of machine learning-powered precision medicine

Author: Abràmoff
Angrist
Antoniou
Arnold
Beede
Bien
Burke
Damen
Davis
Gallagher
Gombar
Greenland
Hannun
Hernán
Hernán
Kent
Liu
Macklon
Matheny
Nagendran
Pearson
Peters
Peto
Rajpurkar
Rothman
Senn
Shmueli
Sundström
van Klaveren
van Smeden
Volkmann
Vollmer
Zhang
Publication venue: 'Elsevier BV'
Publication date: 16/09/2020

Machine learning methods, combined with large electronic health databases, could enable a personalised approach to medicine through improved diagnosis and prediction of individual responses to therapies. If successful, this strategy would represent a revolution in clinical research and practice. However, although the vision of individually tailored medicine is alluring, there is a need to distinguish genuine potential from hype. We argue that the goal of personalised medical care faces serious challenges, many of which cannot be addressed through algorithmic complexity, and call for collaboration between traditional methodologists and experts in medical machine learning to avoid extensive research waste.

Crossref

PubMed Central

The University of Manchester - Institutional Repository

White Rose Research Online

University of St. Andrews - Pure

St Andrews Research Repository

Big Data Meet ML and AI for Decision Superiority at the Tactical Edge – Algorithm Design, Demonstrate and Concept Model

Author: Boger Dan
Zhao Ying
Publication venue: Monterey, California. Naval Postgraduate School.
Publication date: 01/12/2019

NPS NRP Executive Summary. Big Data Meet ML and AI for Decision Superiority at the Tactical Edge – Algorithm Design, Demonstrate and Concept Model. N2/N6 - Information Warfare. This research is supported by funding from the Naval Postgraduate School, Naval Research Program (PE 0605853N/2098). https://nps.edu/nrp Chief of Naval Operations (CNO). Approved for public release. Distribution is unlimited.

Calhoun, Institutional Archive of the Naval Postgraduate School