A Primer on Causality in Data Science
Many questions in Data Science are fundamentally causal in that our objective
is to learn the effect of some exposure, randomized or not, on an outcome
of interest. Even studies that are seemingly non-causal, such as those with the
goal of prediction or prevalence estimation, have causal elements, including
differential censoring or measurement. As a result, we, as Data Scientists,
need to consider the underlying causal mechanisms that gave rise to the data,
rather than simply the pattern or association observed in those data. In this
work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to
provide an introduction to some key concepts in causal inference. Similar to
other causal frameworks, the steps of the Roadmap include clearly stating the
scientific question, defining the causal model, translating the scientific
question into a causal parameter, assessing the assumptions needed to express
the causal parameter as a statistical estimand, implementing statistical
estimators including parametric and semi-parametric methods, and interpreting
our findings. We believe that using such a framework in Data Science will
help to ensure that our statistical analyses are guided by the scientific
question driving our research, while avoiding over-interpreting our results. We
focus on the effect of an exposure occurring at a single time point and
highlight the use of targeted maximum likelihood estimation (TMLE) with Super
Learner.
Comment: 26 pages (with references); 4 figures
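The estimation step of the Roadmap described above can be illustrated with a minimal sketch. It is not the TMLE with Super Learner procedure the abstract highlights; it swaps in a simple substitution (G-computation) estimator with a single binary confounder, which is the easiest estimator to write out by hand. The data-generating process, the function names, and the variable names W, A, and Y are all hypothetical and chosen for illustration only.

```python
import random


def simulate(n, seed=0):
    """Simulate hypothetical data: confounder W, exposure A, outcome Y (all binary).

    By construction the true average treatment effect of A on Y is 0.3,
    and W affects both A and Y, so the crude comparison is confounded.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < 0.3 + 0.4 * w)
        y = int(rng.random() < 0.2 + 0.3 * a + 0.3 * w)
        rows.append((w, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def unadjusted_difference(rows):
    """Association only: E[Y | A=1] - E[Y | A=0], ignoring W."""
    treated = mean(y for _, a, y in rows if a == 1)
    control = mean(y for _, a, y in rows if a == 0)
    return treated - control


def g_computation_ate(rows):
    """Simple substitution estimator of the statistical estimand
    sum_w { E[Y | A=1, W=w] - E[Y | A=0, W=w] } * P(W=w).
    """
    n = len(rows)
    ate = 0.0
    for level in sorted({w for w, _, _ in rows}):
        stratum = [(a, y) for w, a, y in rows if w == level]
        treated = mean(y for a, y in stratum if a == 1)
        control = mean(y for a, y in stratum if a == 0)
        ate += (treated - control) * len(stratum) / n
    return ate


rows = simulate(100_000)
naive = unadjusted_difference(rows)
adjusted = g_computation_ate(rows)
print(f"unadjusted difference: {naive:.3f}")
print(f"G-computation ATE:     {adjusted:.3f}")
```

In this simulated setting the unadjusted difference overstates the effect because W raises both the chance of exposure and the chance of the outcome, while the stratified estimate should land near the value built into the simulation. The adjusted number has a causal interpretation only under the identification assumptions the Roadmap asks us to assess, such as no unmeasured confounding and positivity.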
Causality-Based Feature Importance Quantifying Methods: PN-FI, PS-FI and PNS-FI
In the current ML field, models are getting larger and more complex, and data
used for model training are also getting larger in quantity and higher in
dimensions. Therefore, in order to train better models, and save training time
and computational resources, a good Feature Selection (FS) method in the
preprocessing stage is necessary. Feature importance (FI) matters because it
is the basis of feature selection. This paper therefore introduces PN (the
probability of necessity), PS (the probability of sufficiency), and PNS (the
probability of necessity and sufficiency) from causality as measures of feature
importance, and creates three new FI methods: PN-FI, which measures how
important a feature is in image recognition tasks; PS-FI, which measures how
important a feature is in image generation tasks; and PNS-FI, which measures
both. The main body of this
paper consists of three RCTs, whose results we use to show how PS-FI, PN-FI,
and PNS-FI are calculated for three features: dog nose, dog eyes, and dog
mouth. The experiments show that, first, FI values are intervals with tight
upper and lower bounds; second, the feature dog eyes has the highest importance
while the other two have almost the same importance; and third, the bounds of
PNS and PN are tighter than the bounds of PS.
Comment: 7 pages
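To make concrete why such importance values come out as intervals rather than single numbers, here is a minimal sketch of the standard bounds on PNS from experimental data alone (Tian and Pearl): max(0, P(y|do(x)) - P(y|do(x'))) <= PNS <= min(P(y|do(x)), 1 - P(y|do(x'))). The abstract does not state which formulas the paper uses, so this is an assumption, and the function name and the counts in the usage line are hypothetical rather than the paper's data.

```python
def pns_bounds(treated_positive, treated_total, control_positive, control_total):
    """Bounds on the probability of necessity and sufficiency (PNS)
    computed from a two-arm randomized experiment.

    treated_* : counts in the arm where the feature is present, do(x)
    control_* : counts in the arm where the feature is absent, do(x')
    """
    if treated_total <= 0 or control_total <= 0:
        raise ValueError("both arms need at least one observation")

    p_y_do_x = treated_positive / treated_total
    p_y_do_not_x = control_positive / control_total

    # Lower bound: the causal risk difference, floored at zero.
    lower = max(0.0, p_y_do_x - p_y_do_not_x)
    # Upper bound: PNS cannot exceed P(y | do(x)) or P(not y | do(x')).
    upper = min(p_y_do_x, 1.0 - p_y_do_not_x)
    return lower, upper


# Hypothetical counts: 90 of 100 positive outcomes with the feature,
# 20 of 100 positive outcomes without it.
lower, upper = pns_bounds(90, 100, 20, 100)
print(f"PNS is bounded in [{lower:.2f}, {upper:.2f}]")
```

The width of the interval depends only on the two arm-level outcome rates, which is one way to read the abstract's remark that some bounds are tighter than others.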
CausalOps -- Towards an Industrial Lifecycle for Causal Probabilistic Graphical Models
Causal probabilistic graph-based models have gained widespread utility,
enabling the modeling of cause-and-effect relationships across diverse domains.
With their rising adoption in new areas, such as automotive system safety and
machine learning, the need for an integrated lifecycle framework akin to DevOps
and MLOps has emerged. Currently, a process reference for organizations
interested in employing causal engineering is missing. To address this gap and
foster widespread industrial adoption, we propose CausalOps, a novel lifecycle
framework for causal model development and application. By defining key
entities, dependencies, and intermediate artifacts generated during causal
engineering, we establish a consistent vocabulary and workflow model. This work
contextualizes causal model usage across different stages and stakeholders,
outlining a holistic view of creating and maintaining them. CausalOps' aim is
to drive the adoption of causal methods in practical applications within
interested organizations and the causality community.
Time to reality check the promises of machine learning-powered precision medicine
Machine learning methods, combined with large electronic health databases, could enable a personalised approach to medicine through improved diagnosis and prediction of individual responses to therapies. If successful, this strategy would represent a revolution in clinical research and practice. However, although the vision of individually tailored medicine is alluring, there is a need to distinguish genuine potential from hype. We argue that the goal of personalised medical care faces serious challenges, many of which cannot be addressed through algorithmic complexity, and call for collaboration between traditional methodologists and experts in medical machine learning to avoid extensive research waste.
Big Data Meet ML and AI for Decision Superiority at the Tactical Edge – Algorithm Design, Demonstrate and Concept Model
NPS NRP Executive Summary. N2/N6 - Information Warfare. This research is supported by funding from the Naval Postgraduate School, Naval Research Program (PE 0605853N/2098). https://nps.edu/nrp. Chief of Naval Operations (CNO). Approved for public release. Distribution is unlimited.