The Hierarchy of Stable Distributions and Operators to Trade Off Stability and Performance
Recent work on model reliability and generalization has resulted in a
variety of methods that seek to proactively address differences between the
training and unknown target environments. While most methods achieve this by
finding distributions that are invariant across environments, we show that
they do not necessarily find the same distributions, which has implications for
performance. In this paper we unify existing work on prediction using stable
distributions by relating environmental shifts to edges in the graph underlying
a prediction problem, and characterize stable distributions as those which
effectively remove these edges. We then quantify the effect of edge deletion on
performance in the linear case and corroborate the findings in simulated and
real-data experiments.
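As a concrete illustration of the edge-removal idea, the toy sketch below is an assumed setup rather than the authors' code: the graph X1 -> Y -> X2, the edge strengths, and the function names are all illustrative choices. Dropping the shift-affected child X2 from the predictor plays the role of removing the unstable edge.

```python
# Toy simulation sketch (assumed setup, not the authors' code): the graph is
# X1 -> Y -> X2, and the strength of the Y -> X2 edge changes across
# environments. Dropping X2 from the predictor "removes" that edge, buying
# stability at the cost of in-distribution accuracy.
import numpy as np

rng = np.random.default_rng(0)

def make_env(n, shift):
    # "shift" is the Y -> X2 edge strength that differs between environments.
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)
    x2 = shift * y + rng.normal(size=n)
    return np.column_stack([x1, x2]), y

def fit_ols(X, y):
    # Ordinary least squares with an intercept.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def mse(beta, X, y):
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((A @ beta - y) ** 2))

# Train where the unstable edge is strong, test where it has flipped sign.
X_tr, y_tr = make_env(20_000, shift=1.0)
X_te, y_te = make_env(20_000, shift=-1.0)
beta_all = fit_ols(X_tr, y_tr)             # uses X1 and X2
beta_stable = fit_ols(X_tr[:, :1], y_tr)   # uses only the stable parent X1

print("train MSE   all / stable:", mse(beta_all, X_tr, y_tr), mse(beta_stable, X_tr[:, :1], y_tr))
print("shifted MSE all / stable:", mse(beta_all, X_te, y_te), mse(beta_stable, X_te[:, :1], y_te))
```

In this assumed setup the all-features fit has the lower training error but degrades sharply once the edge flips, while the stable fit keeps roughly the same error in both environments.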
Designing monitoring strategies for deployed machine learning algorithms: navigating performativity through a causal lens
After a machine learning (ML)-based system is deployed, monitoring its
performance is important to ensure the safety and effectiveness of the
algorithm over time. When an ML algorithm interacts with its environment, the
algorithm can affect the data-generating mechanism and be a major source of
bias when evaluating its standalone performance, an issue known as
performativity. Although prior work has shown how to validate models in the
presence of performativity using causal inference techniques, there has been
little work on how to monitor models in the presence of performativity. Unlike
in the setting of model validation, there is much less agreement on which
performance metrics to monitor. Different monitoring criteria impact how
interpretable the resulting test statistic is, what assumptions are needed for
identifiability, and the speed of detection. When this choice is further
coupled with the decision to use observational versus interventional data, ML
deployment teams are faced with a multitude of monitoring options. The aim of
this work is to highlight the relatively under-appreciated complexity of
designing a monitoring strategy and how causal reasoning can provide a
systematic framework for choosing between these options. As a motivating
example, we consider an ML-based risk prediction algorithm for predicting
unplanned readmissions. Bringing together tools from causal inference and
statistical process control, we consider six monitoring procedures (three
candidate monitoring criteria and two data sources) and investigate their
operating characteristics in simulation studies. Results from this case study
emphasize the seemingly simple (and obvious) fact that not all monitoring
systems are created equal, which has real-world impacts on the design and
documentation of ML monitoring systems.
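To make the statistical-process-control ingredient concrete, the sketch below is an illustrative assumption rather than one of the paper's six procedures: it monitors a one-sided CUSUM of the gap between observed readmissions and predicted risk, uses observational data only, and ignores any performativity correction. The allowance k, threshold h, drift size, and batch sizes are arbitrary choices.

```python
# Toy monitoring sketch (illustrative assumption, not one of the paper's six
# procedures): a one-sided CUSUM on the gap between observed outcomes and
# predicted risk, using observational data only, with no performativity
# correction.
import numpy as np

rng = np.random.default_rng(1)

def cusum(scores, k=0.02, h=1.0):
    """Return the first index where the CUSUM statistic exceeds h, else None.
    k (allowance) and h (decision threshold) are arbitrary illustrative values."""
    s = 0.0
    for t, x in enumerate(scores):
        s = max(0.0, s + x - k)  # accumulate evidence of upward drift
        if s > h:
            return t
    return None

# Simulate batches of deployed predictions: predicted risk p, observed outcome y.
# From batch 30 onward the true readmission risk drifts upward, so the model
# becomes miscalibrated, the kind of change a monitoring system should flag.
n_batches, batch_size = 60, 200
gaps = []
for b in range(n_batches):
    p = rng.uniform(0.05, 0.3, size=batch_size)        # model's predicted risk
    drift = 0.1 if b >= 30 else 0.0
    y = rng.binomial(1, np.clip(p + drift, 0.0, 1.0))  # observed readmissions
    gaps.append(np.mean(y - p))                        # observed minus expected

print("alarm raised at batch:", cusum(np.array(gaps)))
```

Swapping the monitored score (e.g., a different criterion or an interventional data source) changes the detection delay and the assumptions needed, which is the design space the abstract describes.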
The Stability and Accuracy Tradeoff Under Dataset Shift: A Causal Graphical Analysis
Recent interest in dataset shift has produced many methods for finding
invariant distributions for prediction in new, unseen environments. However,
these methods consider different types of shifts and have been developed under
disparate frameworks, making it difficult to theoretically analyze how
solutions differ with respect to stability and accuracy. Taking a causal
graphical view, we use a flexible graphical representation to express various
types of dataset shifts. We show that all invariant distributions correspond to
a causal hierarchy of graphical operators which disable the edges in the graph
that are responsible for the shifts. The hierarchy provides a common
theoretical underpinning for understanding when and how stability to shifts can
be achieved, and in what ways stable distributions can differ. We use it to
establish conditions for minimax optimal performance across environments, and
derive new algorithms that find optimal stable distributions. Using this new
perspective, we empirically demonstrate that there is a tradeoff between
minimax and average performance.
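The closing claim can be illustrated with a toy sketch; it is an assumed setup, not the paper's algorithms, reusing the X1 -> Y -> X2 graph from the first sketch with arbitrary test-environment edge strengths.

```python
# Toy sketch of the minimax-vs-average tradeoff (assumed setup, not the
# paper's algorithms): full and stable OLS predictors are fit in one
# environment and evaluated over a small family of test environments whose
# Y -> X2 edge strengths are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(2)

def make_env(n, shift):
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)
    x2 = shift * y + rng.normal(size=n)
    return x1, x2, y

def ols(A, y):
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Fit both predictors in a training environment with edge strength 1.0.
x1, x2, y = make_env(50_000, shift=1.0)
b_full = ols(np.column_stack([np.ones_like(x1), x1, x2]), y)  # all features
b_stab = ols(np.column_stack([np.ones_like(x1), x1]), y)      # stable parent only

# Evaluate average and worst-case error across assumed test environments.
full_losses, stab_losses = [], []
for s in [1.0, 0.9, 0.8, 0.2]:
    x1, x2, y = make_env(50_000, shift=s)
    full_losses.append(np.mean((np.column_stack([np.ones_like(x1), x1, x2]) @ b_full - y) ** 2))
    stab_losses.append(np.mean((np.column_stack([np.ones_like(x1), x1]) @ b_stab - y) ** 2))

print("full   predictor: avg %.2f  worst %.2f" % (np.mean(full_losses), np.max(full_losses)))
print("stable predictor: avg %.2f  worst %.2f" % (np.mean(stab_losses), np.max(stab_losses)))
```

Under these assumed shifts the stable predictor has the better worst-case error while the full predictor has the better average, which is the kind of minimax-versus-average tradeoff the abstract refers to.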