Counterfactual (Non-)identifiability of Learned Structural Causal Models
Recent advances in probabilistic generative modeling have motivated learning
Structural Causal Models (SCMs) from observational datasets using deep
conditional generative models, also known as Deep Structural Causal Models
(DSCMs). If successful, DSCMs can be utilized for causal estimation tasks, e.g.,
for answering counterfactual queries. In this work, we warn practitioners about
non-identifiability of counterfactual inference from observational data, even
in the absence of unobserved confounding and assuming known causal structure.
We prove counterfactual identifiability of monotonic generation mechanisms with
single-dimensional exogenous variables. For general generation mechanisms with
multi-dimensional exogenous variables, we provide an impossibility result for
counterfactual identifiability, motivating the need for parametric assumptions.
As a practical approach, we propose a method for estimating worst-case errors
of learned DSCMs' counterfactual predictions. The size of this error can be an
essential metric for deciding whether or not DSCMs are a viable approach for
counterfactual inference in a specific problem setting. In evaluation, our
method confirms negligible counterfactual errors for an identifiable SCM from
prior work, and also provides informative bounds on counterfactual errors
for a non-identifiable synthetic SCM.
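
To make the identifiability claim concrete, here is a minimal sketch (ours, not the paper's construction) of abduction-action-prediction in a toy SCM whose mechanism is monotonic in a one-dimensional exogenous variable; the mechanism f and all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):
    # Hypothetical mechanism, strictly increasing in the exogenous variable u.
    return x + np.exp(u)

# Factual observation generated by an exogenous value that is hidden from us.
x_obs, u_true = 1.0, rng.normal()
y_obs = f(x_obs, u_true)

# Abduction: the monotonic mechanism is invertible, so the exogenous value is recovered exactly.
u_hat = np.log(y_obs - x_obs)

# Action + prediction: intervene do(X = 2.0) and push the recovered noise through the mechanism.
print(f(2.0, u_hat), f(2.0, u_true))  # identical: the counterfactual is identified

# With a multi-dimensional exogenous variable, e.g. y = x + u1 + u2, many (u1, u2)
# pairs explain the same factual observation, so models that all fit the observational
# data can still disagree on counterfactuals.
```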
CausalSim: Toward A Causal Data-Driven Simulator For Network Protocols
Evaluating the real-world performance of network protocols is challenging. Randomized control trials (RCTs) are expensive and inaccessible to most researchers, while expert-designed simulators fail to capture complex behaviors in real networks. We present CausalSim, a data-driven simulator for network protocols that addresses this challenge. Learning network behavior from observational data is complicated due to the bias introduced by the protocols used during data collection. CausalSim uses traces from an initial RCT under a set of protocols to learn a causal network model, effectively removing the biases present in the data. Using this model, CausalSim can then simulate any protocol over the same traces (i.e., for counterfactual predictions). Key to CausalSim is the novel use of adversarial neural network training that exploits distributional invariances that are present due to the training data coming from an RCT. Our extensive evaluation of CausalSim on both real and synthetic datasets and two use cases, including more than nine months of real data from the Puffer video streaming system, shows that it provides accurate counterfactual predictions, reducing prediction error by 44% and 53% on average compared to expert-designed and standard supervised learning baselines.
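
The bias that CausalSim targets can be illustrated with a toy numeric example (ours, not CausalSim's adversarial method): an additive, known-form protocol effect stands in for the latent factors the system actually learns, and the RCT's random assignment is what makes the correction possible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
capacity = rng.uniform(5, 10, size=n)                 # latent per-trace network condition
cong = {"A": 1.0, "B": 3.0}                           # hypothetical protocol-induced penalty

# RCT during collection: each trace randomly runs protocol A or B and logs its throughput.
assigned = rng.choice(["A", "B"], size=n)
observed = capacity - np.where(assigned == "A", cong["A"], cong["B"])

true_B = (capacity - cong["B"]).mean()                # what B would truly achieve on these traces

# Naive trace replay: treat logged throughput as protocol-independent ground truth for B.
naive_B = observed.mean()

# RCT-based correction: randomization makes the two groups comparable, so the gap in
# group means identifies B's penalty relative to A, and each A-trace can be shifted
# to its counterfactual value under B.
gap = observed[assigned == "B"].mean() - observed[assigned == "A"].mean()
counterfactual_B = np.where(assigned == "B", observed, observed + gap)
causal_B = counterfactual_B.mean()

print(true_B, naive_B, causal_B)                      # naive replay is biased; the correction is not
```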
CausalSim: A Causal Inference Framework for Unbiased Trace-Driven Simulation
We present CausalSim, a causal inference framework for unbiased trace-driven
simulation. Current trace-driven simulators assume that the interventions being
simulated (e.g., a new algorithm) would not affect the validity of the traces.
However, real-world traces are often biased by the choices of algorithms made
during trace collection, and hence replaying traces under an intervention may
lead to incorrect results. CausalSim addresses this challenge by learning a
causal model of the system dynamics and latent factors capturing the underlying
system conditions during trace collection. It learns these models using an
initial randomized control trial (RCT) under a fixed set of algorithms, and
then applies them to remove biases from trace data when simulating new
algorithms.
Key to CausalSim is mapping unbiased trace-driven simulation to a tensor
completion problem with extremely sparse observations. By exploiting a basic
distributional invariance property present in RCT data, CausalSim enables a
novel tensor completion method despite the sparsity of observations. Our
extensive evaluation of CausalSim on both real and synthetic datasets,
including more than ten months of real data from the Puffer video streaming
system, shows that it improves simulation accuracy, reducing errors by 53% and 61% on
average compared to expert-designed and supervised learning baselines.
Moreover, CausalSim provides markedly different insights about ABR algorithms
compared to the biased baseline simulator, which we validate with a real
deployment.
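
As a rough illustration of the tensor-completion framing (simplified to a rank-1 matrix and not CausalSim's actual algorithm), the sketch below completes a (trace x algorithm) matrix in which each trace reveals only the entry for its randomly assigned algorithm; the distributional invariance from the RCT is what pins down the per-algorithm factors.

```python
import numpy as np

rng = np.random.default_rng(2)
n_traces, n_algos = 5_000, 3

condition = rng.uniform(1, 5, size=n_traces)          # latent per-trace condition
effect = np.array([1.0, 0.7, 1.3])                    # hypothetical per-algorithm effect
truth = np.outer(condition, effect)                   # full (trace x algorithm) metric matrix

assigned = rng.integers(n_algos, size=n_traces)       # RCT: uniform random assignment
observed = truth[np.arange(n_traces), assigned]       # the single entry each trace reveals

# Because assignment is independent of the latent condition, per-algorithm means of the
# observed metric differ only through the algorithm effects (the distributional invariance).
v = np.array([observed[assigned == j].mean() for j in range(n_algos)])
u = observed / v[assigned]                            # back out each trace's condition, up to scale
completed = np.outer(u, v)                            # counterfactual metric for every (trace, algorithm)

print(np.abs(completed - truth).mean() / truth.mean())  # small relative error on unseen entries
```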