Addressing data scarcity in autonomous systems through trustworthy counterfactual generation

Abstract

Autonomous systems often operate in environments where collecting large, diverse, and safety-critical datasets is difficult. This data scarcity limits their reliability, particularly in rare or hazardous scenarios that are hard to capture in the real world. This thesis addresses data scarcity by integrating structural causal models with diffusion-based generative models to produce trustworthy, high-fidelity counterfactual images for "what-if" reasoning. To this end, two frameworks are proposed: Causal DiffuseVAE and Causal DiffuseLLM. Both generate images that follow a directed acyclic graph of semantic factors while preserving visual realism. The thesis first outlines key concepts in causal generative modeling and modern deep generative methods, highlighting that existing approaches either provide interpretable causal control with limited fidelity or achieve photorealism without reliable intervention behavior. Causal DiffuseVAE structures the latent space using a causal graph and applies a diffusion decoder for detail reconstruction. Experiments show a 40% reduction in generation time and a 30% improvement in counterfactual accuracy compared with state-of-the-art causal diffusion models. Causal DiffuseLLM, which maps language instructions to causal interventions, improves generation accuracy by 15% over its non-LLM baseline and localizes edits to causally affected regions. Overall, this thesis shows that embedding causal reasoning into diffusion pipelines provides a practical path to generating reliable data for autonomous systems operating under limited data conditions.
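To make the "what-if" reasoning concrete, the following is a minimal, hypothetical sketch of counterfactual inference on a toy structural causal model. The variable names and structural assignments are illustrative only and are not taken from the thesis; they simply show the abduction-action-prediction pattern that underlies intervention-based generation.

```python
# Toy structural causal model with DAG A -> B (illustrative, not the
# thesis's actual model). Assignments: A := U_A, B := 2*A + U_B.

def scm_forward(u_a, u_b, a=None):
    """Evaluate the SCM for given exogenous noise (u_a, u_b).

    Passing `a` performs the intervention do(A = a), severing A's
    dependence on its noise term while reusing the same U_B.
    """
    a_val = u_a if a is None else a
    b_val = 2 * a_val + u_b
    return a_val, b_val

# Factual world: exogenous noise u_a=1.0, u_b=0.5 gives A=1.0, B=2.5.
a_fact, b_fact = scm_forward(1.0, 0.5)

# Counterfactual world: keep the same noise, intervene do(A = 3.0).
# Only causally downstream quantities (here B) change: B = 6.5.
a_cf, b_cf = scm_forward(1.0, 0.5, a=3.0)
```

In the thesis's setting the structural assignments act on semantic latent factors rather than scalars, and a diffusion decoder renders the intervened latents into images, but the same keep-the-noise, change-the-cause logic applies.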

This thesis was published in the Glasgow Theses Service.
