The sharing of microdata, such as fund holdings and derivative instruments,
by regulatory institutions presents unique challenges due to strict data
confidentiality and privacy regulations. These challenges often hinder the
ability of both academics and practitioners to conduct collaborative research
effectively. The emergence of generative models, particularly diffusion models,
capable of synthesizing data mimicking the underlying distributions of
real-world data presents a compelling solution. This work introduces 'FinDiff',
a diffusion model designed to generate realistic financial tabular data for a
variety of regulatory downstream tasks, such as economic scenario modeling,
stress testing, and fraud detection. The model uses embedding encodings to
represent mixed-modality financial data, comprising both categorical and numeric
attributes. The performance of FinDiff in generating synthetic tabular
financial data is evaluated against state-of-the-art baseline models using
three real-world financial datasets (two publicly available and one
proprietary). Empirical results demonstrate that FinDiff excels
in generating synthetic tabular financial data with high fidelity, privacy, and
utility.

Comment: 9 pages, 5 figures, 3 tables, preprint version, currently under
review