208 research outputs found
Variational Bayes Estimation of Discrete-Margined Copula Models with Application to Time Series
We propose a new variational Bayes estimator for high-dimensional copulas
with discrete, or a combination of discrete and continuous, margins. The method
is based on a variational approximation to a tractable augmented posterior, and
is faster than previous likelihood-based approaches. We use it to estimate
drawable vine copulas for univariate and multivariate Markov ordinal and mixed
time series. These have dimension rT, where T is the number of observations
and r is the number of series, and are difficult to estimate using previous
methods. The vine pair-copulas are carefully selected to allow for
heteroskedasticity, which is a feature of most ordinal time series data. When
combined with flexible margins, the resulting time series models also allow for
other common features of ordinal data, such as zero inflation, multiple modes
and under- or over-dispersion. Using six example series, we illustrate both the
flexibility of the time series copula models, and the efficacy of the
variational Bayes estimator for copulas of up to 792 dimensions and 60
parameters. This far exceeds the size and complexity of copula models for
discrete data that can be estimated using previous methods.
Copula-like Variational Inference
This paper considers a new family of variational distributions motivated by
Sklar's theorem. This family is based on new copula-like densities on the
hypercube with non-uniform marginals which can be sampled efficiently, i.e.
with a complexity linear in the dimension of the state space. The proposed
variational densities can then be seen as arising from these copula-like
densities used as base distributions on the hypercube, with Gaussian quantile
functions and sparse rotation matrices as normalizing flows; the latter
correspond to a rotation of the marginals. We provide some empirical evidence that such a variational family can
also approximate non-Gaussian posteriors and can be beneficial compared to
Gaussian approximations. Our method performs largely comparably to
state-of-the-art variational approximations on standard regression and
classification benchmarks for Bayesian Neural Networks.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
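The flow construction described in this abstract can be sketched in a few lines: base samples on the hypercube are pushed through the Gaussian quantile function and then rotated. In the sketch below, a plain uniform base and a random orthogonal matrix stand in for the paper's copula-like density and sparse rotation matrices, both of which are assumptions here rather than the authors' actual construction.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
d = 4
nd = NormalDist()

# Base samples on the hypercube (uniform is a stand-in for the paper's
# copula-like density, whose exact form is not given in the abstract)
u = rng.uniform(size=(1000, d))

# Gaussian quantile function maps the hypercube to R^d
z = np.vectorize(nd.inv_cdf)(u)

# A rotation of the marginals; a random orthogonal matrix stands in for
# the sparse rotation matrices used in the paper
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
x = z @ Q.T  # final variational samples
```

With a uniform base the result is simply a rotated Gaussian; the paper's copula-like base densities are what allow the family to move beyond Gaussian approximations.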
Learning Sparse Latent Representations with the Deep Copula Information Bottleneck
Deep latent variable models are powerful tools for representation learning.
In this paper, we adopt the deep information bottleneck model, identify its
shortcomings and propose a model that circumvents them. To this end, we apply a
copula transformation which, by restoring the invariance properties of the
information bottleneck method, leads to disentanglement of the features in the
latent space. Building on that, we show how this transformation translates to
sparsity of the latent space in the new model. We evaluate our method on
artificial and real data.
Comment: Published as a conference paper at ICLR 2018. Aleksander Wieczorek and Mario Wieser contributed equally to this work.
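The copula transformation at the heart of this abstract can be illustrated with the classical normal-scores construction: each feature is passed through its empirical CDF and then the Gaussian quantile function, which keeps the dependence structure (the copula) of the data while standardising every marginal. The function below is an illustrative sketch of that idea, not the paper's implementation.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(x):
    """Copula-style transform: map each column of x to Gaussian scores.

    Ranks give the empirical CDF, so the output preserves the dependence
    structure of x while every marginal becomes approximately standard
    normal (assumes continuous data, i.e. no ties within a column).
    """
    n = x.shape[0]
    ranks = x.argsort(axis=0).argsort(axis=0) + 1   # 1..n within each column
    u = ranks / (n + 1)                             # empirical CDF values in (0, 1)
    inv_cdf = np.vectorize(NormalDist().inv_cdf)    # Gaussian quantile function
    return inv_cdf(u)

rng = np.random.default_rng(0)
data = rng.exponential(size=(500, 3))               # heavily skewed marginals
z = normal_scores(data)
```

Because the transform is monotone within each column, it restores the invariance properties the abstract refers to: any strictly increasing distortion of a feature leaves the transformed representation unchanged.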
Generative modelling: addressing open problems in model misspecification and differential privacy
Generative modelling has become a popular application of artificial intelligence. Model performance can, however, be impacted negatively when the generative model is misspecified, or when the generative model estimator is modified to adhere to a privacy notion such as differential privacy. In this thesis, we approach generative modelling under model misspecification and differential privacy by presenting four different works.
We first present related work on generative modelling. Subsequently, we delve into the reasons that necessitate an examination of generative modelling under the challenges of model misspecification and differential privacy.
As an initial contribution, we consider generative modelling for density estimation. One way to approach model misspecification is to relax model assumptions. We show that this can also help in nonparametric models. In particular, we study a recently proposed nonparametric quasi-Bayesian density estimator and identify its strong model assumptions as a reason for poor performance in finite data sets. We propose an autoregressive extension relaxing model assumptions to allow for a priori feature dependencies.
Next, we consider generative modelling for missingness imputation. After categorising current deep generative imputation approaches into the classes of nonignorable missingness models introduced by Rubin [1976], we extend the formulation of variational autoencoders to factorise according to a nonignorable missingness model class that has not previously been studied in the deep generative modelling literature. The extended models explicitly represent the missingness mechanism, preventing model misspecification when missingness is not at random.
Then, we turn the attention of this thesis to improving synthetic data generation under differential privacy. For this purpose, we propose differentially private importance sampling of differentially private synthetic data samples. We observe that the benefit of importance sampling grows with the quality of the generative model. We then focus on increasing data generation quality by considering differentially private diffusion models, and identify training strategies that significantly improve the performance of DP image generators.
We conclude the dissertation with a discussion, including contributions and limitations of the presented work, and propose potential directions for future work.
Implicit Kernel Attention
\textit{Attention} computes the dependency between representations, and it
encourages the model to focus on the important selective features.
Attention-based models, such as Transformers and graph attention networks (GAT)
are widely utilized for sequential data and graph-structured data. This paper
suggests a new interpretation and generalized structure of the attention in
Transformer and GAT. For the attention in Transformer and GAT, we derive that
the attention is a product of two parts: 1) the RBF kernel to measure the
similarity of two instances and 2) the exponential of the L2 norm to compute
the importance of individual instances. From this decomposition, we generalize
the attention in three ways. First, we propose implicit kernel attention with
an implicit kernel function, instead of manual kernel selection. Second, we
generalize the L2 norm to the Lp norm. Third, we extend our attention to
structured multi-head attention. Our generalized attention shows better
performance on classification, translation, and regression tasks.
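The decomposition stated in this abstract follows from the identity q . k = (||q||^2 + ||k||^2 - ||q - k||^2) / 2, so the exponentiated dot-product score factorises into an RBF similarity term and per-instance magnitude terms. The sketch below checks this numerically; the 1/sqrt(d) scaling used in Transformer attention is omitted for clarity, which is a simplification relative to the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=d)
k = rng.normal(size=d)

# Unnormalised attention score between the two instances
direct = np.exp(q @ k)

# Part 1: RBF kernel measuring the similarity of q and k
rbf = np.exp(-np.sum((q - k) ** 2) / 2)

# Part 2: exponentials of the squared L2 norms, acting as
# per-instance importance weights
importance = np.exp(np.sum(q ** 2) / 2) * np.exp(np.sum(k ** 2) / 2)

# exp(q . k) = exp(-||q - k||^2 / 2) * exp(||q||^2 / 2) * exp(||k||^2 / 2)
assert np.isclose(direct, rbf * importance)
```

Replacing the RBF factor with another kernel, or the squared L2 norm with an Lp norm, gives exactly the kinds of generalisation the abstract describes.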