
    Variational Bayes Estimation of Discrete-Margined Copula Models with Application to Time Series

    We propose a new variational Bayes estimator for high-dimensional copulas with discrete, or a combination of discrete and continuous, margins. The method is based on a variational approximation to a tractable augmented posterior, and is faster than previous likelihood-based approaches. We use it to estimate drawable vine copulas for univariate and multivariate Markov ordinal and mixed time series. These have dimension $rT$, where $T$ is the number of observations and $r$ is the number of series, and are difficult to estimate using previous methods. The vine pair-copulas are carefully selected to allow for heteroskedasticity, which is a feature of most ordinal time series data. When combined with flexible margins, the resulting time series models also allow for other common features of ordinal data, such as zero inflation, multiple modes and under- or over-dispersion. Using six example series, we illustrate both the flexibility of the time series copula models, and the efficacy of the variational Bayes estimator for copulas of up to 792 dimensions and 60 parameters. This far exceeds the size and complexity of copula models for discrete data that can be estimated using previous methods.
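
    The core device in this line of work is latent-variable augmentation: for a discrete margin, the latent copula uniform is only known to lie in an interval determined by the marginal CDF, and a variational distribution over those latent uniforms gives a tractable lower bound on the likelihood. Below is a rough, minimal sketch of that idea under simplifying assumptions (a bivariate Gaussian copula, known ordinal cut points, a factorised reparameterised variational family, stochastic ELBO optimisation); it is not the authors' estimator or their vine models.

```python
# A minimal illustrative sketch (assumptions: bivariate Gaussian copula, known
# ordinal cut points, factorised variational family); NOT the paper's estimator.
import torch

torch.manual_seed(0)

# --- simulate ordinal data from a Gaussian copula with true correlation 0.7 ---
n, rho_true = 500, 0.7
L = torch.linalg.cholesky(torch.tensor([[1.0, rho_true], [rho_true, 1.0]]))
z = torch.randn(n, 2) @ L.T
normal = torch.distributions.Normal(0.0, 1.0)
u_latent = normal.cdf(z)
cuts = torch.tensor([0.2, 0.5, 0.8])                 # 4 ordinal categories
y = torch.bucketize(u_latent, cuts)                  # observed ordinal codes

# each latent copula uniform is only known to lie in a box set by the margin
F = torch.cat([torch.tensor([0.0]), cuts, torch.tensor([1.0])])
lo, hi = F[y], F[y + 1]

# --- variational family: u = lo + (hi - lo) * sigmoid(m + s * eps), eps ~ N(0,1) ---
m = torch.zeros(n, 2, requires_grad=True)
log_s = torch.full((n, 2), -1.0, requires_grad=True)
rho_raw = torch.zeros((), requires_grad=True)        # rho = tanh(rho_raw)

def gauss_copula_logdens(u, rho):
    zz = normal.icdf(u.clamp(1e-6, 1 - 1e-6))
    z1, z2 = zz[:, 0], zz[:, 1]
    quad = (rho**2 * (z1**2 + z2**2) - 2 * rho * z1 * z2) / (2 * (1 - rho**2))
    return -0.5 * torch.log1p(-rho**2) - quad

opt = torch.optim.Adam([m, log_s, rho_raw], lr=0.05)
for step in range(2000):
    eps = torch.randn(n, 2)
    t = torch.sigmoid(m + log_s.exp() * eps)
    u = lo + (hi - lo) * t
    # reparameterised ELBO: E_q[log copula density] + entropy of q(u), up to a constant
    log_jac = torch.log(hi - lo) + torch.log(t * (1 - t) + 1e-12) + log_s
    elbo = (gauss_copula_logdens(u, torch.tanh(rho_raw)) + log_jac.sum(1)).mean()
    opt.zero_grad()
    (-elbo).backward()
    opt.step()

print("estimated copula correlation:", torch.tanh(rho_raw).item())  # roughly 0.7
```

    Because the bound is evaluated by Monte Carlo over the latent uniforms rather than by summing over all discrete cells, this style of estimator scales to far larger models than exact discrete-margin likelihoods.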

    Copula-like Variational Inference

    This paper considers a new family of variational distributions motivated by Sklar's theorem. The family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension of the state space. The proposed variational densities can then be seen as arising from these copula-like densities, used as base distributions on the hypercube, with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity $\mathcal{O}(d \log d)$. We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks. Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
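
    A bare-bones sketch of the general recipe (not the paper's exact copula-like density or its sparse-rotation parameterisation): draw a dependent sample on the hypercube with non-uniform marginals, push each coordinate through a Gaussian quantile function, and apply a butterfly-structured rotation, one standard way to obtain an orthogonal transform that costs $\mathcal{O}(d \log d)$ to apply. The base sampler and the butterfly layout below are illustrative assumptions.

```python
# Illustrative sketch only: a stand-in base sampler on the hypercube, Gaussian
# quantile functions, and a butterfly-structured rotation applied in O(d log d).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d = 8                                          # power of two for the butterfly

def sample_hypercube(n, d, a=0.3):
    """Cheap stand-in for a copula-like base density: mixing one shared uniform
    into d independent ones gives dependence and non-uniform marginals at O(d) cost."""
    shared = rng.uniform(size=(n, 1))
    indep = rng.uniform(size=(n, d))
    return (1 - a) * indep + a * shared        # values stay inside (0, 1)

def butterfly_rotate(x, thetas):
    """Apply log2(d) layers of paired 2x2 Givens rotations (FFT-like pairing)."""
    out = x.copy()
    for layer, th in enumerate(thetas):
        stride = 2 ** layer
        idx = np.arange(out.shape[1]).reshape(-1, 2 * stride)
        i, j = idx[:, :stride].ravel(), idx[:, stride:].ravel()
        c, s = np.cos(th), np.sin(th)
        xi, xj = out[:, i], out[:, j]
        out[:, i], out[:, j] = c * xi - s * xj, s * xi + c * xj
    return out

u = sample_hypercube(5, d)
z = norm.ppf(u)                                # Gaussian quantile functions
thetas = rng.uniform(0, 2 * np.pi, size=int(np.log2(d)))
samples = butterfly_rotate(z, thetas)          # rotated, non-Gaussian draws
print(samples.shape)                           # (5, 8)
```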

    Learning Sparse Latent Representations with the Deep Copula Information Bottleneck

    Deep latent variable models are powerful tools for representation learning. In this paper, we adopt the deep information bottleneck model, identify its shortcomings, and propose a model that circumvents them. To this end, we apply a copula transformation which, by restoring the invariance properties of the information bottleneck method, leads to disentanglement of the features in the latent space. Building on that, we show how this transformation translates to sparsity of the latent space in the new model. We evaluate our method on artificial and real data. Comment: Published as a conference paper at ICLR 2018. Aleksander Wieczorek and Mario Wieser contributed equally to this work.
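
    The copula transformation the paper builds on is, at its core, the probability integral transform applied per feature: replacing each feature by its (empirical) CDF value makes the downstream objective invariant to strictly monotone transformations of that feature. A small illustrative sketch of that marginal transform follows; the Gaussian-score variant is a common convention, and how the transform enters the deep information bottleneck model is specific to the paper.

```python
# Illustrative sketch of the marginal copula (rank) transform; how it enters the
# deep information bottleneck objective is specific to the paper.
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, size=(1000, 3))        # skewed raw features

def copula_transform(x):
    """Empirical probability integral transform per column, then Gaussian scores."""
    n = x.shape[0]
    u = (rankdata(x, axis=0) - 0.5) / n          # ranks mapped into (0, 1)
    return norm.ppf(u)

z1 = copula_transform(x)
z2 = copula_transform(np.exp(x))                 # strictly monotone distortion of x
print(np.allclose(z1, z2))                       # True: invariant to monotone maps
```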

    Generative modelling: addressing open problems in model misspecification and differential privacy

    Generative modelling has become a popular application of artificial intelligence. Model performance can, however, be impacted negatively when the generative model is misspecified, or when the generative model estimator is modified to adhere to a privacy notion such as differential privacy. In this thesis, we approach generative modelling under model misspecification and differential privacy through four contributions. We first review related work on generative modelling and then examine why model misspecification and differential privacy pose challenges for it. As an initial contribution, we consider generative modelling for density estimation. One way to address model misspecification is to relax model assumptions, and we show that this also helps in nonparametric models. In particular, we study a recently proposed nonparametric quasi-Bayesian density estimator and identify its strong model assumptions as a reason for poor performance on finite data sets. We propose an autoregressive extension that relaxes these assumptions to allow for a priori feature dependencies. Next, we consider generative modelling for missingness imputation. After categorising current deep generative imputation approaches into the classes of nonignorable missingness models introduced by Rubin [1976], we extend the formulation of variational autoencoders to factorise according to a nonignorable missingness model class that has not previously been studied in the deep generative modelling literature. The resulting models explicitly represent the missingness mechanism, preventing model misspecification when missingness is not at random. We then turn to improving synthetic data generation under differential privacy. For this purpose, we propose differentially private importance sampling of differentially private synthetic data samples, and observe that the better the generative model, the more importance sampling helps. We next focus on increasing data generation quality by considering differentially private diffusion models, and identify training strategies that significantly improve the performance of DP image generators. We conclude the dissertation with a discussion, including contributions and limitations of the presented work, and propose potential directions for future work.
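
    For the synthetic-data contribution, the underlying idea of importance sampling synthetic samples can be sketched without the privacy machinery: weight each synthetic point by an estimated real-to-synthetic density ratio and resample. The sketch below is generic and non-private; the thesis privatises this procedure, and all modelling choices here (the toy data, the classifier-based ratio estimator) are illustrative assumptions.

```python
# Generic, NON-private sketch of importance-sampling synthetic data; the thesis
# privatises this procedure, which is not reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=(2000, 2))        # "real" data
synthetic = rng.normal(loc=0.5, scale=1.0, size=(2000, 2))   # shifted (misspecified) generator

# density-ratio trick: a probabilistic classifier separating real from synthetic
# gives p(real | x) / p(synthetic | x), proportional to the weight p_real / p_synth
X = np.vstack([real, synthetic])
labels = np.r_[np.ones(len(real)), np.zeros(len(synthetic))]
clf = LogisticRegression().fit(X, labels)
p = clf.predict_proba(synthetic)[:, 1]
w = p / (1 - p)
w /= w.sum()

# importance-resample the synthetic set; the reweighted sample tracks the real
# distribution more closely than the raw synthetic one
idx = rng.choice(len(synthetic), size=len(synthetic), replace=True, p=w)
print("raw synthetic mean:", synthetic.mean(axis=0).round(2))
print("resampled mean:    ", synthetic[idx].mean(axis=0).round(2))
print("real mean:         ", real.mean(axis=0).round(2))
```

    The sketch also illustrates the observation quoted above: the closer the generator already is to the real distribution, the better behaved the weights are, so importance sampling helps more when the generative model is good.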

    Implicit Kernel Attention

    Attention computes the dependency between representations and encourages the model to focus on important features. Attention-based models, such as Transformers and graph attention networks (GAT), are widely used for sequential data and graph-structured data. This paper suggests a new interpretation and generalized structure of the attention in Transformers and GAT. For the attention in both, we show that it is a product of two parts: 1) an RBF kernel that measures the similarity of two instances, and 2) the exponential of the $L^{2}$ norm, which computes the importance of individual instances. From this decomposition, we generalize the attention in three ways. First, we propose implicit kernel attention with an implicit kernel function, instead of manual kernel selection. Second, we generalize the $L^{2}$ norm to the $L^{p}$ norm. Third, we extend our attention to structured multi-head attention. Our generalized attention shows better performance on classification, translation, and regression tasks.
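
    The stated decomposition follows from completing the square in the exponentiated dot-product score: $\exp(q \cdot k) = \exp(-\lVert q - k \rVert^{2}/2)\,\exp(\lVert q \rVert^{2}/2)\,\exp(\lVert k \rVert^{2}/2)$, i.e. an RBF similarity term times per-instance magnitude terms. A quick numerical check of this identity (the $1/\sqrt{d_k}$ temperature of Transformer attention is assumed to be folded into $q$ and $k$ here):

```python
# Numerical check of the decomposition above; the 1/sqrt(d_k) temperature of
# Transformer attention is assumed to be folded into q and k.
import numpy as np

rng = np.random.default_rng(0)
q, k = rng.normal(size=4), rng.normal(size=4)

dot_score = np.exp(q @ k)                               # exponentiated attention score
rbf = np.exp(-np.sum((q - k) ** 2) / 2)                 # similarity of the two instances
magnitude = np.exp(np.sum(q ** 2) / 2) * np.exp(np.sum(k ** 2) / 2)  # per-instance importance

print(np.isclose(dot_score, rbf * magnitude))           # True
```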