3,710 research outputs found
A new mixture copula model for spatially correlated multiple variables with an environmental application
In environmental monitoring, multiple spatial variables are often sampled at a geographical location and can depend on each other in complex ways, such as through non-linear and non-Gaussian spatial dependence. We propose a new mixture copula model that can capture those complex relationships among spatially correlated variables and predict individual variables while accounting for the multivariate spatial relationship. The proposed method is demonstrated using an environmental application and compared with three existing methods. Firstly, the improvement in the prediction of individual variables from utilising a multivariate spatial copula is compared with the existing univariate pair copula method. Secondly, the prediction performance of the mixture copula in the multivariate spatial copula framework is compared with an existing multivariate spatial copula model that uses non-linear principal component analysis. Lastly, the improvement in the prediction of individual variables from utilising the non-linear non-Gaussian multivariate spatial copula model is compared with the linear Gaussian multivariate cokriging model. The results show that the proposed spatial mixture copula model outperforms the existing methods in the cross-validation of actual and predicted values at the sampled locations.
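As a rough illustration of what a mixture copula is, the sketch below forms a convex combination of two standard bivariate copulas, Clayton and Gumbel. It is not the model proposed in the paper above: the abstract does not specify the component families or the spatial structure, so the families, the mixing weight, and the parameter values here are all assumptions chosen for illustration.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v), theta > 0 (stronger dependence in the lower tail)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v), theta >= 1 (stronger dependence in the upper tail)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def mixture_cdf(u, v, weight=0.4, theta_clayton=2.0, theta_gumbel=1.5):
    """Convex combination of the two copulas; weight must lie in [0, 1].

    All default values are illustrative assumptions, not fitted parameters.
    """
    return (weight * clayton_cdf(u, v, theta_clayton)
            + (1.0 - weight) * gumbel_cdf(u, v, theta_gumbel))
```

Because a convex combination of copulas is itself a copula, the mixture keeps uniform margins, so `mixture_cdf(u, 1.0)` returns `u` up to rounding, while the weight trades lower-tail against upper-tail dependence.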
Longitudinal Data Clustering with a Copula Kernel Mixture Model
Many common clustering methods cannot be used for clustering multivariate
longitudinal data in cases where variables exhibit high autocorrelations. In
this article, a copula kernel mixture model (CKMM) is proposed for clustering
data of this type. The CKMM is a finite mixture model which decomposes each
mixture component's joint density function into its copula and marginal
distribution functions. In this decomposition, the Gaussian copula is used due
to its mathematical tractability and Gaussian kernel functions are used to
estimate the marginal distributions. A generalized expectation-maximization
algorithm is used to estimate the model parameters. The performance of the
proposed model is assessed in a simulation study and on two real datasets. The
proposed model is shown to have effective performance in comparison to standard
methods, such as K-means with dynamic time warping clustering and latent growth
models.
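The decomposition described in the abstract above (a Gaussian copula combined with Gaussian-kernel estimates of the marginals) can be sketched for a single mixture component in two dimensions. This is a minimal sketch under stated assumptions, not the authors' implementation: the bivariate restriction, the function names, the fixed bandwidth, and the clipping constant are all hypothetical, and the generalized expectation-maximization step is not shown.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # assumed clipping constant that keeps inv_cdf inside (0, 1)


def kernel_pdf(x, sample, bandwidth):
    """Gaussian kernel density estimate of a marginal density at x."""
    total = sum(STD_NORMAL.pdf((x - xi) / bandwidth) for xi in sample)
    return total / (len(sample) * bandwidth)


def kernel_cdf(x, sample, bandwidth):
    """Gaussian kernel estimate of a marginal distribution function at x."""
    return sum(STD_NORMAL.cdf((x - xi) / bandwidth) for xi in sample) / len(sample)


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density with correlation rho, |rho| < 1."""
    x = STD_NORMAL.inv_cdf(min(max(u, EPS), 1.0 - EPS))
    y = STD_NORMAL.inv_cdf(min(max(v, EPS), 1.0 - EPS))
    one_minus = 1.0 - rho * rho
    exponent = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus)
    return math.exp(exponent) / math.sqrt(one_minus)


def component_density(x1, x2, sample1, sample2, rho, bandwidth=0.5):
    """Joint density of one component: copula density times both marginal densities."""
    u = kernel_cdf(x1, sample1, bandwidth)
    v = kernel_cdf(x2, sample2, bandwidth)
    return (gaussian_copula_density(u, v, rho)
            * kernel_pdf(x1, sample1, bandwidth)
            * kernel_pdf(x2, sample2, bandwidth))
```

With `rho=0.0` the copula density equals 1 and the joint density reduces to the product of the two kernel marginals, which is a convenient sanity check.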
Copula models in machine learning
The introduction of copulas, which allow separating the dependence structure of a multivariate distribution from its marginal behaviour, was a major advance in dependence modelling. Copulas brought new theoretical insights to the concept of dependence and enabled the construction of a variety of new multivariate distributions. Despite their popularity in statistics and financial modelling, copulas have remained largely unknown in the machine learning community until recently. This thesis investigates the use of copula models, in particular Gaussian copulas, for solving various machine learning problems and makes contributions in the domains of dependence detection between datasets, compression based on side information, and variable selection.
Our first contribution is the introduction of a copula mixture model to perform dependency-seeking clustering for co-occurring samples from different data sources. The model takes advantage of the great flexibility offered by the copula framework to extend mixtures of Canonical Correlation Analyzers to multivariate data with arbitrary continuous marginal densities. We formulate our model as a non-parametric Bayesian mixture and provide an efficient Markov Chain Monte Carlo inference algorithm for it. Experiments on real and synthetic data demonstrate that the increased flexibility of the copula mixture significantly improves the quality of the clustering and the interpretability of the results.
The second contribution is a reformulation of the information bottleneck (IB) problem in terms of a copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula, we extend the analytical IB solution available for the multivariate Gaussian case to meta-Gaussian distributions which retain a Gaussian dependence structure but allow arbitrary marginal densities. The resulting approach extends the range of applicability of IB to non-Gaussian continuous data and is less sensitive to outliers than the original IB formulation.
Our third and final contribution is the development of a novel sparse compression technique based on the information bottleneck (IB) principle, which takes into account side information. We achieve this by introducing a sparse variant of IB that compresses the data by preserving the information in only a few selected input dimensions. By assuming a Gaussian copula we can capture arbitrary non-Gaussian marginals, continuous or discrete. We use our model to select a subset of biomarkers relevant to the evolution of malignant melanoma and show that our sparse selection provides reliable predictors.
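The equivalence between mutual information and negative copula entropy, mentioned in the second contribution above, has a closed form for a bivariate Gaussian copula with correlation rho: the mutual information is -0.5 * log(1 - rho^2), regardless of the marginals. The sketch below estimates rho from rank-based normal scores. That estimator, the function names, and the bivariate setting are assumptions made for illustration, not the thesis's own procedure.

```python
import math
from statistics import NormalDist


def normal_scores(xs):
    """Map each value to the standard normal quantile of its rank / (n + 1)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    scores = [0.0] * n
    std = NormalDist()
    for rank, i in enumerate(order, start=1):
        scores[i] = std.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mi_from_rho(rho):
    """Mutual information (in nats) implied by a bivariate Gaussian copula."""
    if rho * rho >= 1.0:
        return math.inf  # perfectly dependent variables
    return -0.5 * math.log(1.0 - rho * rho)


def gaussian_copula_mi(xs, ys):
    """Estimate mutual information assuming a Gaussian dependence structure."""
    return mi_from_rho(pearson(normal_scores(xs), normal_scores(ys)))
```

Since only ranks enter the estimate, applying any strictly increasing transformation to either variable leaves the result unchanged, which mirrors the meta-Gaussian setting of arbitrary marginals with a Gaussian dependence structure.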
Gaussian mixture model for extreme wind turbulence estimation
Uncertainty quantification is necessary in wind turbine design due to the random nature of the environmental inputs, through which the uncertainty of structural loads and response under specific situations can be quantified. Specifically, wind turbulence (described by the standard deviation of the longitudinal wind speed over a 10 min time duration) has a significant impact on the extreme and fatigue design envelope of the wind turbine. The wind parameters (mean and standard deviation of longitudinal wind speed over a 10 min time duration) are not independent stochastic variables, and structural reliability analysis or uncertainty quantification therefore requires these wind parameters to be correlated stochastic parameters. An accurate probabilistic model should be established to model the correlation among wind parameters. Compared to univariate distributions, theoretical multivariate distributions are limited and not flexible enough to model the wind parameters from different sites or direction sectors. Copula-based models are often used for correlation description, but existing parametric copulas may not model the correlation among wind parameters well, due to limitations of the copula structures. The Gaussian mixture model is widely applied for density estimation and clustering in many domains, but limited studies have been conducted in wind energy and few have used it for density estimation of wind parameters. In this paper, the Gaussian mixture model is used to model the joint distribution of mean and standard deviation of longitudinal wind speed over a 10 min time duration, both calculated from 15 years of wind measurement time series data. For comparison, the Nataf transformation (Gaussian copula) and the Gumbel copula are evaluated against the Gaussian mixture model in terms of the estimated marginal distributions and conditional distributions.
The Gaussian mixture model is then adopted to estimate the extreme wind turbulence (wind parameters for extreme load), which could be taken as an input to design loads used in the ultimate design limit state of turbine structures. The wind parameter contour associated with a 50-year return period computed from the Gaussian mixture model is compared with what is used in the design of wind turbines as given in IEC 61400-1. The Gaussian mixture model is able to model the joint distribution of wind parameters well, where the estimated tail distributions of both the marginal distributions and conditional distribution have good accuracy, and it is a good candidate for extreme turbulence estimation.
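The conditional distributions mentioned in the abstract above have a closed form under a Gaussian mixture: conditioning each component on the mean wind speed gives a univariate Gaussian for the turbulence, and the component weights are rescaled by how well each component explains that wind speed. The sketch below assumes a hypothetical two-component mixture with made-up parameters; nothing here is fitted to the 15 years of measurements, and the 50-year contour computation is not shown.

```python
import math
from statistics import NormalDist

# Hypothetical mixture over (mean wind speed u, turbulence s), both in m/s.
# Each tuple: (weight, mean_u, mean_s, var_u, var_s, cov_us). Illustrative values only.
COMPONENTS = [
    (0.6, 7.0, 0.9, 6.0, 0.10, 0.45),
    (0.4, 13.0, 1.8, 9.0, 0.30, 0.90),
]


def conditional_mixture(u, components=COMPONENTS):
    """Return [(weight, NormalDist)] describing p(s | u) under the mixture."""
    parts = []
    for w, mu_u, mu_s, var_u, var_s, cov in components:
        marginal = NormalDist(mu_u, math.sqrt(var_u)).pdf(u)
        cond_mean = mu_s + cov / var_u * (u - mu_u)
        cond_var = var_s - cov * cov / var_u
        parts.append((w * marginal, NormalDist(cond_mean, math.sqrt(cond_var))))
    total = sum(w for w, _ in parts)  # marginal density of u under the mixture
    return [(w / total, dist) for w, dist in parts]


def conditional_cdf(s, u, components=COMPONENTS):
    """P(turbulence <= s | mean wind speed = u)."""
    return sum(w * dist.cdf(s) for w, dist in conditional_mixture(u, components))


def conditional_quantile(p, u, components=COMPONENTS, lo=-10.0, hi=20.0, tol=1e-9):
    """Invert the conditional CDF by bisection on an assumed bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_cdf(mid, u, components) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `conditional_quantile(0.95, 10.0)` gives the turbulence level that the illustrative mixture assigns a 95% conditional non-exceedance probability at a 10 m/s mean wind speed. A fitted model would replace `COMPONENTS` with estimated parameters.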
Supervised Classification Using Finite Mixture Copula
The use of copulas for statistical classification is recent and gaining popularity. For example, statistical classification using copulas has been proposed for automatic character recognition, medical diagnostics and, most recently, data mining. Classical discrimination rules assume normality, but in the current data age this assumption is often questionable. In fact, the features of the data could be a mixture of discrete and continuous random variables. In this paper, mixture copula densities are used to model class conditional distributions. Such densities are useful when the marginal densities of the vector of features are not normally distributed and the variables are of mixed type. Authors have shown that such mixture models are very useful for uncovering hidden structures in the data, and used them for clustering in data mining. Under such mixture models, maximum likelihood estimation methods are not suitable and the regular expectation-maximization algorithm is inefficient and may not converge. A new estimation method is proposed to estimate such densities and build the classifier based on finite Gaussian mixture densities. Simulations are used to compare the performance of the copula-based classifier with classical normal-distribution-based models, a logistic-regression-based model and independent model cases. The method is also applied to a real data