Search CORE

55 research outputs found

Robust Predictive Inference for Multivariate Linear Models with Elliptically Contoured Distribution Using Bayesian, Classical and Structural Approaches

Author: Kibria B. M. Golam
Publication venue: DigitalCommons@WayneState
Publication date: 01/11/2008

Predictive distributions of future responses and future regression matrices under multivariate elliptically contoured distributions are discussed. Under the elliptically contoured response assumption, these are identical to those obtained under matric-normal or matric-t errors using the structural, Bayesian (with an improper prior), or classical approaches. This gives inference robustness with respect to departures from the reference case of independent sampling from the matric-normal or matric-t distribution toward multivariate elliptically contoured distributions. The importance of the predictive distribution for skewed elliptical models is indicated; both the elliptically contoured distribution and the matric-t distribution have significant applications in statistical practice.

Digital Commons@Wayne State University

High Dimensional Correlation Networks And Their Applications

Author: Firouzi Hamed
Publication date: 01/01/2015

Analysis of interactions between variables in a large data set has recently attracted special attention in the context of high dimensional multivariate statistical analysis. Variable interactions play a role in many inference tasks, such as classification, clustering, estimation, and prediction. This thesis focuses on the discovery of correlation and partial correlation structures as well as their applications in high dimensional data analysis and inference. The thesis considers problems of screening correlation and partial correlation networks by thresholding the sample correlation or the sample partial correlation matrix. The selection of the threshold is guided by our high dimensional asymptotic theory for screening such networks. Scalable methods of edge and hub screening are developed for applications in spatio-temporal analysis of time series, variable selection for linear prediction, and support recovery. The proposed methods are specifically designed for very high dimensional data with a limited number of samples. Moreover, the correlation screening theory developed in this thesis provides high dimensional family-wise error rates on false discoveries.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113492/1/firouzi_1.pd

Deep Blue Documents at the University of Michigan
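The basic screening step described in the Firouzi abstract above (threshold the sample correlation matrix, read off the surviving edges, then flag high-degree variables as hubs) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: the names `correlation_screen` and `hub_screen` are hypothetical, the threshold `rho` and hub degree `delta` are fixed by hand rather than chosen by the thesis's asymptotic theory, and only plain correlation (not partial correlation) is screened.

```python
import numpy as np

def correlation_screen(X, rho):
    """Return edges (i, j), i < j, whose |sample correlation| exceeds rho.

    X   : (n_samples, p_variables) data matrix, possibly with p >> n.
    rho : threshold in (0, 1); set by hand in this sketch (assumption).
    """
    R = np.corrcoef(X, rowvar=False)         # p x p sample correlation matrix
    i, j = np.triu_indices(R.shape[0], k=1)  # each unordered pair once, no diagonal
    keep = np.abs(R[i, j]) > rho
    return list(zip(i[keep].tolist(), j[keep].tolist()))

def hub_screen(edges, p, delta):
    """Return variables with at least `delta` neighbours in the thresholded graph."""
    degree = np.zeros(p, dtype=int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return np.flatnonzero(degree >= delta).tolist()

# Usage: n = 20 samples, p = 200 variables, one planted strong correlation.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 200))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(20)  # variable 1 nearly copies variable 0
edges = correlation_screen(X, rho=0.95)
hubs = hub_screen(edges, p=200, delta=1)
```

`np.corrcoef` forms the full p x p matrix, which is only practical for moderate p; this sketch makes no attempt at the scalability the abstract mentions.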

A New Generation of Mixture-Model Cluster Analysis with Information Complexity and the Genetic EM Algorithm

Author: Howe John Andrew
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2009

In this dissertation, we extend several relatively new developments in statistical model selection and data mining in order to improve one of the workhorse statistical tools - mixture modeling (Pearson, 1894). The traditional mixture model assumes that the data come from several Gaussian populations, so what remains is to determine the number of distributions, their population parameters, and the mixing proportions. However, real data often do not fit the restrictions of normality very well. Data from a single population exhibiting either asymmetrical or nonnormal tail behavior could be erroneously modeled as two populations, resulting in suboptimal decisions. To avoid these pitfalls, we develop the mixture model under a broader distributional assumption by fitting a group of multivariate elliptically-contoured distributions (Anderson and Fang, 1990; Fang et al., 1990). Special cases include the multivariate Gaussian and power exponential distributions, as well as the multivariate generalization of the Student's t. This gives us the flexibility to model nonnormal tail and peak behavior, though the symmetry restriction still exists. The literature has many examples of research generalizing the Gaussian mixture model to other distributions (Farrell and Mersereau, 2004; Hasselblad, 1966; John, 1970a), but our effort is more general. Further, we generalize the mixture model to be non-parametric by developing two types of kernel mixture models. First, we generalize the mixture model to use truly multivariate kernel density estimators (Wand and Jones, 1995). Additionally, we develop the power exponential product kernel mixture model, which allows the density to adjust to the shape of each dimension independently. Because kernel density estimators enforce no functional form, both of these methods can adapt to nonnormal asymmetric, kurtotic, and tail characteristics.
Over the past two decades or so, evolutionary algorithms have grown in popularity, as they have provided encouraging results in a variety of optimization problems. Several authors have applied the genetic algorithm - a subset of evolutionary algorithms - to mixture modeling, including Bhuyan et al. (1991), Krishna and Murty (1999), and Wicker (2006). These procedures have the benefit of bypassing computational issues that plague the traditional methods. We extend these initialization and optimization methods by combining them with our updated mixture models. Additionally, we "borrow" results from robust estimation theory (Ledoit and Wolf, 2003; Shurygin, 1983; Thomaz, 2004) in order to data-adaptively regularize population covariance matrices. Numerical instability of the covariance matrix can be a significant problem for mixture modeling, since estimation is typically done on a relatively small subset of the observations. We likewise extend various information criteria (Akaike, 1973; Bozdogan, 1994b; Schwarz, 1978) to the elliptically-contoured and kernel mixture models. Information criteria guide model selection and estimation based on various approximations to the Kullback-Leibler divergence. Following Bozdogan (1994a), we use these tools to sequentially select the best mixture model, select the best subset of variables, and detect influential observations - all without making any subjective decisions. Over the course of this research, we developed a full-featured Matlab toolbox (M3) which implements all the new developments in mixture modeling presented in this dissertation. We show results on both simulated and real-world datasets.
Keywords: mixture modeling, nonparametric estimation, subset selection, influence detection, evidence-based medical diagnostics, unsupervised classification, robust estimation

University of Tennessee, Knoxville: Trace
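The model-selection idea in the Howe abstract above (fit mixtures with different numbers of components, then compare information criteria) can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional Gaussian mixture fitted by plain EM and uses only AIC (Akaike) and BIC (Schwarz); it does not implement the elliptically contoured or kernel mixtures, the genetic EM algorithm, covariance regularization, or Bozdogan's ICOMP criterion from the dissertation. All function names are hypothetical.

```python
import numpy as np

def _log_joint(x, w, mu, var):
    # log(w_j) + log N(x_i | mu_j, var_j), shape (n, k)
    return (np.log(w) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)

def _logsumexp_rows(a):
    # Numerically stable log(sum(exp(a), axis=1))
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))

def fit_gmm_1d(x, k, n_iter=300):
    """Plain EM for a k-component 1-D Gaussian mixture; returns the final log-likelihood."""
    n = x.size
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # deterministic quantile start
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    floor = 1e-3 * x.var()  # variance floor guards against degenerate components
    for _ in range(n_iter):
        lj = _log_joint(x, w, mu, var)
        resp = np.exp(lj - _logsumexp_rows(lj)[:, None])  # E-step: responsibilities
        nk = resp.sum(axis=0) + 1e-12                     # M-step below
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, floor)
    return _logsumexp_rows(_log_joint(x, w, mu, var)).sum()

def information_criteria(x, k_max=4):
    """AIC and BIC for k = 1..k_max; smaller is better for both."""
    n = x.size
    out = {}
    for k in range(1, k_max + 1):
        ll = fit_gmm_1d(x, k)
        m = 3 * k - 1  # k means + k variances + (k - 1) free mixing weights
        out[k] = {"aic": -2 * ll + 2 * m, "bic": -2 * ll + m * np.log(n)}
    return out

# Usage: two well-separated Gaussian populations.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5, 1, 300), rng.normal(5, 1, 300)])
ic = information_criteria(x)
best_bic = min(ic, key=lambda k: ic[k]["bic"])
```

On data like this, BIC would typically favour two components. Its penalty of log(n) per parameter exceeds AIC's constant 2 whenever n > e^2 (about 7.4), so BIC tends to choose smaller models than AIC.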

Exact Dimensionality Selection for Bayesian PCA

Author: Bouveyron Charles
Latouche Pierre
Mattei Pierre-Alexandre
Publication date: 21/05/2019

We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood which allows us to infer an optimal number of components. We also propose a heuristic based on the expected shape of the marginal likelihood curve in order to choose the hyperparameters. In non-asymptotic frameworks, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Probabilistic sequential matrix factorization

Author: Akyildiz Ömer Deniz
Damoulas Theodoros
Steel Mark F. J.
van den Burg Gerrit J. J.
Publication venue: PMLR
Publication date: 01/01/2021

We introduce the probabilistic sequential matrix factorization (PSMF) method for factorizing time-varying and non-stationary datasets consisting of high-dimensional time-series. In particular, we consider nonlinear Gaussian state-space models where sequential approximate inference results in the factorization of a data matrix into a dictionary and time-varying coefficients with potentially nonlinear Markovian dependencies. The assumed Markovian structure on the coefficients enables us to encode temporal dependencies into a low-dimensional feature space. The proposed inference method is based solely on an approximate extended Kalman filtering scheme, which makes the resulting method particularly efficient. PSMF can account for temporal nonlinearities and, more importantly, can be used to calibrate and estimate generic differentiable nonlinear subspace models. We also introduce a robust version of PSMF, called rPSMF, which uses Student-t filters to handle model misspecification. We show that PSMF can be used in multiple contexts: modeling time series with a periodic subspace, robustifying changepoint detection methods, and imputing missing data in several high-dimensional time-series, such as measurements of pollutants across London.

Warwick Research Archives Portal Repository

Vol. 7, No. 2 (Full Issue)

Author: Editors JMASM
Publication venue: DigitalCommons@WayneState
Publication date: 01/11/2008

Digital Commons@Wayne State University