A Multivariate Skew-Normal-Tukey-h Distribution
We introduce a new family of multivariate distributions by taking the
component-wise Tukey-h transformation of a random vector following a
skew-normal distribution. The proposed distribution is named the
skew-normal-Tukey-h distribution and is an extension of the skew-normal
distribution for handling heavy-tailed data. We compare this proposed
distribution to the skew-t distribution, which is another extension of the
skew-normal distribution for modeling tail-thickness, and demonstrate that when
there are substantial differences in marginal kurtosis, the proposed
distribution is more appropriate. Moreover, we derive many appealing stochastic
properties of the proposed distribution and provide a methodology for the
estimation of the parameters in which the computational requirement increases
linearly with the dimension. Using simulations, as well as a wine and a wind
speed data application, we illustrate how to draw inferences based on the
multivariate skew-normal-Tukey-h distribution.
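The construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Tukey-h transform z * exp(h * z^2 / 2) with one h per component, and it draws the skew-normal vector through the additive representation Z_j = delta_j * |U0| + sqrt(1 - delta_j^2) * U_j. The function names, the parameters `delta`, `psi`, and `h`, and the omission of location and scale parameters are all assumptions made for the example.

```python
import numpy as np

def tukey_h(z, h):
    """Component-wise Tukey-h transform z * exp(h * z**2 / 2), with h >= 0."""
    z = np.asarray(z, dtype=float)
    h = np.asarray(h, dtype=float)
    return z * np.exp(0.5 * h * z**2)

def sample_skew_normal(n, delta, psi, rng):
    """Draw n skew-normal vectors (hypothetical helper).

    delta: skewness parameters in (-1, 1), one per component.
    psi:   correlation matrix of the latent Gaussian part.
    """
    delta = np.asarray(delta, dtype=float)
    u0 = np.abs(rng.standard_normal(n))  # shared half-normal factor
    u = rng.multivariate_normal(np.zeros(len(delta)), psi, size=n)
    return delta * u0[:, None] + np.sqrt(1.0 - delta**2) * u

def sample_sn_tukey_h(n, delta, psi, h, seed=0):
    """Skew-normal draw followed by the component-wise Tukey-h transform."""
    rng = np.random.default_rng(seed)
    return tukey_h(sample_skew_normal(n, delta, psi, rng), h)
```

For example, `sample_sn_tukey_h(1000, [0.8, -0.5], np.eye(2), [0.0, 0.2])` leaves the first margin skew-normal (h = 0) and thickens the tails of the second, which is the kind of difference in marginal kurtosis the abstract refers to.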
Change Point Detection and Estimation in Sequences of Dependent Random Variables
Two change point detection and estimation procedures for sequences of dependent binary random variables are proposed and their asymptotic properties are explored. The two procedures are a dependent cumulative sum statistic (DCUSUM) and a dependent likelihood ratio test (LRT) statistic, which are generalizations of the independent CUSUM and LRT statistics.
A one-step Markov dependence is assumed between consecutive variables in the sequence, and the DCUSUM and dependent LRT are shown to have substantially better size and power than their independent counterparts. In most cases, a simulation-based comparison of the two dependent procedures shows that the dependent LRT provides a more powerful test, while the DCUSUM test has better size performance.
The asymptotic distribution of the DCUSUM test is found to be a weighted sum of
squared Brownian bridge processes and an approximation to calculate p-values is discussed. A Worsley type upper bound for p-values is provided as an alternative. The asymptotic distribution of the dependent LRT is unknown, but the tail probabilities are found to be empirically bounded by chi-square random variables with 6 and 7 degrees of freedom through a simulation study. A bootstrap algorithm to estimate p-values for the dependent LRT is discussed.
Extensions of these procedures to multiple sequences and multinomial random variables are discussed, and a new statistic, the maximal change count statistic, is proposed. An application of the multiple sequence procedures to clustered time series models is provided. The asymptotic properties of the generalized procedures are reserved for future research.
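For orientation, the independent CUSUM statistic that the DCUSUM generalizes can be sketched as below. This is the classical version for a single change in the success probability of a 0/1 sequence under an independence assumption; it is not the dependent statistic proposed in the abstract, and the function name and return convention are illustrative choices.

```python
import math

def cusum_binary(x):
    """Classical (independence-based) CUSUM for one change in a 0/1 sequence.

    Returns (statistic, k_hat), where k_hat is the number of observations
    before the estimated change, or None if no change can be located.
    """
    n = len(x)
    if n < 2:
        return 0.0, None
    total = sum(x)
    p_hat = total / n
    sigma = math.sqrt(p_hat * (1.0 - p_hat))  # Bernoulli standard deviation
    if sigma == 0.0:                          # constant sequence
        return 0.0, None
    best, k_hat, partial = 0.0, None, 0
    for k in range(1, n):
        partial += x[k - 1]
        dev = abs(partial - k * total / n)    # |S_k - (k/n) S_n|
        if dev > best:
            best, k_hat = dev, k
    return best / (sigma * math.sqrt(n)), k_hat
```

On a sequence of 50 zeros followed by 50 ones, the maximum deviation occurs at k = 50, so the estimated change point coincides with the true one.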
Truncations of Haar distributed matrices, traces and bivariate Brownian bridges
Let U be a Haar distributed matrix in U(n) or O(n). We show that, after
centering, the double index process converges in distribution to
the bivariate tied-down Brownian bridge. The proof relies on the notion of
second order freeness.
Comment: Random Matrices: Theory and Applications (RMTA), to appear (2012)
http://www.editorialmanager.com/rmta
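The formula for the double index process did not survive in the abstract above. Purely for illustration, the sketch below assumes it is the sum of |U_ij|^2 over the top-left block with floor(ns) rows and floor(nt) columns, which is an assumption and not a statement of the paper's definition. The Haar unitary matrix is drawn with the usual QR-with-phase-correction recipe; all function names are hypothetical.

```python
import numpy as np

def haar_unitary(n, rng):
    """Draw an n x n Haar-distributed unitary matrix via QR of a Ginibre matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # rescale column j by the phase of R[j, j]

def block_mass(u, s, t):
    """Assumed process: sum of |U_ij|^2 over the floor(n*s) x floor(n*t) block."""
    n = u.shape[0]
    p, q = int(np.floor(n * s)), int(np.floor(n * t))
    return float(np.sum(np.abs(u[:p, :q]) ** 2))

def centered_block_mass(u, s, t):
    """Subtract the mean p*q/n, using E|U_ij|^2 = 1/n for a Haar unitary."""
    n = u.shape[0]
    p, q = int(np.floor(n * s)), int(np.floor(n * t))
    return block_mass(u, s, t) - p * q / n
```

Because every row and column of a unitary matrix has unit norm, the centered quantity vanishes whenever s = 1 or t = 1, which matches the tied-down boundary behaviour of the limit named in the abstract.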
Robust and sparse estimation of large precision matrices
The thesis considers the estimation of sparse precision matrices in the high-dimensional setting. First, we introduce an integrated approach to estimate undirected graphs and to perform model selection in high-dimensional Gaussian Graphical Models (GGMs). The approach is based on a parametrization of the inverse covariance matrix in terms of the prediction errors of the best linear predictor of each node in the graph. We exploit the relationship between partial correlation
coefficients and the distribution of the prediction errors to propose a novel forward-backward algorithm for detecting pairs of variables having nonzero partial correlations among a large number of random variables based on i.i.d. samples.
Then, we are able to establish asymptotic properties under mild conditions. Finally, numerical studies through simulation and real data examples provide evidence of the practical advantage of the procedure, where the proposed approach outperforms state-of-the-art methods such as the Graphical lasso and CLIME under different
settings.
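The link between the precision matrix and partial correlations that this paragraph relies on can be illustrated with a short sketch. It uses only the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) and the fact that 1 / omega_ii is the variance of the error from the best linear prediction of node i given the others; it is not the forward-backward algorithm proposed in the thesis, and the function names are illustrative.

```python
import numpy as np

def partial_correlations(omega):
    """Partial correlations implied by a precision matrix omega."""
    omega = np.asarray(omega, dtype=float)
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)  # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    np.fill_diagonal(rho, 1.0)
    return rho

def prediction_error_variances(omega):
    """Variance of each node's best-linear-predictor error: 1 / omega_ii."""
    return 1.0 / np.diag(np.asarray(omega, dtype=float))
```

A zero entry in the precision matrix gives a zero partial correlation, so the sparsity pattern of the matrix is exactly the edge set of the undirected graph.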
Furthermore, we study the problem of robust estimation of GGMs in the high-dimensional setting when the data may contain outlying observations. We propose a robust precision matrix estimator under the cellwise contamination mechanism that is robust against structural bivariate outliers. This framework exploits robust pairwise
weighted correlation coefficient estimates, where the weights are computed by the Mahalanobis
distance with respect to an affine equivariant robust correlation coefficient estimator. We show that the convergence rate of the proposed estimator is the same as that of the correlation coefficient estimator used to compute the Mahalanobis distance. We conduct
numerical simulations under different contamination settings to compare the graph
recovery performance of different robust estimators. The proposed method is then
applied to the classification of tumors using gene expression data. We show that our
procedure can effectively recover the true graph under cellwise data contamination.
Doctoral program: Programa Oficial de Doctorado en Economía de la Empresa y Métodos Cuantitativos
Committee: President: José Manuel Mira Mcwilliams; Secretary: Andrés Modesto Alonso Fernández; Member: José Ramón Berrendero Día