
    Operator norm consistent estimation of large-dimensional sparse covariance matrices

    Estimating covariance matrices is a problem of fundamental importance in multivariate statistics. In practice it is increasingly common to work with data matrices $X$ of dimension $n\times p$, where $p$ and $n$ are both large. Results from random matrix theory show very clearly that in this setting, standard estimators such as the sample covariance matrix generally perform very poorly. In this "large $n$, large $p$" setting, practitioners are sometimes willing to assume that many elements of the population covariance matrix are equal to 0, so that this matrix is sparse. We develop an estimator to handle this situation. The estimator is shown to be consistent in operator norm when, for instance, $p\asymp n$ as $n\to\infty$. In other words, the largest singular value of the difference between the estimator and the population covariance matrix goes to zero. This implies consistency of all the eigenvalues and consistency of eigenspaces associated with isolated eigenvalues. We also propose a notion of sparsity for matrices that is "compatible" with spectral analysis and is independent of the ordering of the variables. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/07-AOS559 by the Institute of Mathematical Statistics (http://www.imstat.org).
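    A minimal Python sketch of the setting, assuming a hypothetical tridiagonal (hence sparse) population covariance with $p/n = 0.5$. It uses entrywise hard thresholding of the sample covariance, a common way to exploit sparsity; it is not claimed to be the paper's exact estimator, and the threshold level `t` is an illustrative choice. The operator-norm error is the largest singular value of the difference, and by Weyl's inequality it bounds the error of every sorted eigenvalue, which is why operator-norm consistency gives eigenvalue consistency.

```python
import numpy as np


def hard_threshold_cov(X: np.ndarray, t: float) -> np.ndarray:
    """Sample covariance with off-diagonal entries of magnitude <= t set to 0."""
    S = np.cov(X, rowvar=False)           # p x p sample covariance; rows of X are observations
    T = np.where(np.abs(S) > t, S, 0.0)   # zero out small entries
    np.fill_diagonal(T, np.diag(S))       # always keep the sample variances
    return T


def operator_norm(A: np.ndarray) -> float:
    """Largest singular value of A (spectral/operator norm)."""
    return float(np.linalg.norm(A, ord=2))


rng = np.random.default_rng(0)
n, p = 400, 200                            # p and n of the same order
# Hypothetical sparse population covariance: 1 on the diagonal, 0.4 on the first off-diagonals
Sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

S = np.cov(X, rowvar=False)
t = 2.0 * np.sqrt(np.log(p) / n)           # illustrative threshold of order sqrt(log p / n)
T = hard_threshold_cov(X, t)

err_sample = operator_norm(S - Sigma)
err_thresh = operator_norm(T - Sigma)
print(f"sample covariance error:      {err_sample:.3f}")
print(f"thresholded covariance error: {err_thresh:.3f}")

# Weyl's inequality: each sorted eigenvalue moves by at most the operator-norm error
eig_gap = np.max(np.abs(np.linalg.eigvalsh(T) - np.linalg.eigvalsh(Sigma)))
print(f"max eigenvalue error:         {eig_gap:.3f}")
```

    In this toy run the raw sample covariance carries a large operator-norm error even though every individual entry is estimated well, which is the random-matrix effect the abstract refers to.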

    Estimating Dynamic Traffic Matrices by using Viable Routing Changes

    In this paper we propose a new approach for dealing with the ill-posed nature of traffic matrix estimation. We present three solution enhancers: an algorithm for deliberately changing link weights to obtain additional information that can make the underlying linear system full rank; a cyclo-stationary model to capture both long-term and short-term traffic variability; and a method for estimating the variance of origin-destination (OD) flows. We show how these three elements can be combined into a comprehensive traffic matrix estimation procedure that dramatically reduces errors compared to existing methods. We demonstrate that our variance estimates can be used to identify the elephant OD flows, and we therefore propose a variant of our algorithm that addresses the problem of estimating only the heavy flows in a traffic matrix. One of our key findings is that by focusing only on heavy flows, we can simplify the measurement and estimation procedure and make it more practical. Although there is a tradeoff between practicality and accuracy, we find that increasing the rank is so helpful that we can nevertheless keep the average errors consistently below the 10% carrier target error rate. We validate the effectiveness of our methodology and the intuition behind it using commercial traffic matrix data from Sprint's Tier-1 backbone.
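    The rank argument can be illustrated with a toy network: link loads satisfy $y = A x$, where $A$ is the routing matrix and $x$ the vector of OD flows, and with fewer independent links than flows the system is underdetermined. In this sketch the routing matrices `A1` and `A2` are hypothetical (`A2` stands for the routing after a link-weight change), and the OD demands are assumed identical in both measurement periods with no noise. It does not reproduce the paper's algorithm for choosing which weights to change.

```python
import numpy as np

# Toy network with 4 OD flows; each routing matrix maps OD flows to link loads (y = A x).
A1 = np.array([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 1]], dtype=float)   # hypothetical original routing
A2 = np.array([[1, 0, 0, 1],
               [0, 1, 1, 0],
               [1, 0, 1, 0]], dtype=float)   # hypothetical routing after a weight change

x_true = np.array([10.0, 20.0, 5.0, 15.0])  # hypothetical OD demands, assumed unchanged

y1 = A1 @ x_true                             # link loads under the original weights
y2 = A2 @ x_true                             # link loads after the weight change

print("rank with one routing:", np.linalg.matrix_rank(A1))   # fewer than 4 unknowns
A = np.vstack([A1, A2])
y = np.concatenate([y1, y2])
print("rank with both routings:", np.linalg.matrix_rank(A))

# With full column rank, least squares recovers the OD flows exactly (noise-free case)
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("recovered OD flows:", np.round(x_hat, 6))
```

    With real measurements the loads are noisy and traffic varies over time, which is what the cyclo-stationary model and variance estimates in the abstract are meant to handle; this sketch ignores both.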

    Quantum Chi-Squared and Goodness of Fit Testing

    The density matrix in quantum mechanics parameterizes the statistical properties of the system under observation, just as a classical probability distribution does for classical systems. The expectation values of observables cannot be measured directly; they can only be approximated by applying classical statistical methods to the frequencies with which certain measurement outcomes (clicks) are obtained. In this paper, we make a detailed study of the statistical fluctuations obtained during an experiment in which a hypothesis is tested, i.e. the hypothesis that a certain setup produces a given quantum state. Although the classical and quantum problems are closely related, the quantum problem is much richer due to the additional optimization over the measurement basis. Just as in classical hypothesis testing, the confidence in quantum hypothesis testing scales exponentially in the number of copies. In this paper, we 1) argue that the physically relevant data of quantum experiments are contained only in the frequencies of the measurement outcomes, and that the statistical fluctuations of the experiment are essential, so that the conclusions of a quantum experiment should be formulated in terms of hypothesis tests; 2) argue that the (classical) $\chi^2$ test for distinguishing two quantum states gives rise to the quantum $\chi^2$ divergence when optimized over the measurement basis; and 3) present a max-min characterization of the optimal measurement basis for quantum goodness-of-fit testing, find the quantum measurement that leads to both the maximal Pitman and Bahadur efficiency, and determine the associated divergence rates. Comment: 22 pages, with a new section on parameter estimation.
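    To see why the measurement basis matters, the sketch below computes outcome probabilities with the Born rule, $p_k = \mathrm{Tr}(\rho E_k)$, and evaluates the classical $\chi^2$ divergence $\sum_k (p_k - q_k)^2 / q_k$ between two qubit states over a one-parameter family of projective bases. The two states are hypothetical, and the sketch only illustrates "optimizing the classical $\chi^2$ over the basis"; it does not compute the paper's quantum $\chi^2$ divergence or its max-min characterization.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def bloch_state(rx: float, rz: float) -> np.ndarray:
    """Qubit density matrix (I + rx X + rz Z) / 2 with Bloch vector in the x-z plane."""
    return 0.5 * (I2 + rx * X + rz * Z)


def projective_basis(theta: float) -> list:
    """Two orthogonal projectors whose Bloch direction is (sin theta, 0, cos theta)."""
    v0 = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
    v1 = np.array([-np.sin(theta / 2), np.cos(theta / 2)], dtype=complex)
    return [np.outer(v, v.conj()) for v in (v0, v1)]


def outcome_probs(rho: np.ndarray, basis: list) -> np.ndarray:
    """Born rule: probability of outcome k is Tr(rho E_k)."""
    return np.array([np.trace(rho @ E).real for E in basis])


def chi2_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Classical chi-squared divergence sum_k (p_k - q_k)^2 / q_k (needs q_k > 0)."""
    return float(np.sum((p - q) ** 2 / q))


rho = bloch_state(0.6, 0.0)     # hypothetical "true" state
sigma = bloch_state(0.0, 0.6)   # hypothetical hypothesized state (mixed, so q_k > 0)

# Directions theta in [0, pi] cover every basis in the x-z plane (theta + pi swaps labels)
thetas = np.linspace(0.0, np.pi, 721)
scores = [chi2_divergence(outcome_probs(rho, projective_basis(t)),
                          outcome_probs(sigma, projective_basis(t))) for t in thetas]
best = int(np.argmax(scores))
print(f"best theta = {thetas[best]:.3f} rad, chi2 = {scores[best]:.3f}")
```

    At $\theta = \pi/4$ both states give identical outcome statistics, so that basis cannot distinguish them at all, while other bases separate them well.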

    Traffic matrix estimation on a large IP backbone: a comparison on real data

    This paper considers the problem of estimating the point-to-point traffic matrix in an operational IP backbone. Contrary to previous studies, which have used a partial traffic matrix or demands estimated from aggregated Netflow traces, we use a unique data set of complete traffic matrices from a global IP network measured over five-minute intervals. This allows us to perform an accurate data analysis on the time-scale of typical link-load measurements and to make a balanced evaluation of different traffic matrix estimation techniques. We describe the data collection infrastructure, present spatial and temporal demand distributions, investigate the stability of fan-out factors, and analyze the mean-variance relationships between demands. We perform a critical evaluation of existing and novel methods for traffic matrix estimation, including recursive fan-out estimation, worst-case bounds, regularized estimation techniques, and methods that rely on mean-variance relationships. We discuss the strengths and weaknesses of the various methods, and highlight differences in the results for the European and American subnetworks.
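    Fan-out factors express each demand as a share of its ingress node's total traffic, $f_{ij} = x_{ij} / \sum_k x_{ik}$. The sketch below uses two hypothetical 3-node traffic matrices to measure how much the fan-outs change between intervals and, assuming the ingress totals at the later interval are known, predicts the later demands from the earlier fan-outs. It illustrates why fan-out stability matters; it is not the paper's recursive fan-out estimator.

```python
import numpy as np


def fanout(tm: np.ndarray) -> np.ndarray:
    """Fan-out factors: share of each ingress node's traffic sent to each egress node."""
    return tm / tm.sum(axis=1, keepdims=True)


# Hypothetical 3-node traffic matrices (rows: ingress, columns: egress) for two intervals
tm_t0 = np.array([[0.0, 30.0, 10.0],
                  [20.0, 0.0, 20.0],
                  [5.0, 15.0, 0.0]])
tm_t1 = np.array([[0.0, 60.0, 22.0],
                  [18.0, 0.0, 22.0],
                  [6.0, 14.0, 0.0]])

f0, f1 = fanout(tm_t0), fanout(tm_t1)
print("largest fan-out change:", np.abs(f1 - f0).max())

# If fan-outs are stable, demands at t1 follow from the (assumed known) ingress totals at t1
ingress_t1 = tm_t1.sum(axis=1, keepdims=True)
tm_pred = f0 * ingress_t1
print("relative error:", np.linalg.norm(tm_pred - tm_t1) / np.linalg.norm(tm_t1))
```

    The prediction preserves the ingress totals by construction, so all of its error comes from changes in the fan-out factors themselves.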

    Restricted Covariance Priors with Applications in Spatial Statistics

    We present a Bayesian model for area-level count data that uses Gaussian random effects with a novel type of G-Wishart prior on the inverse variance-covariance matrix. Specifically, we introduce a new distribution called the truncated G-Wishart distribution, which has support over precision matrices that lead to positive associations between the random effects of neighboring regions while preserving conditional independence of non-neighboring regions. We describe Markov chain Monte Carlo sampling algorithms for the truncated G-Wishart prior in a disease mapping context and compare our results to Bayesian hierarchical models based on intrinsic autoregressive priors. A simulation study illustrates that using the truncated G-Wishart prior improves over the intrinsic autoregressive priors when there are discontinuities in the disease risk surface. The new model is applied to an analysis of cancer incidence data in Washington State. Comment: Published in Bayesian Analysis (http://projecteuclid.org/euclid.ba) at http://dx.doi.org/10.1214/14-BA927 by the International Society for Bayesian Analysis (http://bayesian.org/).
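    The support described above can be made concrete with a precision matrix $K$: a zero entry $K_{ij}$ encodes conditional independence of regions $i$ and $j$, and the partial correlation is $-K_{ij}/\sqrt{K_{ii}K_{jj}}$, so a positive association between neighbors corresponds to $K_{ij} < 0$. The indicator below is a hypothetical helper that reads "positive associations" as positive partial correlations; it only illustrates the constraint set, not the paper's density or MCMC sampler.

```python
import numpy as np


def partial_correlations(K: np.ndarray) -> np.ndarray:
    """Partial correlations from a precision matrix: -K_ij / sqrt(K_ii K_jj)."""
    d = np.sqrt(np.diag(K))
    R = -K / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R


def in_truncated_support(K: np.ndarray, adjacency: np.ndarray, tol: float = 1e-12) -> bool:
    """Hypothetical check: K is SPD, zero for non-neighbors, negative for neighbors."""
    if not np.allclose(K, K.T) or np.linalg.eigvalsh(K).min() <= 0:
        return False                               # must be a valid precision matrix
    off = ~np.eye(K.shape[0], dtype=bool)          # off-diagonal positions
    adj = adjacency.astype(bool)
    zeros_ok = np.all(np.abs(K[~adj & off]) <= tol)  # non-neighbors: conditional independence
    signs_ok = np.all(K[adj & off] < 0)              # neighbors: positive partial correlation
    return bool(zeros_ok and signs_ok)


# Hypothetical map of three regions in a row: 0 - 1 - 2
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
K_ok = np.array([[2.0, -0.5, 0.0],
                 [-0.5, 2.0, -0.5],
                 [0.0, -0.5, 2.0]])
K_bad = K_ok.copy()
K_bad[0, 1] = K_bad[1, 0] = 0.5                    # neighbors 0 and 1 now negatively associated

print(in_truncated_support(K_ok, adjacency))
print(in_truncated_support(K_bad, adjacency))
print(np.round(partial_correlations(K_ok), 3))
```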

    Statistical eigen-inference from large Wishart matrices

    We consider settings where the observations are drawn from a zero-mean multivariate (real or complex) normal distribution with the population covariance matrix having eigenvalues of arbitrary multiplicity. We assume that the eigenvectors of the population covariance matrix are unknown and focus on inferential procedures that are based on the sample eigenvalues alone (i.e., "eigen-inference"). Results in the literature establish the asymptotic normality of the fluctuations in the trace of powers of the sample covariance matrix. We develop concrete algorithms for analytically computing the limiting quantities and the covariance of the fluctuations. We exploit the asymptotic normality of the trace of powers of the sample covariance matrix to develop eigenvalue-based procedures for testing and estimation. Specifically, we formulate a simple test of hypotheses for the population eigenvalues and a technique for estimating the population eigenvalues in settings where the cumulative distribution function of the (nonrandom) population eigenvalues has a staircase structure. Monte Carlo simulations are used to demonstrate the superiority of the proposed methodologies over classical techniques and the robustness of the proposed techniques in high-dimensional, (relatively) small sample size settings. The improved performance results from the fact that the proposed inference procedures are "global" (in a sense that we describe) and exploit "global" information, thereby overcoming the inherent biases that cripple classical inference procedures, which are "local" and rely on "local" information. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/07-AOS583 by the Institute of Mathematical Statistics (http://www.imstat.org).
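    The statistics used in eigen-inference depend on the sample eigenvalues only, because $\mathrm{tr}(S^k) = \sum_i \lambda_i^k$. The sketch below computes the normalized moments $\frac{1}{p}\mathrm{tr}(S^k)$ for real Gaussian data, assuming an identity population covariance and ratio $c = p/n$; in that case the first two moments converge to $1$ and $1 + c$ (the Marchenko-Pastur limits). It does not implement the paper's algorithms for the limits and fluctuation covariances of general population spectra.

```python
import numpy as np


def trace_moments(S: np.ndarray, kmax: int = 3) -> list:
    """Normalized moments (1/p) tr(S^k), k = 1..kmax, from the eigenvalues of symmetric S."""
    lam = np.linalg.eigvalsh(S)
    return [float(np.mean(lam ** k)) for k in range(1, kmax + 1)]


rng = np.random.default_rng(1)
n, p = 2000, 500
c = p / n
X = rng.standard_normal((n, p))   # zero-mean model with identity population covariance
S = X.T @ X / n                   # sample covariance without centering (mean is known to be 0)

m1, m2 = trace_moments(S, kmax=2)
print(f"m1 = {m1:.4f} (limit 1)")
print(f"m2 = {m2:.4f} (limit 1 + c = {1 + c:.2f})")
```

    The gap between $m_2$ and the population value $1$ is not noise: it is the systematic spreading of sample eigenvalues that inference based on sample eigenvalues alone has to correct for.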

    A heuristics approach for computing the largest eigenvalue of a pairwise comparison matrix

    Pairwise comparison matrices (PCMs) are widely used to capture subjective human judgements, especially in the context of the Analytic Hierarchy Process (AHP). In the AHP, the consistency of judgements is normally measured by the consistency ratio (CR), which requires estimation of the largest eigenvalue (Lmax) of the PCM. Since many alternative methods for deriving priorities do not require calculation of the eigenvector, Lmax, and hence the CR of a PCM, cannot easily be estimated when those methods are used. We propose in this paper a simple heuristic for calculating Lmax without any need to use the Eigenvector Method (EM). We illustrate the proposed procedure with larger matrices. Simulation is used to compare the accuracy of the proposed heuristic with the actual Lmax for PCMs of various sizes. The proposed heuristic is found to be highly accurate, with errors of less than 1%. The proposed procedure would avoid biases and help managers make better decisions. The advantage of the proposed heuristic is that it can be computed with simple calculations, without any need for specialised mathematical procedures or software, and that it is independent of the method used to derive priorities from PCMs.
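    For reference, the standard eigenvector-based computation that such a heuristic is meant to replace: Lmax is the Perron eigenvalue of the PCM, here found by power iteration, then $CI = (L_{max} - n)/(n - 1)$ and $CR = CI/RI$. The `RANDOM_INDEX` values are Saaty's commonly tabulated random indices (published tables differ slightly in later decimals), and the example matrices are hypothetical. This is not the paper's heuristic, which is not reproduced here.

```python
import numpy as np

# Saaty's random consistency index by matrix order n (commonly tabulated values)
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def lambda_max(A: np.ndarray, iters: int = 1000, tol: float = 1e-12):
    """Perron eigenvalue and priority vector of a positive matrix by power iteration."""
    w = np.full(A.shape[0], 1.0 / A.shape[0])  # start from uniform weights summing to 1
    lam = 0.0
    for _ in range(iters):
        v = A @ w
        lam_new = v.sum()                      # A w = lam w and sum(w) = 1 imply sum(A w) = lam
        w = v / lam_new                        # renormalize so the weights sum to 1
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, w


def consistency_ratio(A: np.ndarray) -> float:
    n = A.shape[0]
    lam, _ = lambda_max(A)
    ci = (lam - n) / (n - 1)                   # consistency index
    return ci / RANDOM_INDEX[n]


# A perfectly consistent PCM built from weights: a_ij = w_i / w_j, so Lmax = n
weights = np.array([0.5, 0.3, 0.2])
A_consistent = weights[:, None] / weights[None, :]

# A hypothetical, slightly inconsistent PCM (3 * 2 != 5)
A_judged = np.array([[1.0, 3.0, 5.0],
                     [1 / 3, 1.0, 2.0],
                     [1 / 5, 1 / 2, 1.0]])

for name, A in [("consistent", A_consistent), ("judged", A_judged)]:
    lam, w = lambda_max(A)
    print(f"{name}: Lmax = {lam:.4f}, CR = {consistency_ratio(A):.4f}, w = {np.round(w, 3)}")
```

    For a reciprocal PCM, Lmax is at least $n$, with equality exactly when the judgements are consistent, which is why $L_{max} - n$ serves as the measure of inconsistency.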