
    Foundational principles for large scale inference: Illustrations through correlation mining

    When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far smaller than the number $p$ of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size $n$ is fixed and the dimension $p$ grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where the matrix of pairwise and partial correlations among the variables is of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
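The sample-starved regime described in this abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function name, the sample size `n = 10`, the dimension `p = 1000`, and the choice of independent Gaussian variables are all assumptions made for illustration. It shows that, with the sample size held fixed, adding variables can only raise the largest sample correlation, even though every true correlation is zero.

```python
import numpy as np


def max_abs_offdiag_correlation(samples):
    """Largest |sample correlation| over all distinct pairs of columns.

    `samples` has shape (n, p): n replicates of p variables.
    """
    corr = np.corrcoef(samples, rowvar=False)  # p x p sample correlation matrix
    np.fill_diagonal(corr, 0.0)                # ignore the trivial self-correlations
    return float(np.abs(corr).max())


# Hypothetical setup: n fixed and small, p much larger than n.
rng = np.random.default_rng(0)
n, p = 10, 1000
X = rng.standard_normal((n, p))  # columns are independent by construction

# Compare the first 20 variables against all p variables of the same dataset.
max_corr_small_p = max_abs_offdiag_correlation(X[:, :20])
max_corr_large_p = max_abs_offdiag_correlation(X)

print(f"p = 20:   max |r| = {max_corr_small_p:.3f}")
print(f"p = {p}: max |r| = {max_corr_large_p:.3f}")
```

Because the 20-variable subset is nested inside the full dataset, the second maximum is never smaller than the first; in practice it is typically much closer to 1, which is why a fixed correlation threshold becomes unreliable as $p$ grows with $n$ fixed.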

    Warped products and Spaces of Constant Curvature

    We will obtain the warped product decompositions of spaces of constant curvature (with arbitrary signature) in their natural models as subsets of pseudo-Euclidean space. This generalizes the corresponding result by S. Nolker to arbitrary signatures, and has a similar level of detail. Although our derivation is complete in some sense, none is proven. Motivated by applications, we will give more information for the spaces with Euclidean and Lorentzian signatures. This is an expository article intended to be used as a reference, so we also give a review of the theory of circles and spheres in pseudo-Riemannian manifolds.

    Fossil fuel based CO2 emissions, economic growth, and world crude oil price nexus in the United States

    To learn from the nexus between fossil fuel based CO2 emissions, economic growth, and the world crude oil price in a leading economy, the underlying nature of the relationship among them is investigated for the United States (US). The autoregressive distributed lag bounds testing approach to cointegration provides empirical evidence for the existence of a long-run equilibrium relationship, with 1% growth in GDP being associated with 3.2% growth in CO2 emissions in the US. Increases in the crude price and technological progress, proxied by a time trend, are associated with a decline in CO2 emissions in the long run, though by comparatively small magnitudes. Short-run dynamics restore 25% of any disequilibrium in a year. Owing to the structural breaks identified in the individual series by the unit root tests, the stability of the model coefficients over the sample period is tested using the cumulative sum of recursive residuals test and confirmed. Error-correction based Granger causality tests provide evidence that fluctuations in the world crude real price Granger cause fluctuations in CO2 emissions, and that fluctuations in CO2 emissions Granger cause the rise and fall of real GDP. Deviations from long-run equilibrium are seen to Granger cause changes in both CO2 emissions and real GDP in the US. Keywords: carbon dioxide emissions; cointegration; crude oil price; forecast; Granger causality; gross domestic product; GDP; United States.

    Could Sri Lanka afford sustainable electricity consumption practices without harming her economic growth?

    The existence and direction of Granger causality between electricity consumption and economic growth, proxied by gross domestic product (GDP), are investigated in this study using annual data covering the period 1971 to 2007. The results of the augmented Dickey-Fuller, GLS-detrended Dickey-Fuller and Phillips-Perron tests show that the natural logarithms of both time series are individually I(1). The autoregressive distributed lag bounds testing approach to cointegration used in this study reveals that the two time series are cointegrated. The estimated long-run equilibrium relationship shows that 1% growth in GDP induces 1.45% growth in electricity consumption, and any deviation from the long-run equilibrium following a short-run disturbance is corrected within 17 months. Granger causality test results reveal uni-directional causality running from economic growth to electricity consumption without any feedback effect. This result is favourable for Sri Lanka’s economic growth, since growth does not depend on electricity consumption and thereby on electricity production. It is therefore possible to initiate energy policies towards minimizing wasteful electricity production and consumption practices, without compromising Sri Lanka’s GDP growth, to take her on an electricity-wise sustainable economic development path. Keywords: ARDL; cointegration; Granger causality; gross domestic product; sustainable electricity consumption; Sri Lanka.

    Retaining positive definiteness in thresholded matrices

    Positive definite (p.d.) matrices arise naturally in many areas within mathematics and also feature extensively in scientific applications. In modern high-dimensional applications, a common approach to finding sparse positive definite matrices is to threshold their small off-diagonal elements. This thresholding, sometimes referred to as hard-thresholding, sets small elements to zero. Thresholding has the attractive property that the resulting matrices are sparse, and are thus easier to interpret and work with. In many applications, it is often required, and thus implicitly assumed, that thresholded matrices retain positive definiteness. In this paper we formally investigate the algebraic properties of p.d. matrices which are thresholded. We demonstrate that for positive definiteness to be preserved, the pattern of elements to be set to zero must correspond to a graph which is a union of disconnected complete components. This result rigorously demonstrates that, except in special cases, positive definiteness can be easily lost. We then proceed to demonstrate that the class of diagonally dominant matrices is not maximal in terms of retaining positive definiteness when thresholded. Consequently, we derive characterizations of matrices which retain positive definiteness when thresholded with respect to important classes of graphs. In particular, we demonstrate that retaining positive definiteness upon thresholding is governed by complex algebraic conditions.
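A small numerical case makes the loss of positive definiteness under hard-thresholding concrete. The sketch below is an illustration only, not the paper's construction: the 3 x 3 matrix, the threshold 0.5, and the helper names `hard_threshold` and `is_positive_definite` are assumptions chosen for the example. After thresholding, the retained off-diagonal entries form a path on three vertices, which is not a union of disconnected complete components.

```python
import numpy as np


def hard_threshold(matrix, eps):
    """Set off-diagonal entries with |a_ij| < eps to zero; keep the diagonal."""
    out = np.where(np.abs(matrix) >= eps, matrix, 0.0)
    np.fill_diagonal(out, np.diag(matrix))  # diagonal entries are never thresholded
    return out


def is_positive_definite(matrix):
    """Check positive definiteness of a symmetric matrix via its smallest eigenvalue."""
    return bool(np.linalg.eigvalsh(matrix).min() > 0)


# Hypothetical example matrix: symmetric and positive definite.
A = np.array([
    [1.00, 0.75, 0.40],
    [0.75, 1.00, 0.75],
    [0.40, 0.75, 1.00],
])

# Thresholding at 0.5 zeroes only the (1, 3) and (3, 1) entries.
A_thr = hard_threshold(A, eps=0.5)

print("original p.d.:   ", is_positive_definite(A))
print("thresholded p.d.:", is_positive_definite(A_thr))
print("smallest eigenvalue after thresholding:", np.linalg.eigvalsh(A_thr).min())
```

The thresholded matrix is tridiagonal with off-diagonal value 0.75, so its smallest eigenvalue is $1 - 0.75\sqrt{2} < 0$, and positive definiteness is lost even though only one small entry pair was removed.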

    Integration and measures on the space of countable labelled graphs

    In this paper we develop a rigorous foundation for the study of integration and measures on the space $\mathscr{G}(V)$ of all graphs defined on a countable labelled vertex set $V$. We first study several interrelated $\sigma$-algebras and a large family of probability measures on graph space. We then focus on a "dyadic" Hamming distance function $\left\| \cdot \right\|_{\psi,2}$, which was very useful in the study of differentiation on $\mathscr{G}(V)$. The function $\left\| \cdot \right\|_{\psi,2}$ is shown to be a Haar measure-preserving bijection from the subset of infinite graphs to the circle (with the Haar/Lebesgue measure), thereby naturally identifying the two spaces. As a consequence, we establish a "change of variables" formula that enables the transfer of the Riemann-Lebesgue theory on $\mathbb{R}$ to graph space $\mathscr{G}(V)$. This also complements previous work in which a theory of Newton-Leibnitz differentiation was transferred from the real line to $\mathscr{G}(V)$ for countable $V$. Finally, we identify the Pontryagin dual of $\mathscr{G}(V)$, and characterize the positive definite functions on $\mathscr{G}(V)$. Comment: 15 pages, LaTe

    The Hoffmann-Jorgensen inequality in metric semigroups

    We prove a refinement of the inequality by Hoffmann-Jorgensen that is significant for three reasons. First, our result improves on the state-of-the-art even for real-valued random variables. Second, the result unifies several versions in the Banach space literature, including those by Johnson and Schechtman [Ann. Probab. 17 (1989)], Klass and Nowicki [Ann. Probab. 28 (2000)], and Hitczenko and Montgomery-Smith [Ann. Probab. 29 (2001)]. Finally, we show that the Hoffmann-Jorgensen inequality (including our generalized version) holds not only in Banach spaces but more generally, in a very primitive mathematical framework required to state the inequality: a metric semigroup $\mathscr{G}$. This includes normed linear spaces as well as all compact, discrete, or (connected) abelian Lie groups. Comment: 11 pages, published in the Annals of Probability. The Introduction section shares motivating examples with arXiv:1506.0260

    The Khinchin-Kahane and Levy inequalities for abelian metric groups, and transfer from normed (abelian semi)groups to Banach spaces

    The Khinchin-Kahane inequality is a fundamental result in the probability literature, with the most general version to date holding in Banach spaces. Motivated by modern settings and applications, we generalize this inequality to arbitrary metric groups which are abelian. If instead of abelian one assumes the group's metric to be a norm (i.e., $\mathbb{Z}_{>0}$-homogeneous), then we explain how the inequality improves to the same one as in Banach spaces. This occurs via a "transfer principle" that helps carry over questions involving normed metric groups and abelian normed semigroups into the Banach space framework. This principle also extends the notion of the expectation to random variables with values in arbitrary abelian normed metric semigroups $\mathscr{G}$. We provide additional applications, including studying weakly $\ell_p$ $\mathscr{G}$-valued sequences and related Rademacher series. On a related note, we also formulate a "general" Levy inequality, with two features: (i) It subsumes several known variants in the Banach space literature; and (ii) We show the inequality in the minimal framework required to state it: abelian metric groups. Comment: 15 pages, Introduction section shares motivating examples with arXiv:1506.02605. Significant revisions to the exposition. Final version, to appear in Journal of Mathematical Analysis and Application