Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received relatively little attention, especially in the setting
where the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into three categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last regime applies to exascale data
dimensions. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
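A toy correlation-screening experiment can make the sample-starved regime concrete. The sketch below is not the paper's framework: it simply fixes the sample size n, draws p independent Gaussian variables (so every true correlation is zero), and counts the variable pairs whose sample correlation exceeds a hypothetical threshold `rho`. All function names are illustrative assumptions.

```python
import math
import random


def sample_correlations(data):
    """Pearson correlations for every pair of columns of `data` (n rows of length p)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(row[j] ** 2 for row in centered)) for j in range(p)]
    corr = {}
    for i in range(p):
        for j in range(i + 1, p):
            dot = sum(row[i] * row[j] for row in centered)
            corr[(i, j)] = dot / (norms[i] * norms[j])
    return corr


def count_discoveries(corr, rho):
    """Number of variable pairs whose absolute sample correlation exceeds rho."""
    return sum(1 for r in corr.values() if abs(r) > rho)


def null_discoveries(n, p, rho, seed=0):
    """n samples of p independent N(0, 1) variables: every discovery is spurious."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    return count_discoveries(sample_correlations(data), rho)


if __name__ == "__main__":
    # Sample size fixed at n = 10 while the number of variables p grows.
    for p in (10, 40, 160):
        print(p, null_discoveries(n=10, p=p, rho=0.8))
```

With n held fixed, the number of variable pairs grows like p², so the count of spurious "discoveries" grows with p even though no variable is correlated with any other. This is the kind of behaviour a sample-complexity analysis in the purely high dimensional regime has to account for.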
Warped products and Spaces of Constant Curvature
We will obtain the warped product decompositions of spaces of constant
curvature (with arbitrary signature) in their natural models as subsets of
pseudo-Euclidean space. This generalizes the corresponding result by S. Nölker
to arbitrary signatures, and has a similar level of detail. Although our
derivation is complete in some sense, completeness is not proven. Motivated by
applications, we will give more information for the spaces with Euclidean and
Lorentzian signatures. This is an expository article intended to be used as a
reference, so we also give a review of the theory of circles and spheres in
pseudo-Riemannian manifolds.
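For orientation, the standard textbook examples below (not taken from the paper) show what a warped product decomposition of a flat space looks like in the two signatures the abstract singles out. Pullbacks along the projections are suppressed, and the Lorentzian line assumes the convention $\langle x, x \rangle = -(x^0)^2 + \sum_{i \ge 1} (x^i)^2$ on Minkowski space $\mathbb{R}^{1,n-1}$.

```latex
\begin{align*}
  g &= g_B + f^2\, g_F
    && \text{(warped product } B \times_f F,\ f > 0 \text{ on } B) \\
  \mathbb{R}^n \setminus \{0\} &\cong (0,\infty) \times_r S^{n-1},
    && g = dr^2 + r^2\, g_{S^{n-1}} \\
  \{x \in \mathbb{R}^{1,n-1} : \langle x, x \rangle < 0,\ x^0 > 0\} &\cong (0,\infty) \times_t H^{n-1},
    && g = -dt^2 + t^2\, g_{H^{n-1}}
\end{align*}
```

The first example is polar coordinates on Euclidean space. The second writes the interior of the future light cone as a warped product over the hyperbolic space $H^{n-1}$, realized as the unit hyperboloid.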
Fossil fuel based CO2 emissions, economic growth, and world crude oil price nexus in the United States
With the prime objective of learning from the fossil fuel based CO2 emissions-economic growth-world crude oil price nexus of a leading economy, the underlying nature of the relationship among them is investigated for the United States (US). The autoregressive distributed lag (ARDL) bounds testing approach to cointegration provides empirical evidence for the existence of a long-run equilibrium relationship, with 1% growth in GDP being tied to 3.2% growth in CO2 emissions in the US. Increases in the crude oil price and technological progress, proxied by a time trend, are associated with declines in CO2 emissions in the long run, though by comparatively small magnitudes. Short-run dynamics restore 25% of any disequilibrium in a year. Owing to the structural breaks identified in the individual series by the unit root tests, the stability of the model coefficients over the sample period is tested using the cumulative sum of recursive residuals (CUSUM) test and confirmed. Error-correction based Granger causality tests provide evidence that fluctuations in the real world crude oil price Granger cause fluctuations in CO2 emissions, and that fluctuations in CO2 emissions Granger cause the rise and fall of real GDP. Deviations from the long-run equilibrium are seen to Granger cause changes in both CO2 emissions and real GDP in the US.
Keywords: Carbon dioxide emissions; cointegration; crude oil price; forecast; Granger causality; gross domestic product; GDP; United States.
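The statement that short-run dynamics restore 25% of any disequilibrium in a year can be turned into a simple adjustment path. The sketch below assumes, for illustration only, that a constant 25% of the *remaining* gap is closed every year (a geometric error-correction path); the function names are hypothetical.

```python
import math


def remaining_gap(speed, years):
    """Share of an initial deviation from long-run equilibrium still present after
    `years`, assuming a constant share `speed` of the remaining gap closes each year."""
    return (1.0 - speed) ** years


def half_life(speed):
    """Years needed to close half of the initial deviation under the same assumption."""
    return math.log(0.5) / math.log(1.0 - speed)


SPEED = 0.25  # "Short-run dynamics restore 25% of any disequilibrium in a year"
for year in range(1, 5):
    print(year, remaining_gap(SPEED, year))
print("half-life (years):", half_life(SPEED))
```

Under this assumption, 75% of a shock remains after one year and about 56% after two, and the half-life is roughly 2.4 years.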
Could Sri Lanka afford sustainable electricity consumption practices without harming her economic growth?
The existence and direction of Granger causality between electricity consumption and economic growth, proxied by gross domestic product (GDP), have been investigated in this study using annual data covering the period 1971 to 2007. The results of the augmented Dickey-Fuller, GLS-detrended Dickey-Fuller and Phillips-Perron tests show that the natural logarithms of both time series are individually I(1). The autoregressive distributed lag (ARDL) bounds testing approach to cointegration used in this study reveals that the two time series are cointegrated. The estimated long-run equilibrium relationship shows that 1% growth in GDP induces 1.45% growth in electricity consumption, and any deviation from the long-run equilibrium following a short-run disturbance is corrected within 17 months. Granger causality test results reveal unidirectional causality running from economic growth to electricity consumption without any feedback effect. These results are favourable for Sri Lanka's economic growth, since growth does not depend on electricity consumption, and thereby on electricity production. It is therefore possible to initiate energy policies aimed at minimizing wasteful electricity production and consumption practices, without compromising Sri Lanka's GDP growth, to take her on an electricity-wise sustainable economic development path.
Keywords: ARDL; cointegration; Granger causality; gross domestic product; sustainable electricity consumption; Sri Lanka
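Because the study works with the natural logarithms of both series, the reported 1.45 is an elasticity: in a log-linear long-run relation ln(E) = a + 1.45 ln(GDP), a percentage change in GDP maps to a percentage change in electricity consumption E. The sketch below assumes exactly that functional form; `implied_growth` is a hypothetical helper, and the intercept `a` cancels out.

```python
ELASTICITY = 1.45  # long-run GDP elasticity of electricity consumption reported above


def implied_growth(gdp_growth_pct, elasticity=ELASTICITY):
    """Percent growth in electricity consumption implied by a percent GDP change,
    assuming the log-linear long-run relation ln(E) = a + elasticity * ln(GDP)."""
    return ((1.0 + gdp_growth_pct / 100.0) ** elasticity - 1.0) * 100.0


for g in (1, 5, 10):
    print(g, round(implied_growth(g), 3))
```

For small changes the result is close to 1.45 times the GDP growth rate (1% gives about 1.453%). For larger changes the exact power gives slightly more (10% GDP growth gives about 14.82%).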
Retaining positive definiteness in thresholded matrices
Positive definite (p.d.) matrices arise naturally in many areas within
mathematics and also feature extensively in scientific applications. In modern
high-dimensional applications, a common approach to finding sparse positive
definite matrices is to threshold their small off-diagonal elements. This
thresholding, sometimes referred to as hard-thresholding, sets small elements
to zero. Thresholding has the attractive property that the resulting matrices
are sparse, and are thus easier to interpret and work with. In many
applications, it is often required, and thus implicitly assumed, that
thresholded matrices retain positive definiteness. In this paper we formally
investigate the algebraic properties of p.d. matrices which are thresholded. We
demonstrate that for positive definiteness to be preserved, the pattern of
elements to be set to zero must necessarily correspond to a graph which is a
union of disconnected complete components. This result rigorously demonstrates
that, except in special cases, positive definiteness can be easily lost. We
then proceed to demonstrate that the class of diagonally dominant matrices is
not maximal in terms of retaining positive definiteness when thresholded.
Consequently, we derive characterizations of matrices which retain positive
definiteness when thresholded with respect to important classes of graphs. In
particular, we demonstrate that retaining positive definiteness upon
thresholding is governed by complex algebraic conditions.
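A three-by-three example is enough to show how easily positive definiteness can be lost. The sketch below hard-thresholds a positive definite matrix by zeroing its single small off-diagonal entry and checks positive definiteness with a Cholesky factorization. The matrix `A`, the threshold 0.7, and the helper names are illustrative choices, not examples from the paper.

```python
import math


def is_positive_definite(m):
    """Cholesky test: a symmetric matrix is p.d. iff every pivot is strictly positive."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = m[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            return False
        low[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            low[i][j] = (m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))) / low[j][j]
    return True


def hard_threshold(m, t):
    """Zero out off-diagonal entries with |m_ij| < t; the diagonal is left untouched."""
    n = len(m)
    return [[m[i][j] if i == j or abs(m[i][j]) >= t else 0.0 for j in range(n)]
            for i in range(n)]


A = [[1.0, 0.8, 0.8],
     [0.8, 1.0, 0.6],
     [0.8, 0.6, 1.0]]
print(is_positive_definite(A))                       # True  (det A = 0.128)
print(is_positive_definite(hard_threshold(A, 0.7)))  # False (det = -0.28)
```

Only the 0.6 entries fall below the threshold, yet setting them to zero makes the determinant negative, so the thresholded matrix is no longer positive definite.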
Integration and measures on the space of countable labelled graphs
In this paper we develop a rigorous foundation for the study of integration
and measures on the space of all graphs defined on a countable labelled vertex
set. We first study several interrelated σ-algebras and a large family of
probability measures on graph space. We then focus on a "dyadic" Hamming
distance function, which was very useful in the study of differentiation on
graph space. This function is shown to be a Haar measure-preserving
bijection from the subset of infinite graphs to the circle (with the
Haar/Lebesgue measure), thereby naturally identifying the two spaces. As a
consequence, we establish a "change of variables" formula that enables the
transfer of the Riemann-Lebesgue theory to graph space. This also complements
previous work in which a theory of Newton-Leibniz differentiation was
transferred from the real line to graph space for countable vertex sets.
Finally, we identify the Pontryagin dual of graph space, and characterize the
positive definite functions on it.
Comment: 15 pages, LaTeX
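One way to picture a "dyadic" encoding of graphs is to fix an enumeration of the vertex pairs and read a graph's edge indicators as the binary digits of a number in [0, 1]. The sketch below is a hypothetical illustration in that spirit, not the paper's definition: the pair ordering and all function names are assumptions. Graphs form an abelian group under symmetric difference of edge sets. If each edge is included independently with probability 1/2 (the Haar measure on that group), the resulting binary digits are uniformly distributed.

```python
from fractions import Fraction


def edge_index(i, j):
    """1-based position of the unordered pair {i, j} in a hypothetical fixed
    enumeration of vertex pairs: {0,1}, {0,2}, {1,2}, {0,3}, {1,3}, {2,3}, ..."""
    if i == j:
        raise ValueError("self-loops are not edges here")
    i, j = min(i, j), max(i, j)
    return j * (j - 1) // 2 + i + 1


def normalize(edges):
    """Store each undirected edge once, as (smaller label, larger label)."""
    return {(min(i, j), max(i, j)) for i, j in edges}


def dyadic_value(edges):
    """Binary expansion 0.b1 b2 b3 ... where b_k = 1 iff the k-th vertex pair is an edge."""
    return sum((Fraction(1, 2 ** edge_index(i, j)) for i, j in normalize(edges)),
               Fraction(0))


def dyadic_distance(g, h):
    """Weight the vertex pairs on which g and h disagree (their symmetric difference)."""
    return dyadic_value(normalize(g) ^ normalize(h))


triangle = {(0, 1), (0, 2), (1, 2)}
print(dyadic_value(triangle))               # 7/8
print(dyadic_distance(triangle, {(1, 0)}))  # 3/8
```

A graph with finitely many edges lands on a dyadic rational, and every dyadic rational in (0, 1) also has a second binary expansion ending in infinitely many 1s, which corresponds to a graph with infinitely many edges. Restricting to infinite graphs removes this ambiguity, which is consistent with the abstract stating its bijection on that subset.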
The Hoffmann-Jorgensen inequality in metric semigroups
We prove a refinement of the inequality by Hoffmann-Jorgensen that is
significant for three reasons. First, our result improves on the
state-of-the-art even for real-valued random variables. Second, the result
unifies several versions in the Banach space literature, including those by
Johnson and Schechtman [Ann. Probab. 17 (1989)], Klass and Nowicki [Ann.
Probab. 28 (2000)], and Hitczenko and Montgomery-Smith [Ann. Probab. 29
(2001)]. Finally, we show that the Hoffmann-Jorgensen inequality (including our
generalized version) holds not only in Banach spaces but, more generally, in a
very primitive mathematical framework required to state the inequality: a
metric semigroup. This includes normed linear spaces as well as
all compact, discrete, or (connected) abelian Lie groups.
Comment: 11 pages, published in the Annals of Probability. The Introduction
section shares motivating examples with arXiv:1506.0260
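For reference, the classical Hoffmann-Jorgensen inequality for independent random variables X_1, ..., X_n with partial sums S_k states that, for s, t > 0, P(max_k ‖S_k‖ > 3t + s) ≤ P(max_k ‖S_k‖ > t)² + P(max_k ‖X_k‖ > s). The sketch below checks this classical form (not the refinement proved in the paper) by Monte Carlo in the simplest real-valued setting, a ±1 random walk, where the norm is the absolute value. The parameters and function name are illustrative assumptions.

```python
import random


def hj_estimates(n_steps, t, s, trials=10000, seed=1):
    """Monte Carlo estimates of both sides of the classical Hoffmann-Jorgensen bound
        P(max_k |S_k| > 3t + s) <= P(max_k |S_k| > t)**2 + P(max_k |X_k| > s)
    for partial sums S_k of independent +/-1 (Rademacher) steps X_k."""
    rng = random.Random(seed)
    hits_lhs = hits_t = hits_s = 0
    for _ in range(trials):
        partial = max_partial = max_step = 0
        for _ in range(n_steps):
            x = rng.choice((-1, 1))
            partial += x
            max_partial = max(max_partial, abs(partial))
            max_step = max(max_step, abs(x))
        hits_lhs += max_partial > 3 * t + s
        hits_t += max_partial > t
        hits_s += max_step > s
    lhs = hits_lhs / trials
    rhs = (hits_t / trials) ** 2 + hits_s / trials
    return lhs, rhs


lhs, rhs = hj_estimates(n_steps=20, t=3, s=1)
print(f"LHS ~ {lhs:.4f}, RHS ~ {rhs:.4f}")
```

Choosing s = 1 makes the last term vanish for ±1 steps, so the comparison isolates the "squared" term that gives the inequality its strength.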
The Khinchin-Kahane and Levy inequalities for abelian metric groups, and transfer from normed (abelian semi)groups to Banach spaces
The Khinchin-Kahane inequality is a fundamental result in the probability
literature, with the most general version to date holding in Banach spaces.
Motivated by modern settings and applications, we generalize this inequality to
arbitrary abelian metric groups.
If, instead of commutativity, one assumes the group's metric to be a norm (i.e.,
-homogeneous), then we explain how the inequality improves to
the same form as in Banach spaces. This occurs via a "transfer principle" that
helps carry over questions involving normed metric groups and abelian normed
semigroups into the Banach space framework. This principle also extends the
notion of expectation to random variables with values in arbitrary abelian
normed metric semigroups. We provide additional applications,
including the study of weakly -valued sequences and related
Rademacher series.
On a related note, we also formulate a "general" Levy inequality, with two
features: (i) it subsumes several known variants in the Banach space
literature; and (ii) we show the inequality in the minimal framework required
to state it: abelian metric groups.
Comment: 15 pages, Introduction section shares motivating examples with
arXiv:1506.02605. Significant revisions to the exposition. Final version, to
appear in Journal of Mathematical Analysis and Applications
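In the real-valued case, the Khinchin-Kahane inequality reduces to Khintchine's inequality for Rademacher sums Σ ε_i a_i, and small cases can be checked exactly by averaging over all 2ⁿ sign patterns. The sketch below compares the L¹ and L² norms. Lyapunov's inequality gives L¹ ≤ L², and Szarek's sharp constant for the real case gives L² ≤ √2 · L¹. The coefficient vectors and the helper name are illustrative assumptions; the general metric-group setting of the paper is not modelled.

```python
import itertools
import math


def rademacher_norm(coeffs, p):
    """Exact L^p norm (E|sum_i eps_i * a_i|^p)^(1/p), averaging over all 2**n sign patterns."""
    n = len(coeffs)
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        total += abs(sum(e * a for e, a in zip(signs, coeffs))) ** p
    return (total / 2 ** n) ** (1.0 / p)


for coeffs in [(1, 1), (1, 1, 1), (3, 4), (1, 0.5, 0.25, 0.125)]:
    l1, l2 = rademacher_norm(coeffs, 1), rademacher_norm(coeffs, 2)
    # Expect l1 <= l2 <= sqrt(2) * l1; equality on the right for coeffs = (1, 1).
    print(coeffs, round(l1, 4), round(l2, 4), round(l2 / l1, 4), round(math.sqrt(2), 4))
```

The L² norm always equals the Euclidean length of the coefficient vector (for (3, 4) it is 5), since the signs are orthonormal. The comparison constant between L¹ and L² is what the Khinchin-Kahane inequality controls in general spaces.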