156 research outputs found
Empirical geodesic graphs and CAT(k) metrics for data analysis
A methodology is developed for data analysis based on empirically constructed
geodesic metric spaces. For a probability distribution, the length along a path
between two points can be defined as the amount of probability mass accumulated
along the path. The geodesic, then, is the shortest such path and defines a
geodesic metric. Such metrics are transformed in a number of ways to produce
parametrised families of geodesic metric spaces, empirical versions of which
allow computation of intrinsic means and associated measures of dispersion.
These reveal properties of the data, based on geometry, such as those that are
difficult to see from the raw Euclidean distances. Examples of application
include clustering and classification. For certain parameter ranges, the spaces
become CAT(0) spaces and the intrinsic means are unique. In one case, a minimal
spanning tree of a graph based on the data becomes CAT(0). In another, a
so-called "metric cone" construction allows extension to CAT(k) spaces. It is
shown how to empirically tune the parameters of the metrics, making it possible
to apply them to a number of real cases. Comment: Statistics and Computing, 201
Approximation of probability density functions for PDEs with random parameters using truncated series expansions
The probability density function (PDF) of a random variable associated with
the solution of a partial differential equation (PDE) with random parameters is
approximated using a truncated series expansion. The random PDE is solved using
two stochastic finite element methods, Monte Carlo sampling and the stochastic
Galerkin method with global polynomials. The random variable is a functional of
the solution of the random PDE, such as the average over the physical domain.
The truncated series are obtained considering a finite number of terms in the
Gram-Charlier or Edgeworth series expansions. These expansions approximate the
PDF of a random variable in terms of another PDF, and involve coefficients that
are functions of the known cumulants of the random variable. To the best of our
knowledge, their use in the framework of PDEs with random parameters has not
yet been explored.
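As an illustration of the kind of truncated expansion the abstract above refers to, here is a minimal sketch of the textbook Gram-Charlier type A series, truncated after the fourth cumulant. It is not the authors' implementation: the function name `gram_charlier_pdf` and its argument names are hypothetical, and the cumulants are assumed to be already known (for example, estimated from Monte Carlo samples of the functional of interest).

```python
import math

def gram_charlier_pdf(x, mean, var, k3=0.0, k4=0.0):
    """Gram-Charlier A approximation to a PDF, truncated after the 4th cumulant.

    mean, var, k3, k4 are the first four cumulants of the random variable.
    With k3 = k4 = 0 this reduces to the normal density.
    """
    sigma = math.sqrt(var)
    z = (x - mean) / sigma
    # Probabilists' Hermite polynomials He_3 and He_4
    he3 = z**3 - 3.0 * z
    he4 = z**4 - 6.0 * z**2 + 3.0
    # Standard normal density evaluated at z
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    correction = 1.0 + k3 / (6.0 * sigma**3) * he3 + k4 / (24.0 * sigma**4) * he4
    return phi / sigma * correction
```

A truncated series of this kind is not guaranteed to be nonnegative everywhere, so the result should be read as an approximation rather than a proper density.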
"Building" exact confidence nets
Confidence nets, that is, collections of confidence intervals that fill out
the parameter space and whose exact parameter coverage can be computed, are
familiar in nonparametric statistics. Here, the distributional assumptions are
based on invariance under the action of a finite reflection group. Exact
confidence nets are exhibited for a single parameter, based on the root system
of the group. The main result is a formula for the generating function of the
coverage interval probabilities. The proof makes use of the theory of
"buildings" and the Chevalley factorization theorem for the length distribution
on Cayley graphs of finite reflection groups. Comment: 20 pages. To appear in Bernoull
The algebraic method in quadrature for uncertainty quantification
A general method of quadrature for uncertainty quantification (UQ) is introduced based on the algebraic method in experimental design. This is a method based on the theory of zero-dimensional algebraic varieties. It allows quadrature of polynomials or polynomial approximands for quite general sets of quadrature points, here called "designs." The method goes some way to explaining when quadrature weights are nonnegative and gives exact quadrature for monomials in the quotient ring defined by the algebraic method. The relationship to the classical methods based on zeros of orthogonal polynomials is discussed, and numerical comparisons are made with methods such as Gaussian quadrature and Smolyak grids. Application to UQ is examined in the context of polynomial chaos expansion and the probabilistic collocation method, where solution statistics are estimated.
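A one-dimensional special case may help make the idea concrete. For n distinct points on a line, the monomials 1, x, ..., x^(n-1) form a basis of the quotient ring, and weights that integrate those monomials exactly are found by solving a linear system. The sketch below does this for the uniform measure on [0, 1] using exact rational arithmetic. It is only an illustration under those assumptions, not the paper's general multivariate method, and the function name `quadrature_weights` is hypothetical.

```python
from fractions import Fraction

def quadrature_weights(points):
    """Weights w_i with sum_i w_i * x_i**j == 1/(j+1) for j = 0..n-1.

    Assumes distinct points and the uniform measure on [0, 1].
    Solved by Gauss-Jordan elimination over exact fractions.
    """
    n = len(points)
    pts = [Fraction(p) for p in points]
    # Augmented matrix: row j is [x_1**j, ..., x_n**j | integral of x**j on [0, 1]]
    rows = [[p**j for p in pts] + [Fraction(1, j + 1)] for j in range(n)]
    for col in range(n):
        # Distinct points make the Vandermonde system nonsingular,
        # so a nonzero pivot always exists.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]
```

For the design {0, 1/2, 1} this recovers Simpson's rule. Nothing in the construction forces the weights to be nonnegative for other designs, which is the question the abstract says the algebraic method helps to answer.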
(U,V)-Ordering and a Duality Theorem for Risk Aversion and Lorenz-type Orderings
There is a duality theory connecting certain stochastic orderings between
cumulative distribution functions F_1,F_2 and stochastic orderings between
their inverses F_1^(-1),F_2^(-1). This underlies some theories of utility in
the case of the cdf and deprivation indices in the case of the inverse. Under
certain conditions there is an equivalence between the two theories. An example
is the equivalence between second order stochastic dominance and the Lorenz
ordering. This duality is generalised to include the case where there is
"distortion" of the cdf of the form v(F) and also of the inverse. A
comprehensive duality theorem is presented in a form which includes the
distortions and links the duality to the parallel theories of risk and
deprivation indices. It is shown that some well-known examples are special
cases of the results, including some from the Yaari social welfare theory and
the theory of majorization. Comment: 23 pages, no figures, 2 Appendice
Bregman divergences based on optimal design criteria and simplicial measures of dispersion
In previous work the authors defined the k-th order simplicial distance between probability distributions which arises naturally from a measure of dispersion based on the squared volume of random simplices of dimension k. This theory is embedded in the wider theory of divergences and distances between distributions which includes Kullback–Leibler, Jensen–Shannon, Jeffreys–Bregman divergence and Bhattacharyya distance. A general construction is given based on defining a directional derivative of a function ϕ from one distribution to the other whose concavity or strict concavity influences the properties of the resulting divergence. For the normal distribution these divergences can be expressed as matrix formulae for the (multivariate) means and covariances. Optimal experimental design criteria contribute a range of functionals applied to non-negative, or positive definite, information matrices. Not all can distinguish normal distributions but sufficient conditions are given. The k-th order simplicial distance is revisited from this aspect and the results are used to test empirically the identity of means and covariances.
Extended generalised variances, with applications
We consider a measure ψ_k of dispersion which extends the notion of Wilk's generalised variance for a d-dimensional distribution, and is based on the mean squared volume of simplices of dimension k≤d formed by k+1 independent copies. We show how ψ_k can be expressed in terms of the eigenvalues of the covariance matrix of the distribution, also when an n-point sample is used for its estimation, and prove its concavity when raised to a suitable power. Some properties of dispersion-maximising distributions are derived, including a necessary and sufficient condition for optimality. Finally, we show how this measure of dispersion can be used for the design of optimal experiments, with equivalence to A and D-optimal design for k=1 and k=d, respectively. Simple illustrative examples are presented.
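The eigenvalue expression mentioned in the abstract above can be sketched as follows. The sketch assumes the standard normalisation in which the volume of a k-simplex is the corresponding determinant divided by k!, so that the mean squared volume equals (k+1)/k! times the k-th elementary symmetric polynomial of the covariance eigenvalues. That constant is an assumption made here, not a formula quoted from the paper, and the function name `simplicial_dispersion` is hypothetical.

```python
from itertools import combinations
from math import factorial, prod

def simplicial_dispersion(eigenvalues, k):
    """Mean squared volume of a k-simplex with k+1 i.i.d. vertices.

    eigenvalues: eigenvalues of the covariance matrix (length d, with k <= d).
    Assumed form: (k+1)/k! * e_k(eigenvalues), where e_k is the k-th
    elementary symmetric polynomial.
    """
    e_k = sum(prod(subset) for subset in combinations(eigenvalues, k))
    return (k + 1) / factorial(k) * e_k
```

With this normalisation, k=1 gives twice the trace of the covariance matrix and k=d gives a multiple of its determinant, which is consistent with the A- and D-optimality correspondence stated in the abstract.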
The algebraic method in tree percolation
We apply the methods of algebraic reliability to the study of percolation on trees. To a complete k-ary tree T_{k,n} of depth n we assign a monomial ideal I_{k,n} on sum_{i=1}^n k^i variables and k^n minimal monomial generators. We give explicit recursive formulae for the Betti numbers of I_{k,n} and their Hilbert series, which allow us to study percolation on T_{k,n} explicitly. We study bounds on this percolation and its asymptotic behavior using these commutative algebra techniques.