On association in regression: the coefficient of determination revisited
Universal coefficients of determination are investigated which quantify the strength of the relation between a vector of dependent variables Y and a vector of independent covariates X. They are defined as measures of dependence between Y and X through theta(x), with theta(x) parameterizing the conditional distribution of Y given X=x. If theta(x) involves unknown coefficients gamma, the definition is conditional on gamma, so in practice gamma, and hence the coefficient of determination, has to be estimated. The proposed quantities generalize R^2 from classical linear regression and are related to other definitions suggested previously. They apply to generalized regression models with arbitrary link functions as well as to multivariate and nonparametric regression. The definition and use of the proposed coefficients of determination are illustrated for several regression problems with simulated and real data sets.
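For concreteness, a deviance-based pseudo-R^2 is one widely used generalization of R^2 to models with non-identity links. The sketch below is illustrative, not the paper's universal coefficient; it computes the share of null deviance explained by a single covariate in a Poisson regression:

```python
# A minimal sketch (not the paper's universal coefficient): a deviance-based
# pseudo-R^2 for a GLM, one common generalization of the classical R^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.poisson(np.exp(0.3 + 0.7 * x))          # Poisson response, log link

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Deviance-based pseudo-R^2: share of null deviance explained by the covariate.
r2_dev = 1.0 - fit.deviance / fit.null_deviance
print(f"deviance pseudo-R^2: {r2_dev:.3f}")
```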
High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
We consider the problem of high-dimensional Gaussian graphical model
selection. We identify a set of graphs for which an efficient estimation
algorithm exists, and this algorithm is based on thresholding of empirical
conditional covariances. Under a set of transparent conditions, we establish
structural consistency (or sparsistency) for the proposed algorithm, when the
number of samples satisfies n = omega(J_{min}^{-2} log p), where p is the number of
variables and J_{min} is the minimum (absolute) edge potential of the graphical
model. The sufficient conditions for sparsistency are based on the notion of
walk-summability of the model and the presence of sparse local vertex
separators in the underlying graph. We also derive novel non-asymptotic
necessary conditions on the number of samples required for sparsistency.
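The algorithmic core can be sketched as follows. This is an illustrative implementation: the threshold xi and the conditioning-set bound eta are treated as free tuning parameters here, whereas the paper's conditions tie them to n, p, and J_min:

```python
# Sketch of graph selection by thresholding empirical conditional covariances
# over small conditioning sets (illustrative parameters eta, xi).
import numpy as np
from itertools import combinations

def conditional_cov(S_hat, i, j, cond):
    """Empirical covariance of X_i and X_j given the variables in `cond`."""
    if not cond:
        return S_hat[i, j]
    C = list(cond)
    A = S_hat[np.ix_([i], C)]                              # Sigma_{i,S}
    B = np.linalg.solve(S_hat[np.ix_(C, C)], S_hat[np.ix_(C, [j])])
    return S_hat[i, j] - (A @ B)[0, 0]                     # Sigma_{ij|S}

def select_graph(X, eta=2, xi=0.1):
    """Keep edge (i,j) iff the conditional covariance stays above xi for
    every conditioning set of size at most eta (local-separator idea)."""
    n, p = X.shape
    S_hat = np.cov(X, rowvar=False)
    edges = set()
    for i, j in combinations(range(p), 2):
        rest = [k for k in range(p) if k not in (i, j)]
        stats = [abs(conditional_cov(S_hat, i, j, S))
                 for size in range(eta + 1)
                 for S in combinations(rest, size)]
        if min(stats) > xi:
            edges.add((i, j))
    return edges
```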
Breaking the self-averaging properties of spatial galaxy fluctuations in the Sloan Digital Sky Survey - Data Release Six
Statistical analyses of finite sample distributions usually assume that
fluctuations are self-averaging, i.e. that they are statistically similar in
different regions of the given sample volume. By using the scale-length method,
we test whether this assumption is satisfied in several samples of the Sloan
Digital Sky Survey Data Release Six. We find that the probability density
function (PDF) of conditional fluctuations, filtered on large enough spatial
scales (i.e., r>30 Mpc/h), shows relevant systematic variations in different
sub-volumes of the survey. For scales r<30 Mpc/h, instead, the PDF is
statistically stable, and its first moment exhibits scaling behavior with a
negative exponent of about one. Thus, while galaxy structures have
well-defined power-law correlations up to 30 Mpc/h, on larger scales it is not
possible to regard whole-sample average quantities as meaningful and useful
statistical descriptors. This is because galaxy structures correspond to
density fluctuations that are too large in amplitude and too extended in space
to be self-averaging on such scales inside the sample volumes: the galaxy
distribution is inhomogeneous up to the largest scales probed by the SDSS
samples, i.e. r ~ 100 Mpc/h. We show that cosmological corrections, such as
K-corrections and standard evolutionary corrections, do not qualitatively
change these behaviors. Finally, we show that the large-amplitude galaxy
fluctuations observed in the SDSS samples are at odds with the predictions of
the standard LCDM model of structure formation. (Abridged.)
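As an illustration of the underlying test (a minimal sketch, not the authors' scale-length pipeline, and ignoring survey geometry and edge effects), one can compare the PDF of conditional counts-in-spheres between two sub-volumes of a catalogue:

```python
# Sketch of a self-averaging check: do conditional counts-in-spheres have the
# same distribution in different halves of the sample volume?
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import ks_2samp

def counts_in_spheres(points, r):
    """N(<r) around each point, excluding the centre itself."""
    tree = cKDTree(points)
    return np.array(tree.query_ball_point(points, r, return_length=True)) - 1

rng = np.random.default_rng(1)
pts = rng.uniform(0, 200, size=(5000, 3))        # mock catalogue, Mpc/h units

for r in (10.0, 50.0):
    counts = counts_in_spheres(pts, r)
    half = pts[:, 0] < 100.0                     # split the volume in two
    stat, pval = ks_2samp(counts[half], counts[~half])
    print(f"r = {r:5.1f} Mpc/h  KS p-value = {pval:.3f}")
```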
Nonlinear Time Series Modeling: A Unified Perspective, Algorithm, and Application
A new comprehensive approach to nonlinear time series analysis and modeling
is developed in this paper. We introduce novel data-specific,
mid-distribution-based Legendre Polynomial (LP) nonlinear transformations of
the original time series Y(t) that enable us to adapt existing stationary
linear Gaussian time series modeling strategies and make them applicable to
non-Gaussian and nonlinear processes in a robust fashion. The emphasis of the
paper is on empirical time series modeling via the LPTime algorithm. We
demonstrate the effectiveness of our theoretical framework using daily S&P 500
return data between Jan/2/1963 and Dec/31/2009. The proposed LPTime algorithm
systematically discovers, all at once, the `stylized facts' of financial time
series that were previously noted by many researchers one at a time.
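A minimal sketch of the kind of transformation involved (the exact LPTime construction may differ in normalization details): a mid-distribution rank transform of the series, followed by Legendre-polynomial features that can feed a standard linear model:

```python
# Sketch: mid-distribution transform F_mid(y) = F(y) - 0.5*p(y), then
# Legendre-polynomial features on [-1, 1] (illustrative construction).
import numpy as np
from numpy.polynomial import legendre

def mid_distribution(y):
    """Robust rank transform of the series; handles ties via the pmf term."""
    y = np.asarray(y)
    n = len(y)
    F = np.searchsorted(np.sort(y), y, side="right") / n   # empirical CDF
    p = np.array([(y == v).mean() for v in y])             # empirical pmf
    return F - 0.5 * p

def lp_features(y, degree=4):
    """Legendre-polynomial features of the mid-distribution transform,
    mapped to [-1, 1] where the polynomials are orthogonal."""
    u = 2.0 * mid_distribution(y) - 1.0
    feats = [legendre.Legendre.basis(k)(u) for k in range(1, degree + 1)]
    return np.column_stack(feats)

returns = np.random.default_rng(2).standard_t(df=4, size=500)  # heavy tails
Z = lp_features(returns)        # feed into a linear (e.g. AR-type) model
```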
Stochastic Biasing and Galaxy-Mass Density Relation in the Weakly Non-linear Regime
The biasing of galaxies is believed to play an important role in
understanding the large-scale structure of the universe. In general, the
biasing of galaxy formation could be stochastic. Furthermore, future galaxy
surveys may allow us to explore the time evolution of the galaxy distribution.
In this paper, an analytic study of the galaxy-mass density relation and its
time evolution is presented within the framework of stochastic biasing. In
the weakly non-linear regime, we derive a general formula for the galaxy-mass
density relation as a conditional mean using the Edgeworth expansion. The
resulting expression contains the joint moments of the total mass and galaxy
distributions. Using perturbation theory, we investigate the time evolution
of the joint moments and examine the influence of the initial stochasticity on
the galaxy-mass density relation. The analysis shows that the galaxy-mass
density relation is well approximated by the linear relation. Compared
with the skewness of the galaxy distribution, we find that the estimation of
the higher-order moments using the conditional mean can be affected by the
stochasticity. Therefore, the galaxy-mass density relation as a conditional
mean should be used with caution as a tool for estimating the skewness and
the kurtosis.
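To make the leading-order structure explicit: under a nearly Gaussian joint PDF for the galaxy and mass density contrasts, the conditional mean is linear, and the Edgeworth corrections discussed in the paper enter at the next order through the third-order joint moments. The following is a sketch of the form of the result, not the paper's full expression:

```latex
% Sketch: leading-order conditional mean under nearly Gaussian joint
% statistics; the paper's Edgeworth expansion supplies the explicit
% O(delta_m^2) terms from the third-order joint moments.
\begin{equation}
  \langle \delta_g \mid \delta_m \rangle
  = \frac{\langle \delta_g \delta_m \rangle}{\langle \delta_m^2 \rangle}\,
    \delta_m + \mathcal{O}(\delta_m^2)
  = b_{\mathrm{var}}\, r_{gm}\, \delta_m + \mathcal{O}(\delta_m^2),
  \qquad
  b_{\mathrm{var}} \equiv \frac{\sigma_g}{\sigma_m},
  \quad
  r_{gm} \equiv \frac{\langle \delta_g \delta_m \rangle}{\sigma_g \sigma_m}.
\end{equation}
```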
Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles
We examine a network of learners that address the same classification task
but must learn from different data sets. The learners cannot share data;
instead they share their models, and models are shared only once in order to
limit the network load. We introduce DELCO (Decentralized Ensemble
Learning with COpulas), a new approach for aggregating the predictions of
the classifiers trained by each learner. The proposed method aggregates the
base classifiers using a probabilistic model relying on Gaussian copulas.
Experiments on ensembles of logistic regressors demonstrate competitive
accuracy and increased robustness in the case of dependent classifiers. A
companion Python implementation can be downloaded at https://github.com/john-klein/DELC
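The aggregation idea can be sketched as follows (illustrative only; the exact DELCO model is in the linked repository). Per class, the dependence between the base classifiers' scores is captured by a Gaussian copula fitted on validation data, and a test point is assigned to the class under which its score vector is most plausible:

```python
# Sketch of copula-based ensemble aggregation: per-class Gaussian copula over
# classifier scores, scored by a pseudo-likelihood (not DELCO's exact model).
import numpy as np
from scipy.stats import norm, multivariate_normal

class GaussianCopulaAggregator:
    def fit(self, scores, y):
        """scores: (n_samples, n_classifiers) validation scores; y: labels."""
        self.params = {}
        for c in np.unique(y):
            s = scores[y == c]
            z = norm.ppf(self._ecdf(s, s))       # normal scores per class
            self.params[c] = (s, np.corrcoef(z, rowvar=False))
        return self

    @staticmethod
    def _ecdf(ref, s):
        """Per-column empirical CDF of s w.r.t. a reference sample, in (0,1)."""
        n = ref.shape[0]
        u = np.stack([np.searchsorted(np.sort(ref[:, j]), s[:, j], side="right")
                      for j in range(s.shape[1])], axis=1)
        return np.clip(u / (n + 1.0), 1e-6, 1 - 1e-6)

    def predict(self, scores):
        """Pick the class whose fitted copula gives the score vector the
        highest joint pseudo-likelihood (marginal densities omitted)."""
        logls = []
        for c, (ref, R) in self.params.items():
            z = norm.ppf(self._ecdf(ref, scores))
            mvn = multivariate_normal(mean=np.zeros(R.shape[0]),
                                      cov=R, allow_singular=True)
            logls.append(mvn.logpdf(z))
        classes = list(self.params)
        return np.array(classes)[np.argmax(np.stack(logls, axis=1), axis=1)]
```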
Gaussian process hyper-parameter estimation using parallel asymptotically independent Markov sampling
Gaussian process emulators of computationally expensive computer codes
provide fast statistical approximations to model physical processes. The
training of these surrogates depends on the set of design points chosen to run
the simulator. Due to computational cost, such a training set is bound to be
limited and quantifying the resulting uncertainty in the hyper-parameters of
the emulator by uni-modal distributions is likely to induce bias. In order to
quantify this uncertainty, this paper proposes a computationally efficient
sampler based on an extension of Asymptotically Independent Markov Sampling, a
recently developed algorithm for Bayesian inference. Structural uncertainty of
the emulator is obtained as a by-product of the Bayesian treatment of the
hyper-parameters. Additionally, the user can choose to perform stochastic
optimisation to sample from a neighbourhood of the Maximum a Posteriori
estimate, even in the presence of multimodality. Model uncertainty is also
acknowledged through numerical stabilisation measures by including a nugget
term in the formulation of the probability model. The efficiency of the
proposed sampler is illustrated in examples where multi-modal distributions are
encountered. For the purpose of reproducibility, further development, and use
in other applications the code used to generate the examples is freely
available for download at https://github.com/agarbuno/paims_codes
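The role of the nugget term can be illustrated with a minimal sketch (not the AIMS sampler itself, which is in the linked repository): a GP log-marginal likelihood stabilised by a nugget, explored here with plain random-walk Metropolis over the log hyper-parameters:

```python
# Sketch: nugget-stabilised GP marginal likelihood, sampled with a basic
# Metropolis walk (a stand-in for the paper's AIMS sampler).
import numpy as np

def log_marginal(theta, X, y, nugget=1e-6):
    """theta = (log signal var, log length-scale, log noise var)."""
    sv, ls, nv = np.exp(theta)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = sv * np.exp(-0.5 * d2 / ls**2) + (nv + nugget) * np.eye(len(X))
    L = np.linalg.cholesky(K)                   # nugget keeps this stable
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ a - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

def metropolis(X, y, n_steps=2000, step=0.2, seed=3):
    """Random-walk Metropolis with a flat prior on the log hyper-parameters."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    logp = log_marginal(theta, X, y)
    chain = []
    for _ in range(n_steps):
        prop = theta + step * rng.normal(size=3)
        logp_prop = log_marginal(prop, X, y)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop
        chain.append(theta.copy())
    return np.array(chain)

X = np.linspace(0, 1, 30)[:, None]              # toy 1-D design
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.default_rng(0).normal(size=30)
chain = metropolis(X, y)                        # posterior samples of theta
```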