Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance

Author: A Irpino
Antonio Irpino
B Efron
CL Lawson
CL Mallows
E Diday
EAL Neto
G Dall’Aglio
H Bock
J Arroyo
L Billard
L Kantorovich
L Wasserstein
M Noirhomme-Fraiture
P Bertrand
P Bickel
R Tibshirani
Rosanna Verde
WG Gilchrist
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/07/2012

In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables according to the definition given in the framework of Symbolic Data Analysis, and the parameters of the model are estimated using the classic Least Squares method. An appropriate metric is introduced in order to measure the error between the observed and the predicted distributions. In particular, the Wasserstein distance is proposed. Some properties of this metric are exploited to predict the response variable as a direct linear combination of other independent histogram variables. Measures of goodness of fit are discussed. An application on real data corroborates the proposed method.
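To make the error measure in the abstract above concrete, here is a minimal sketch of the squared L2 Wasserstein distance between two one-dimensional distributions. It is not the authors' implementation: it assumes each distribution is represented by an equal-size sample rather than a histogram, so that the empirical quantile functions can be compared level by level. The function name `wasserstein2_squared` and the sample values are hypothetical.

```python
def wasserstein2_squared(xs, ys):
    """Squared L2 Wasserstein distance between two equal-size 1-D samples.

    Assumption: both distributions are given as samples of the same size,
    so the distance reduces to the mean squared gap between sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # Sorting yields the empirical quantile functions at the same n levels.
    qx, qy = sorted(xs), sorted(ys)
    return sum((a - b) ** 2 for a, b in zip(qx, qy)) / len(qx)


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]  # same shape, shifted by 1
print(wasserstein2_squared(observed, predicted))  # 1.0
```

A pure shift of the distribution by 1 gives a squared distance of 1, which illustrates why the metric is a natural analogue of the squared residual in ordinary least squares.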

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

Stochastic Weighted Graphs: Flexible Model Specification and Simulation

Author: Bhamidi Shankar
Cranmer Skyler
Denny Matthew J.
Desmarais Bruce
Wilson James D.
Publication date: 09/11/2016

In most domains of network analysis researchers consider networks that arise in nature with weighted edges. Such networks are routinely dichotomized in the interest of using available methods for statistical inference with networks. The generalized exponential random graph model (GERGM) is a recently proposed method used to simulate and model the edges of a weighted graph. The GERGM specifies a joint distribution for an exponential family of graphs with continuous-valued edge weights. However, current estimation algorithms for the GERGM only allow inference on a restricted family of model specifications. To address this issue, we develop a Metropolis-Hastings method that can be used to estimate any GERGM specification, thereby significantly extending the family of weighted graphs that can be modeled with the GERGM. We show that new flexible model specifications are capable of avoiding likelihood degeneracy and efficiently capturing network structure in applications where such models were not previously available. We demonstrate the utility of this new class of GERGMs through application to two real network data sets, and we further assess the effectiveness of our proposed methodology by simulating non-degenerate model specifications from the well-studied two-stars model. A working R version of the GERGM code is available in the supplement and will be incorporated in the gergm CRAN package. Comment: 33 pages, 6 figures. To appear in Social Network
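The abstract above relies on Metropolis-Hastings sampling. As a generic illustration only, the following sketch shows a random-walk Metropolis-Hastings sampler for a one-dimensional target density. It is not the GERGM estimation procedure and not the `gergm` package API: the function name, the Gaussian proposal, and the standard normal target are all assumptions chosen for illustration.

```python
import math
import random


def metropolis_hastings(log_density, start, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target (illustrative only).

    log_density: log of the unnormalized target density, finite at `start`.
    """
    rng = random.Random(seed)
    x, log_p = start, log_density(start)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


# Hypothetical target: standard normal, known only up to a constant.
draws = metropolis_hastings(lambda v: -0.5 * v * v, start=0.0, n_steps=20000)
print(sum(draws) / len(draws))  # sample mean, expected to be close to 0
```

Because the acceptance step only needs a ratio of densities, the normalizing constant of the target never has to be computed, which is the property that makes this family of samplers attractive for models with intractable likelihoods.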

arXiv.org e-Print Archive

Carolina Digital Repository

A Bayesian Multivariate Functional Dynamic Linear Model

Author: Kowal Daniel R.
Matteson David S.
Ruppert David
Publication venue: 'Informa UK Limited'
Publication date: 05/08/2015

We present a Bayesian approach for modeling multivariate, dependent functional data. To account for the three dominant structural features in the data (functional, time-dependent, and multivariate components), we extend hierarchical dynamic linear models for multivariate time series to the functional data setting. We also develop Bayesian spline theory in a more general constrained optimization framework. The proposed methods identify a time-invariant functional basis for the functional observations, which is smooth and interpretable, and can be made common across multivariate observations for additional information sharing. The Bayesian framework permits joint estimation of the model parameters, provides exact inference (up to MCMC error) on specific parameters, and allows generalized dependence structures. Sampling from the posterior distribution is accomplished with an efficient Gibbs sampling algorithm. We illustrate the proposed framework with two applications: (1) multi-economy yield curve data from the recent global recession, and (2) local field potential brain signals in rats, for which we develop a multivariate functional time series approach for multivariate time-frequency analysis. Supplementary materials, including R code and the multi-economy yield curve data, are available online.

arXiv.org e-Print Archive

FigShare

Near-stasis in the long-term diversification of Mesozoic tetrapods

Author: Alroy J
Benson RBJ
Butler RJ
Carrano MT
Lloyd GT
Mannion PD
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 15/12/2015

How did evolution generate the extraordinary diversity of vertebrates on land? Zero species are known prior to ~380 million years ago, and more than 30,000 are present today. An expansionist model suggests this was achieved by large and unbounded increases, leading to substantially greater diversity in the present than at any time in the geological past. This model contrasts starkly with empirical support for constrained diversification in marine animals, suggesting different macroevolutionary processes on land and in the sea. We quantify patterns of vertebrate standing diversity on land during the Mesozoic–early Paleogene interval, applying sample-standardization to a global fossil dataset containing 27,260 occurrences of 4,898 non-marine tetrapod species. Our results show a highly stable pattern of Mesozoic tetrapod diversity at regional and local levels, underpinned by a weakly positive, but near-zero, long-term net diversification rate over 190 million years. Species diversity of non-flying terrestrial tetrapods less than doubled over this interval, despite the origins of exceptionally diverse extant groups within mammals, squamates, amphibians, and dinosaurs. Therefore, although speciose groups of modern tetrapods have Mesozoic origins, rates of Mesozoic diversification inferred from the fossil record are slow compared to those inferred from molecular phylogenies. If high speciation rates did occur in the Mesozoic, then they seem to have been balanced by extinctions among older clades. 
An apparent 4-fold expansion of species richness after the Cretaceous/Paleogene (K/Pg) boundary deserves further examination in light of potential taxonomic biases, but is consistent with the hypothesis that global environmental disturbances such as mass extinction events can rapidly adjust limits to diversity by restructuring ecosystems, and suggests that the gradualistic evolutionary diversification of tetrapods was punctuated by brief but dramatic episodes of radiation.

University of Birmingham Research Portal

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

Macquarie University ResearchOnline

FigShare

Cox regression survival analysis with compositional covariates: application to modelling mortality risk from 24-h physical activity patterns

Author: Chastin S.F.M.
Dall P.M.
Hron K.
McGregor D.E.
Palarea-Albaladejo J.
Publication venue: 'SAGE Publications'
Publication date: 01/01/2020

Survival analysis is commonly conducted in medical and public health research to assess the association of an exposure or intervention with a hard end outcome such as mortality. The Cox (proportional hazards) regression model is probably the most popular statistical tool used in this context. However, when the exposure includes compositional covariables (that is, variables representing a relative makeup such as a nutritional or physical activity behaviour composition), some basic assumptions of the Cox regression model and associated significance tests are violated. Compositional variables involve an intrinsic interplay between one another, which precludes results and conclusions based on considering them in isolation, as is ordinarily done. In this work, we introduce a formulation of the Cox regression model in terms of log-ratio coordinates which suitably deals with the constraints of compositional covariates, facilitates the use of common statistical inference methods, and allows for scientifically meaningful interpretations. We illustrate its practical application to a public health problem: the estimation of the mortality hazard associated with the composition of daily activity behaviour (physical activity, sitting time and sleep) using data from the U.S. National Health and Nutrition Examination Survey (NHANES).
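The log-ratio coordinates mentioned in the abstract above can be illustrated with a short sketch. It assumes one common choice, isometric log-ratio "pivot" coordinates, in which each part is contrasted with the geometric mean of the parts that follow it. The abstract does not state which coordinate system the authors use, so this choice, the function name `pivot_coordinates`, and the example day are assumptions. The resulting coordinates are unconstrained real numbers that could then be entered as ordinary covariates in a regression model.

```python
import math


def pivot_coordinates(parts):
    """Map a D-part composition to D-1 isometric log-ratio (pivot) coordinates.

    Assumption: all parts are strictly positive (zeros need separate handling).
    """
    if len(parts) < 2 or any(p <= 0 for p in parts):
        raise ValueError("need at least two strictly positive parts")
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(len(parts) - 1):
        rest = logs[i + 1:]                    # logs of the remaining parts
        mean_log_rest = sum(rest) / len(rest)  # log of their geometric mean
        scale = math.sqrt(len(rest) / (len(rest) + 1))
        coords.append(scale * (logs[i] - mean_log_rest))
    return coords


# Hypothetical 24-h day: hours of physical activity, sitting time, sleep.
day = [1.0, 15.0, 8.0]
print(pivot_coordinates(day))
```

Only ratios between parts enter the calculation, so expressing the same day in hours, minutes, or proportions gives identical coordinates.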

Ghent University Academic Bibliography

ResearchOnline@GCU

Joint asymptotics for semi-nonparametric regression models with partially linear structure

Author: Cheng Guang
Shang Zuofeng
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 03/06/2015

We consider a joint asymptotic framework for studying semi-nonparametric regression models where (finite-dimensional) Euclidean parameters and (infinite-dimensional) functional parameters are both of interest. The class of models under consideration shares a partially linear structure and is estimated in two general contexts: (i) quasi-likelihood and (ii) true likelihood. We first show that the Euclidean estimator and (pointwise) functional estimator, which are re-scaled at different rates, jointly converge to a zero-mean Gaussian vector. This weak convergence result reveals a surprising joint asymptotics phenomenon: these two estimators are asymptotically independent. A major goal of this paper is to gain first-hand insights into the above phenomenon. Moreover, a likelihood ratio test is proposed for a set of joint local hypotheses, where a new version of the Wilks phenomenon [Ann. Math. Stat. 9 (1938) 60-62; Ann. Statist. 1 (2001) 153-193] is unveiled. A novel technical tool, called a joint Bahadur representation, is developed for studying these joint asymptotics results. Comment: Published at http://dx.doi.org/10.1214/15-AOS1313 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX