Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data.
The observed variables are histogram variables according to the definition
given in the framework of Symbolic Data Analysis and the parameters of the
model are estimated using the classic Least Squares method. An appropriate
metric is introduced in order to measure the error between the observed and the
predicted distributions. In particular, the Wasserstein distance is proposed.
Some properties of this metric are exploited to predict the response variable
as a direct linear combination of other independent histogram variables. Measures
of goodness of fit are discussed. An application on real data corroborates the
proposed method.
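For distributions on the real line, the L2 Wasserstein distance can be written in terms of quantile functions: d_W(F, G)^2 = integral from 0 to 1 of (F^{-1}(t) - G^{-1}(t))^2 dt. The Python sketch below approximates this integral for two samples, using a midpoint rule over linearly interpolated empirical quantiles. It illustrates only the metric, not the authors' regression estimator. The function names and the grid size are hypothetical choices.

```python
import math


def empirical_quantile(sorted_x, t):
    """Quantile at level t in (0, 1), interpolating linearly between order statistics."""
    n = len(sorted_x)
    if n == 1:
        return sorted_x[0]
    pos = t * (n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_x[lo] + frac * (sorted_x[hi] - sorted_x[lo])


def wasserstein2(x, y, grid_size=1000):
    """Approximate L2 Wasserstein distance between two samples.

    Uses d_W(F, G)^2 = integral_0^1 (F^{-1}(t) - G^{-1}(t))^2 dt,
    evaluated with the midpoint rule on grid_size equal subintervals.
    """
    xs, ys = sorted(x), sorted(y)
    total = 0.0
    for k in range(grid_size):
        t = (k + 0.5) / grid_size  # midpoint of the k-th subinterval
        diff = empirical_quantile(xs, t) - empirical_quantile(ys, t)
        total += diff * diff
    return math.sqrt(total / grid_size)


# Two hypothetical samples that differ only by a shift of 2.
print(wasserstein2([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]))
```

Shifting every value of a sample by a constant c shifts its quantile function by c, so the two samples above are a distance of 2 apart. This is one reason the metric handles location differences between histograms naturally.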
Stochastic Weighted Graphs: Flexible Model Specification and Simulation
In most domains of network analysis, researchers consider networks that arise
in nature with weighted edges. Such networks are routinely dichotomized in the
interest of using available methods for statistical inference with networks.
The generalized exponential random graph model (GERGM) is a recently proposed
method used to simulate and model the edges of a weighted graph. The GERGM
specifies a joint distribution for an exponential family of graphs with
continuous-valued edge weights. However, current estimation algorithms for the
GERGM only allow inference on a restricted family of model specifications. To
address this issue, we develop a Metropolis-Hastings method that can be used
to estimate any GERGM specification, thereby significantly extending the family
of weighted graphs that can be modeled with the GERGM. We show that new
flexible model specifications are capable of avoiding likelihood degeneracy and
efficiently capturing network structure in applications where such models were
not previously available. We demonstrate the utility of this new class of
GERGMs through application to two real network data sets, and we further assess
the effectiveness of our proposed methodology by simulating non-degenerate
model specifications from the well-studied two-stars model. A working R version
of the GERGM code is available in the supplement and will be incorporated in
the gergm CRAN package.
Comment: 33 pages, 6 figures. To appear in Social Network
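The Python sketch below gives a rough illustration of Metropolis-Hastings sampling over continuous edge weights. It targets a toy density on an undirected graph with weights in [0, 1], proportional to exp(theta_edges * (sum of weights) + theta_two_star * (weighted two-star count)). This specification, its parameter names, and the uniform single-edge proposal are assumptions for illustration only. The sketch does not reproduce the GERGM's transformation of edge weights or its estimation procedure.

```python
import math
import random
from itertools import combinations


def two_star(weights, n):
    """Weighted two-star count: for each node i, sum y_ij * y_ik over pairs j < k."""
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j, k in combinations(others, 2):
            total += weights[frozenset((i, j))] * weights[frozenset((i, k))]
    return total


def log_target(weights, n, theta_edges, theta_two_star):
    """Unnormalized log density of the hypothetical toy model."""
    return theta_edges * sum(weights.values()) + theta_two_star * two_star(weights, n)


def metropolis_hastings(n, theta_edges, theta_two_star, n_steps, seed=0):
    """Random-scan MH: redraw one edge weight uniformly on [0, 1) per step."""
    rng = random.Random(seed)
    edges = [frozenset(e) for e in combinations(range(n), 2)]
    weights = {e: rng.random() for e in edges}
    current = log_target(weights, n, theta_edges, theta_two_star)
    accepted = 0
    for _ in range(n_steps):
        edge = rng.choice(edges)
        old = weights[edge]
        weights[edge] = rng.random()  # symmetric proposal, so q cancels in the ratio
        proposed = log_target(weights, n, theta_edges, theta_two_star)
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        if math.log(u) <= proposed - current:
            current = proposed
            accepted += 1
        else:
            weights[edge] = old  # reject: restore the previous weight
    return weights, accepted / n_steps


sample, acceptance_rate = metropolis_hastings(5, theta_edges=1.0, theta_two_star=-0.5, n_steps=2000)
print(acceptance_rate)
```

The proposal redraws a single edge weight uniformly, so it is symmetric and the acceptance ratio reduces to a ratio of target densities. With both parameters set to zero the target is uniform and every proposal is accepted. Recomputing the full statistics at every step is wasteful; a practical sampler would track only the change in the statistics caused by the updated edge.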
A Bayesian Multivariate Functional Dynamic Linear Model
We present a Bayesian approach for modeling multivariate, dependent
functional data. To account for the three dominant structural features in the
data (functional, time-dependent, and multivariate components), we extend
hierarchical dynamic linear models for multivariate time series to the
functional data setting. We also develop Bayesian spline theory in a more
general constrained optimization framework. The proposed methods identify a
time-invariant functional basis for the functional observations, which is
smooth and interpretable, and can be made common across multivariate
observations for additional information sharing. The Bayesian framework permits
joint estimation of the model parameters, provides exact inference (up to MCMC
error) on specific parameters, and allows generalized dependence structures.
Sampling from the posterior distribution is accomplished with an efficient
Gibbs sampling algorithm. We illustrate the proposed framework with two
applications: (1) multi-economy yield curve data from the recent global
recession, and (2) local field potential brain signals in rats, for which we
develop a multivariate functional time series approach for multivariate
time-frequency analysis. Supplementary materials, including R code and the
multi-economy yield curve data, are available online.
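Bayesian dynamic linear models are commonly fitted with Gibbs samplers whose state update first runs a Kalman filter forward through time and then samples the states backward. The Python sketch below illustrates only the forward step, on a much simpler hypothetical model. It filters a univariate local-level DLM, y_t = mu_t + v_t with v_t ~ N(0, V) and mu_t = mu_{t-1} + w_t with w_t ~ N(0, W), starting from a diffuse N(m0, C0) prior. The model and parameter names are assumptions; this is not the multivariate functional model of the paper.

```python
def kalman_filter_local_level(y, V, W, m0=0.0, C0=1e6):
    """Forward filter for the local-level DLM.

    Returns lists of filtered means m_t = E[mu_t | y_1..y_t]
    and filtered variances C_t = Var[mu_t | y_1..y_t].
    """
    m, C = m0, C0
    means, variances = [], []
    for obs in y:
        R = C + W  # prior variance of mu_t given y_1..y_{t-1}
        Q = R + V  # one-step forecast variance of y_t
        K = R / Q  # Kalman gain
        m = m + K * (obs - m)  # update the mean with the forecast error
        C = R * V / Q  # posterior variance, written to stay positive
        means.append(m)
        variances.append(C)
    return means, variances


filtered_means, filtered_vars = kalman_filter_local_level([1.0, 1.4, 0.9, 1.2], V=0.5, W=0.1)
print(filtered_means)
```

When V is tiny relative to W, the gain is close to 1 and the filtered means track the observations. With V = W = 1, the filtered variance settles at (sqrt(5) - 1) / 2, about 0.618, regardless of the data.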
Near-stasis in the long-term diversification of Mesozoic tetrapods
How did evolution generate the extraordinary diversity of vertebrates on land? No species are known from before ~380 million years ago, yet more than 30,000 are present today. An expansionist model suggests this was achieved by large and unbounded increases, leading to substantially greater diversity in the present than at any time in the geological past. This model contrasts starkly with empirical support for constrained diversification in marine animals, suggesting different macroevolutionary processes on land and in the sea. We quantify patterns of vertebrate standing diversity on land during the Mesozoic–early Paleogene interval, applying sample-standardization to a global fossil dataset containing 27,260 occurrences of 4,898 non-marine tetrapod species. Our results show a highly stable pattern of Mesozoic tetrapod diversity at regional and local levels, underpinned by a weakly positive, but near-zero, long-term net diversification rate over 190 million years. Species diversity of non-flying terrestrial tetrapods less than doubled over this interval, despite the origins of exceptionally diverse extant groups within mammals, squamates, amphibians, and dinosaurs. Therefore, although speciose groups of modern tetrapods have Mesozoic origins, rates of Mesozoic diversification inferred from the fossil record are slow compared to those inferred from molecular phylogenies. If high speciation rates did occur in the Mesozoic, then they seem to have been balanced by extinctions among older clades.
An apparent 4-fold expansion of species richness after the Cretaceous/Paleogene (K/Pg) boundary deserves further examination in light of potential taxonomic biases, but is consistent with the hypothesis that global environmental disturbances such as mass extinction events can rapidly adjust limits to diversity by restructuring ecosystems, and suggests that the gradualistic evolutionary diversification of tetrapods was punctuated by brief but dramatic episodes of radiation.
27 page(s)
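A quick calculation puts the near-zero rate in perspective. Assume, as a simplification not stated in the abstract, that standing diversity grows exponentially, N(t) = N(0) exp(r t), at a constant net diversification rate r. Diversity that less than doubles over 190 million years then implies r < ln 2 / 190, about 0.0036 per lineage per million years. The short Python computation below makes that arithmetic explicit. The function name is hypothetical.

```python
import math


def net_rate_for_fold_change(fold_change, duration_myr):
    """Constant net rate r (per lineage per Myr) such that
    N(t) / N(0) = fold_change after duration_myr, assuming N(t) = N(0) * exp(r * t)."""
    return math.log(fold_change) / duration_myr


# Less-than-doubling over the 190-Myr interval gives an upper bound on r.
r_max = net_rate_for_fold_change(2.0, 190.0)
print(f"r < {r_max:.4f} per lineage per Myr")
```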
Cox regression survival analysis with compositional covariates: application to modelling mortality risk from 24-h physical activity patterns
Survival analysis is commonly conducted in medical and public health research to assess the association of an exposure or intervention with a hard endpoint such as mortality. The Cox (proportional hazards) regression model is probably the most popular statistical tool used in this context. However, when the exposure includes compositional covariates (that is, variables representing a relative makeup, such as a nutritional or physical activity behaviour composition), some basic assumptions of the Cox regression model and its associated significance tests are violated. Because the parts of a composition are intrinsically interdependent, results and conclusions cannot be based on considering each part in isolation, as is ordinarily done. In this work, we introduce a formulation of the Cox regression model in terms of log-ratio coordinates that suitably deals with the constraints of compositional covariates, facilitates the use of common statistical inference methods, and allows for scientifically meaningful interpretations. We illustrate its practical application to a public health problem: the estimation of the mortality hazard associated with the composition of daily activity behaviour (physical activity, sitting time and sleep) using data from the U.S. National Health and Nutrition Examination Survey (NHANES).
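Log-ratio coordinates can be computed in several ways, and the abstract does not say which are used. As an assumed example, the Python sketch below computes isometric log-ratio (ilr) pivot coordinates for a hypothetical three-part day of physical activity, sitting and sleep. These are defined by z_i = sqrt((D - i) / (D - i + 1)) * ln(x_i / g(x_{i+1}, ..., x_D)) for i = 1, ..., D - 1, where g is the geometric mean. Such coordinates could then be entered as ordinary covariates in a Cox model; the hazard-model fitting itself is not shown.

```python
import math


def pivot_coordinates(parts):
    """Isometric log-ratio (ilr) pivot coordinates of a composition.

    parts: D >= 2 strictly positive values (for example, hours per day).
    Returns D - 1 coordinates; the result does not depend on the total.
    """
    if len(parts) < 2 or any(p <= 0 for p in parts):
        raise ValueError("need at least two strictly positive parts")
    D = len(parts)
    coords = []
    for i in range(D - 1):  # 0-based i corresponds to pivot i + 1 in the formula
        rest = parts[i + 1:]
        log_gmean_rest = sum(math.log(p) for p in rest) / len(rest)
        scale = math.sqrt((D - i - 1) / (D - i))
        coords.append(scale * (math.log(parts[i]) - log_gmean_rest))
    return coords


# Hypothetical 24-hour day in hours: [physical activity, sitting, sleep].
day = [2.0, 14.0, 8.0]
print(pivot_coordinates(day))
```

With this ordering, the first coordinate contrasts physical activity with the geometric mean of sitting and sleep. Its Cox coefficient would therefore describe the hazard associated with relatively more activity at the expense of the other two behaviours. Other orderings single out other parts.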
Joint asymptotics for semi-nonparametric regression models with partially linear structure
We consider a joint asymptotic framework for studying semi-nonparametric
regression models where (finite-dimensional) Euclidean parameters and
(infinite-dimensional) functional parameters are both of interest. The class of
models under consideration shares a partially linear structure and is estimated in
two general contexts: (i) quasi-likelihood and (ii) true likelihood. We first
show that the Euclidean estimator and (pointwise) functional estimator, which
are re-scaled at different rates, jointly converge to a zero-mean Gaussian
vector. This weak convergence result reveals a surprising joint asymptotics
phenomenon: these two estimators are asymptotically independent. A major goal
of this paper is to gain first-hand insights into the above phenomenon.
Moreover, a likelihood ratio test is proposed for a set of joint local
hypotheses, where a new version of the Wilks phenomenon [Ann. Math. Stat. 9
(1938) 60-62; Ann. Statist. 29 (2001) 153-193] is unveiled. A novel technical
tool, called a joint Bahadur representation, is developed for studying these
joint asymptotic results.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1313 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
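In the simplest Gaussian case, a partially linear structure takes the form Y = X^T beta + f(Z) + epsilon, with a Euclidean parameter beta and an unknown smooth function f. The paper's estimators are based on quasi-likelihood or likelihood. As a much simpler stand-in, the Python sketch below estimates a scalar beta by differencing, in the spirit of Yatchew's estimator. Sorting by Z and differencing adjacent observations approximately removes f, and beta is then estimated by least squares on the differences. The simulated design, sample size, and function names are hypothetical.

```python
import math
import random


def differencing_estimator(x, z, y):
    """Estimate scalar beta in y = beta * x + f(z) + noise by first-differencing after sorting on z."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    dx = [x[order[k]] - x[order[k - 1]] for k in range(1, len(order))]
    dy = [y[order[k]] - y[order[k - 1]] for k in range(1, len(order))]
    # Least squares through the origin on the differenced data; f(z) nearly
    # cancels because adjacent z values are close and f is smooth.
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def simulate(n, beta, seed=0):
    """Hypothetical design: x ~ N(0, 1), z ~ U(0, 1), f(z) = sin(2*pi*z), noise sd 0.5."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.random() for _ in range(n)]
    y = [beta * xi + math.sin(2 * math.pi * zi) + rng.gauss(0.0, 0.5) for xi, zi in zip(x, z)]
    return x, z, y


x, z, y = simulate(2000, beta=2.0)
beta_hat = differencing_estimator(x, z, y)
print(f"beta_hat = {beta_hat:.3f}")
```

Differencing gives up some efficiency relative to likelihood-based estimation and says nothing about the functional estimator of f. It is included only to make the model class concrete.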