
    Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance

    In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables, as defined in the framework of Symbolic Data Analysis, and the parameters of the model are estimated using the classic least squares method. An appropriate metric is introduced to measure the error between the observed and the predicted distributions; in particular, the Wasserstein distance is proposed. Some properties of this metric are exploited to predict the response variable as a direct linear combination of other independent histogram variables. Measures of goodness of fit are discussed. An application on real data corroborates the proposed method.
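    To make the error metric concrete, here is a minimal sketch of the L2 Wasserstein distance between two histograms, computed from their quantile functions: W2(X, Y)^2 is the integral over t in [0, 1] of (F_X^{-1}(t) - F_Y^{-1}(t))^2. It assumes that mass is spread uniformly inside each bin and that bin weights sum to 1; the function names hist_quantile and wasserstein2 are hypothetical, and this is not the authors' regression code.

```python
import math
from itertools import accumulate


def hist_quantile(edges, weights, t):
    """Quantile function of a histogram, assuming mass is uniform inside each bin.

    edges: K+1 increasing bin boundaries; weights: K non-negative masses summing to 1.
    """
    cum = 0.0
    for k, w in enumerate(weights):
        if w > 0 and t <= cum + w:
            # Linear interpolation inside bin k
            return edges[k] + (t - cum) / w * (edges[k + 1] - edges[k])
        cum += w
    return edges[-1]


def wasserstein2(edges_x, w_x, edges_y, w_y):
    """L2 Wasserstein distance between two histograms via their quantile functions."""
    # Between consecutive cumulative weights of either histogram, both quantile
    # functions are linear in t, so the squared difference is a quadratic in t.
    breaks = sorted({0.0, 1.0, *accumulate(w_x), *accumulate(w_y)})
    total = 0.0
    for a, b in zip(breaks, breaks[1:]):
        mid, half = (a + b) / 2, (b - a) / (2 * math.sqrt(3))
        # Two-point Gauss-Legendre rule: exact for polynomials up to degree 3
        for t in (mid - half, mid + half):
            d = hist_quantile(edges_x, w_x, t) - hist_quantile(edges_y, w_y, t)
            total += (b - a) / 2 * d * d
    return math.sqrt(total)
```

    For example, a histogram on [0, 1] and the same histogram shifted to [2, 3] are at distance 2, reflecting the location-shift behaviour that makes this metric natural for comparing observed and predicted distributions.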

    Stochastic Weighted Graphs: Flexible Model Specification and Simulation

    In most domains of network analysis, researchers consider networks that arise in nature with weighted edges. Such networks are routinely dichotomized so that available methods for statistical inference on networks can be used. The generalized exponential random graph model (GERGM) is a recently proposed method for simulating and modeling the edges of a weighted graph. The GERGM specifies a joint distribution for an exponential family of graphs with continuous-valued edge weights. However, current estimation algorithms for the GERGM only allow inference on a restricted family of model specifications. To address this issue, we develop a Metropolis–Hastings method that can be used to estimate any GERGM specification, thereby significantly extending the family of weighted graphs that can be modeled with the GERGM. We show that new flexible model specifications are capable of avoiding likelihood degeneracy and efficiently capturing network structure in applications where such models were not previously available. We demonstrate the utility of this new class of GERGMs through application to two real network data sets, and we further assess the effectiveness of our proposed methodology by simulating non-degenerate model specifications from the well-studied two-stars model. A working R version of the GERGM code is available in the supplement and will be incorporated in the gergm CRAN package. Comment: 33 pages, 6 figures. To appear in Social Network
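    As a hedged illustration of the Metropolis–Hastings idea for graphs with continuous edge weights, the sketch below samples an undirected graph with weights in [0, 1] from a toy exponential-family density proportional to exp(theta_edges * sum of weights + theta_twostar * weighted two-star count). This toy density is an assumption chosen for illustration; it is not the GERGM's own specification or the gergm package code, and the names mh_weighted_graph and two_star_delta are hypothetical. Each step perturbs one edge with a symmetric Gaussian random walk, rejects moves outside the support, and accepts with probability min(1, exp(change in log density)).

```python
import math
import random


def two_star_delta(W, i, j, delta):
    """Change in the weighted two-star statistic when edge (i, j) changes by delta.

    The statistic is sum over centres c of sum over pairs {a, b} of W[c][a] * W[c][b];
    only two-stars centred at i or j that contain edge (i, j) are affected.
    """
    n = len(W)
    return delta * sum(W[i][k] + W[j][k] for k in range(n) if k not in (i, j))


def mh_weighted_graph(n, theta_edges, theta_twostar, steps, step_sd=0.2, seed=0):
    """Random-walk Metropolis-Hastings over symmetric edge weights in [0, 1]."""
    rng = random.Random(seed)
    W = [[0.0 if i == j else 0.5 for j in range(n)] for i in range(n)]
    accepted = 0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)  # pick one undirected edge at random
        old = W[i][j]
        new = old + rng.gauss(0.0, step_sd)  # symmetric proposal
        if not 0.0 <= new <= 1.0:
            continue  # outside the support the density is zero: reject
        delta = new - old
        log_ratio = theta_edges * delta + theta_twostar * two_star_delta(W, i, j, delta)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            W[i][j] = W[j][i] = new
            accepted += 1
    return W, accepted / steps
```

    Because the proposal is symmetric, the proposal densities cancel in the acceptance ratio, and only the local change in the sufficient statistics needs to be computed at each step.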

    A Bayesian Multivariate Functional Dynamic Linear Model

    We present a Bayesian approach for modeling multivariate, dependent functional data. To account for the three dominant structural features in the data (functional, time-dependent, and multivariate components), we extend hierarchical dynamic linear models for multivariate time series to the functional data setting. We also develop Bayesian spline theory in a more general constrained optimization framework. The proposed methods identify a time-invariant functional basis for the functional observations, which is smooth and interpretable, and can be made common across multivariate observations for additional information sharing. The Bayesian framework permits joint estimation of the model parameters, provides exact inference (up to MCMC error) on specific parameters, and allows generalized dependence structures. Sampling from the posterior distribution is accomplished with an efficient Gibbs sampling algorithm. We illustrate the proposed framework with two applications: (1) multi-economy yield curve data from the recent global recession, and (2) local field potential brain signals in rats, for which we develop a multivariate functional time series approach for multivariate time-frequency analysis. Supplementary materials, including R code and the multi-economy yield curve data, are available online.
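    The model above extends dynamic linear models (DLMs). As a hedged illustration of the recursion at the core of any DLM, here is a sketch of the forward (Kalman) filter for the simplest univariate case, the local-level model. It is not the authors' multivariate functional model; in a Gibbs sampler this filter is typically paired with a backward sampling step, which is not shown. The function name local_level_filter and its parameters are hypothetical.

```python
def local_level_filter(ys, m0, c0, v, w):
    """Forward (Kalman) filter for the local-level DLM.

    Observation: y_t  = mu_t + eps_t,      eps_t ~ N(0, v)
    Evolution:   mu_t = mu_{t-1} + eta_t,  eta_t ~ N(0, w)
    Prior:       mu_0 ~ N(m0, c0)
    Returns filtered means and variances of mu_t given y_1..y_t.
    """
    m, c = m0, c0
    means, variances = [], []
    for y in ys:
        r = c + w                # prior variance of mu_t (prior mean is m under a random walk)
        q = r + v                # one-step forecast variance of y_t
        k = r / q                # Kalman gain
        m = m + k * (y - m)      # update the mean with the forecast error
        c = r - k * r            # posterior variance, equal to r * v / q
        means.append(m)
        variances.append(c)
    return means, variances
```

    With no state evolution (w = 0) and a diffuse prior, the filtered mean reduces to the running sample mean of the observations, which is a quick sanity check of the recursion.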

    Near-stasis in the long-term diversification of Mesozoic tetrapods

    How did evolution generate the extraordinary diversity of vertebrates on land? No species are known from before ~380 million years ago, yet more than 30,000 are present today. An expansionist model suggests this was achieved by large and unbounded increases, leading to substantially greater diversity in the present than at any time in the geological past. This model contrasts starkly with empirical support for constrained diversification in marine animals, suggesting different macroevolutionary processes on land and in the sea. We quantify patterns of vertebrate standing diversity on land during the Mesozoic–early Paleogene interval, applying sample-standardization to a global fossil dataset containing 27,260 occurrences of 4,898 non-marine tetrapod species. Our results show a highly stable pattern of Mesozoic tetrapod diversity at regional and local levels, underpinned by a weakly positive, but near-zero, long-term net diversification rate over 190 million years. Species diversity of non-flying terrestrial tetrapods less than doubled over this interval, despite the origins of exceptionally diverse extant groups within mammals, squamates, amphibians, and dinosaurs. Therefore, although speciose groups of modern tetrapods have Mesozoic origins, rates of Mesozoic diversification inferred from the fossil record are slow compared to those inferred from molecular phylogenies. If high speciation rates did occur in the Mesozoic, then they seem to have been balanced by extinctions among older clades. 
An apparent 4-fold expansion of species richness after the Cretaceous/Paleogene (K/Pg) boundary deserves further examination in light of potential taxonomic biases, but is consistent with the hypothesis that global environmental disturbances such as mass extinction events can rapidly adjust limits to diversity by restructuring ecosystems, and suggests that the gradualistic evolutionary diversification of tetrapods was punctuated by brief but dramatic episodes of radiation. 27 page(s)
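    The abstract does not say which sample-standardization method was used. As an assumption-labeled illustration of the general idea only, classical individual-based rarefaction estimates the expected number of species in a random subsample of n occurrences drawn without replacement, E[S_n] = sum over species i of [1 - C(N - N_i, n) / C(N, n)], where N_i is the occurrence count of species i and N is the total. Standardizing every time bin to the same n makes richness comparable across unevenly sampled intervals. The function name rarefied_richness is hypothetical.

```python
from math import comb


def rarefied_richness(counts, n):
    """Expected species richness in a subsample of n occurrences (classical rarefaction).

    counts: occurrence counts per species; n: subsample size, 0 <= n <= sum(counts).
    """
    total_occurrences = sum(counts)
    if not 0 <= n <= total_occurrences:
        raise ValueError("subsample size must be between 0 and the number of occurrences")
    ways = comb(total_occurrences, n)
    # A species is missed only if all n draws come from the other occurrences;
    # math.comb returns 0 when n exceeds the number of other occurrences.
    return sum(1 - comb(total_occurrences - c, n) / ways for c in counts if c > 0)
```

    For instance, with one species seen three times and another seen once, a subsample of two occurrences is expected to contain 1.5 species.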

    Cox regression survival analysis with compositional covariates: application to modelling mortality risk from 24-h physical activity patterns

    Survival analysis is commonly conducted in medical and public health research to assess the association of an exposure or intervention with a hard endpoint such as mortality. The Cox (proportional hazards) regression model is probably the most popular statistical tool used in this context. However, when the exposure includes compositional covariates (that is, variables representing a relative makeup, such as a nutritional or physical activity behaviour composition), some basic assumptions of the Cox regression model and its associated significance tests are violated. Compositional variables are intrinsically interdependent, which precludes drawing results and conclusions from considering each of them in isolation, as is ordinarily done. In this work, we introduce a formulation of the Cox regression model in terms of log-ratio coordinates, which suitably deals with the constraints of compositional covariates, facilitates the use of common statistical inference methods, and allows for scientifically meaningful interpretations. We illustrate its practical application to a public health problem: the estimation of the mortality hazard associated with the composition of daily activity behaviour (physical activity, sitting time and sleep) using data from the U.S. National Health and Nutrition Examination Survey (NHANES).
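    The abstract mentions log-ratio coordinates without naming a specific basis. As a hedged sketch, the code below computes pivot coordinates, one common isometric log-ratio choice (an assumption here, not necessarily the authors' exact basis): for a D-part composition, z_i = sqrt((D - i) / (D - i + 1)) * ln(x_i / g(x_{i+1}, ..., x_D)), where g is the geometric mean. The first coordinate then contrasts the first part (for example, physical activity) against the geometric mean of the remaining parts. Such coordinates could be passed to a Cox model as ordinary covariates; that step is not shown, and the function name pivot_coordinates is hypothetical.

```python
import math


def pivot_coordinates(parts):
    """Pivot (isometric log-ratio) coordinates of a composition with strictly positive parts.

    Returns D - 1 coordinates; the result is invariant to rescaling the parts,
    so hours per day and proportions of the day give the same coordinates.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("log-ratio coordinates need strictly positive parts")
    D = len(parts)
    coords = []
    for i in range(D - 1):  # 0-based pivot index
        rest = parts[i + 1:]
        log_gmean = sum(math.log(p) for p in rest) / len(rest)
        k = D - i - 1  # number of parts after the pivot
        coords.append(math.sqrt(k / (k + 1)) * (math.log(parts[i]) - log_gmean))
    return coords
```

    Because the coordinates depend only on ratios between parts, they respect the constant-sum constraint of a 24-hour day that makes the raw time components unsuitable as separate covariates.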

    Joint asymptotics for semi-nonparametric regression models with partially linear structure

    We consider a joint asymptotic framework for studying semi-nonparametric regression models in which both (finite-dimensional) Euclidean parameters and (infinite-dimensional) functional parameters are of interest. The class of models under consideration shares a partially linear structure, and the models are estimated in two general contexts: (i) quasi-likelihood and (ii) true likelihood. We first show that the Euclidean estimator and the (pointwise) functional estimator, which are rescaled at different rates, jointly converge to a zero-mean Gaussian vector. This weak convergence result reveals a surprising joint asymptotics phenomenon: the two estimators are asymptotically independent. A major goal of this paper is to gain first-hand insights into this phenomenon. Moreover, a likelihood ratio test is proposed for a set of joint local hypotheses, where a new version of the Wilks phenomenon [Ann. Math. Stat. 9 (1938) 60-62; Ann. Statist. 1 (2001) 153-193] is unveiled. A novel technical tool, called a joint Bahadur representation, is developed for studying these joint asymptotics results. Comment: Published at http://dx.doi.org/10.1214/15-AOS1313 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org