15,244 research outputs found

    Maximum Fidelity

    Full text link
    The most fundamental problem in statistics is the inference of an unknown probability distribution from a finite number of samples. For a specific observed data set, answers to the following questions would be desirable: (1) Estimation: Which candidate distribution provides the best fit to the observed data?, (2) Goodness-of-fit: How concordant is this distribution with the observed data?, and (3) Uncertainty: How concordant are other candidate distributions with the observed data? A simple unified approach for univariate data that addresses these traditionally distinct statistical notions is presented called "maximum fidelity". Maximum fidelity is a strict frequentist approach that is fundamentally based on model concordance with the observed data. The fidelity statistic is a general information measure based on the coordinate-independent cumulative distribution and critical yet previously neglected symmetry considerations. An approximation for the null distribution of the fidelity allows its direct conversion to absolute model concordance (p value). Fidelity maximization allows identification of the most concordant model distribution, generating a method for parameter estimation, with neighboring, less concordant distributions providing the "uncertainty" in this estimate. Maximum fidelity provides an optimal approach for parameter estimation (superior to maximum likelihood) and a generally optimal approach for goodness-of-fit assessment of arbitrary models applied to univariate data. Extensions to binary data, binned data, multidimensional data, and classical parametric and nonparametric statistical tests are described. Maximum fidelity provides a philosophically consistent, robust, and seemingly optimal foundation for statistical inference. All findings are presented in an elementary way to be immediately accessible to all researchers utilizing statistical analysis.Comment: 66 pages, 32 figures, 7 tables, submitte

    On Graphical Models via Univariate Exponential Family Distributions

    Full text link
    Undirected graphical models, or Markov networks, are a popular class of statistical models, used in a wide variety of applications. Popular instances of this class include Gaussian graphical models and Ising models. In many settings, however, it might not be clear which subclass of graphical models to use, particularly for non-Gaussian and non-categorical data. In this paper, we consider a general sub-class of graphical models where the node-wise conditional distributions arise from exponential families. This allows us to derive multivariate graphical model distributions from univariate exponential family distributions, such as the Poisson, negative binomial, and exponential distributions. Our key contributions include a class of M-estimators to fit these graphical model distributions; and rigorous statistical analysis showing that these M-estimators recover the true graphical model structure exactly, with high probability. We provide examples of genomic and proteomic networks learned via instances of our class of graphical models derived from Poisson and exponential distributions.Comment: Journal of Machine Learning Researc

    Fitting Effective Diffusion Models to Data Associated with a "Glassy Potential": Estimation, Classical Inference Procedures and Some Heuristics

    Full text link
    A variety of researchers have successfully obtained the parameters of low dimensional diffusion models using the data that comes out of atomistic simulations. This naturally raises a variety of questions about efficient estimation, goodness-of-fit tests, and confidence interval estimation. The first part of this article uses maximum likelihood estimation to obtain the parameters of a diffusion model from a scalar time series. I address numerical issues associated with attempting to realize asymptotic statistics results with moderate sample sizes in the presence of exact and approximated transition densities. Approximate transition densities are used because the analytic solution of a transition density associated with a parametric diffusion model is often unknown.I am primarily interested in how well the deterministic transition density expansions of Ait-Sahalia capture the curvature of the transition density in (idealized) situations that occur when one carries out simulations in the presence of a "glassy" interaction potential. Accurate approximation of the curvature of the transition density is desirable because it can be used to quantify the goodness-of-fit of the model and to calculate asymptotic confidence intervals of the estimated parameters. The second part of this paper contributes a heuristic estimation technique for approximating a nonlinear diffusion model. A "global" nonlinear model is obtained by taking a batch of time series and applying simple local models to portions of the data. I demonstrate the technique on a diffusion model with a known transition density and on data generated by the Stochastic Simulation Algorithm.Comment: 30 pages 10 figures Submitted to SIAM MMS (typos removed and slightly shortened

    Recent advances in directional statistics

    Get PDF
    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed.Comment: 61 page

    Tests based on characterizations, and their efficiencies: a survey

    Get PDF
    A survey of goodness-of-fit and symmetry tests based on the characterization properties of distributions is presented. This approach became popular in recent years. In most cases the test statistics are functionals of UU-empirical processes. The limiting distributions and large deviations of new statistics under the null hypothesis are described. Their local Bahadur efficiency for various parametric alternatives is calculated and compared with each other as well as with diverse previously known tests. We also describe new directions of possible research in this domain.Comment: Open access in Acta et Commentationes Universitatis Tartuensis de Mathematic
    corecore