Maximum Fidelity
The most fundamental problem in statistics is the inference of an unknown
probability distribution from a finite number of samples. For a specific
observed data set, answers to the following questions would be desirable:
(1) Estimation: which candidate distribution provides the best fit to the
observed data? (2) Goodness-of-fit: how concordant is this distribution with
the observed data? (3) Uncertainty: how concordant are other candidate
distributions with the observed data? A simple, unified approach for
univariate data that addresses these traditionally distinct statistical
notions, called "maximum fidelity", is presented. Maximum fidelity is a strict frequentist
approach that is fundamentally based on model concordance with the observed
data. The fidelity statistic is a general information measure based on the
coordinate-independent cumulative distribution and critical yet previously
neglected symmetry considerations. An approximation for the null distribution
of the fidelity allows its direct conversion to absolute model concordance (p
value). Fidelity maximization allows identification of the most concordant
model distribution, generating a method for parameter estimation, with
neighboring, less concordant distributions providing the "uncertainty" in this
estimate. Maximum fidelity provides an optimal approach for parameter
estimation (superior to maximum likelihood) and a generally optimal approach
for goodness-of-fit assessment of arbitrary models applied to univariate data.
Extensions to binary data, binned data, multidimensional data, and classical
parametric and nonparametric statistical tests are described. Maximum fidelity
provides a philosophically consistent, robust, and seemingly optimal foundation
for statistical inference. All findings are presented in an elementary way to
be immediately accessible to all researchers utilizing statistical analysis.
Comment: 66 pages, 32 figures, 7 tables, submitted
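The abstract does not define the fidelity statistic, so the sketch below cannot reproduce it. As a plainly labeled stand-in, it uses the Kolmogorov-Smirnov distance, another statistic built on the cumulative distribution, to illustrate the workflow the abstract describes: score each candidate model by its concordance (p value) with the data, take the most concordant candidate as the estimate, and read the uncertainty off the less concordant neighboring candidates. The exponential model, the grid of rates, and the helper names `ks_distance` and `kolmogorov_pvalue` are illustrative assumptions, not part of the maximum fidelity method.

```python
import numpy as np


def ks_distance(x_sorted, cdf):
    """Two-sided Kolmogorov-Smirnov distance between the empirical CDF and cdf.

    Stand-in concordance measure (NOT the fidelity statistic).
    """
    n = len(x_sorted)
    F = cdf(x_sorted)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))


def kolmogorov_pvalue(d, n):
    """Asymptotic KS p-value with Stephens' finite-sample correction."""
    lam = (np.sqrt(n) + 0.12 + 0.11 / np.sqrt(n)) * d
    if lam < 0.2:  # the series converges poorly here, and the p-value is ~1
        return 1.0
    k = np.arange(1, 101)
    p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * lam**2))
    return float(np.clip(p, 0.0, 1.0))


# Hypothetical example: 500 draws from an exponential distribution with rate 2.
rng = np.random.default_rng(0)
x = np.sort(rng.exponential(scale=0.5, size=500))

# Score every candidate rate on a grid by its concordance with the data.
rates = np.linspace(0.2, 5.0, 961)
dists = np.array([ks_distance(x, lambda t, r=r: 1.0 - np.exp(-r * t)) for r in rates])
pvals = np.array([kolmogorov_pvalue(d, len(x)) for d in dists])

# Most concordant candidate = point estimate; candidates still concordant
# at the 5% level form a confidence set by test inversion.
rate_hat = rates[np.argmin(dists)]
accepted = rates[pvals >= 0.05]
lo, hi = accepted.min(), accepted.max()
print(f"rate estimate {rate_hat:.3f}, accepted rates within [{lo:.3f}, {hi:.3f}]")
```

Because each candidate rate is tested as a fully specified (simple) hypothesis, the set of rates with p ≥ 0.05 is a 95% confidence set by test inversion; `lo` and `hi` only bound that set, which need not be contiguous on the grid.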
On Graphical Models via Univariate Exponential Family Distributions
Undirected graphical models, or Markov networks, are a popular class of
statistical models, used in a wide variety of applications. Popular instances
of this class include Gaussian graphical models and Ising models. In many
settings, however, it might not be clear which subclass of graphical models to
use, particularly for non-Gaussian and non-categorical data. In this paper, we
consider a general sub-class of graphical models where the node-wise
conditional distributions arise from exponential families. This allows us to
derive multivariate graphical model distributions from univariate exponential
family distributions, such as the Poisson, negative binomial, and exponential
distributions. Our key contributions include a class of M-estimators to fit
these graphical model distributions, and a rigorous statistical analysis showing
that these M-estimators recover the true graphical model structure exactly,
with high probability. We provide examples of genomic and proteomic networks
learned via instances of our class of graphical models derived from Poisson and
exponential distributions.
Comment: Journal of Machine Learning Research
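The node-wise idea can be sketched as one l1-penalized Poisson regression of a single node on all other nodes, in the spirit of neighborhood selection; whether this matches the paper's exact M-estimator and tuning is an assumption here. The solver (proximal gradient descent with backtracking), the penalty `lam = 0.1`, the function names, and the toy data, in which `X0` depends on `X1` but not on `X2`, are all hypothetical choices for illustration.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1 (shrinks entries toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def poisson_nll(X, y, b, theta):
    """Average Poisson negative log-likelihood, omitting the constant log(y!) term."""
    eta = b + X @ theta
    return np.mean(np.exp(eta) - y * eta)


def nodewise_poisson_lasso(X, y, lam, n_iter=1000, tol=1e-10):
    """l1-penalized Poisson regression of one node (y) on the remaining nodes (X).

    Proximal gradient descent with backtracking; the intercept b is not penalized.
    Nonzero entries of the returned theta are the estimated neighbors of the node.
    """
    n, p = X.shape
    b, theta, step = np.log(y.mean()), np.zeros(p), 1.0
    for _ in range(n_iter):
        resid = np.exp(b + X @ theta) - y
        g_b, g_theta = resid.mean(), X.T @ resid / n
        f_old = poisson_nll(X, y, b, theta)
        while True:
            b_new = b - step * g_b
            theta_new = soft_threshold(theta - step * g_theta, step * lam)
            d_b, d_theta = b_new - b, theta_new - theta
            # Accept once the quadratic upper bound on the smooth part holds.
            bound = f_old + g_b * d_b + g_theta @ d_theta + (d_b**2 + d_theta @ d_theta) / (2 * step)
            if poisson_nll(X, y, b_new, theta_new) <= bound:
                break
            step *= 0.5
        b, theta = b_new, theta_new
        if max(abs(d_b), np.max(np.abs(d_theta))) < tol:
            break
    return b, theta


# Toy data (hypothetical): X0 depends on X1 only; X2 is independent noise.
rng = np.random.default_rng(0)
n = 3000
x1, x2 = rng.poisson(1.0, n), rng.poisson(1.0, n)
x0 = rng.poisson(np.exp(0.2 + 0.5 * x1))

Z = np.column_stack([x1, x2]).astype(float)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardize the predictors
b_hat, theta_hat = nodewise_poisson_lasso(Z, x0.astype(float), lam=0.1)
neighbors = [name for name, t in zip(["X1", "X2"], theta_hat) if abs(t) > 1e-8]
print("coefficients:", theta_hat, "estimated neighbors of X0:", neighbors)
```

Repeating the regression for every node and combining the estimated neighborhoods (for example by an AND or OR rule) would give a graph estimate; that combination step is not shown.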
Fitting Effective Diffusion Models to Data Associated with a "Glassy Potential": Estimation, Classical Inference Procedures and Some Heuristics
Several researchers have successfully estimated the parameters of
low-dimensional diffusion models from the data produced by atomistic
simulations. This naturally raises questions about efficient
estimation, goodness-of-fit tests, and confidence interval estimation. The
first part of this article uses maximum likelihood estimation to obtain the
parameters of a diffusion model from a scalar time series. I address numerical
issues associated with attempting to realize asymptotic statistics results with
moderate sample sizes in the presence of exact and approximated transition
densities. Approximate transition densities are used because the analytic
solution of a transition density associated with a parametric diffusion model
is often unknown. I am primarily interested in how well the deterministic
transition density expansions of Ait-Sahalia capture the curvature of the
transition density in (idealized) situations that occur when one carries out
simulations in the presence of a "glassy" interaction potential. Accurate
approximation of the curvature of the transition density is desirable because
it can be used to quantify the goodness-of-fit of the model and to calculate
asymptotic confidence intervals of the estimated parameters. The second part of
this paper contributes a heuristic estimation technique for approximating a
nonlinear diffusion model. A "global" nonlinear model is obtained by taking a
batch of time series and applying simple local models to portions of the data.
I demonstrate the technique on a diffusion model with a known transition
density and on data generated by the Stochastic Simulation Algorithm.
Comment: 30 pages, 10 figures. Submitted to SIAM MMS (typos removed and slightly shortened)
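For a diffusion whose transition density is known in closed form, maximum likelihood estimation and curvature-based confidence intervals can be shown directly. The sketch below assumes an Ornstein-Uhlenbeck model dX = kappa (mu - X) dt + sigma dW (chosen for illustration; it is not the article's glassy-potential model), whose exact transition density is Gaussian with mean mu + (x - mu) e^(-kappa dt) and variance sigma^2 (1 - e^(-2 kappa dt)) / (2 kappa). It computes the exact MLE from a scalar time series and uses a finite-difference Hessian of the log-likelihood as the curvature that yields asymptotic standard errors. Function names and parameter values are hypothetical.

```python
import numpy as np


def ou_loglik(params, x, dt):
    """Exact conditional log-likelihood of an OU path dX = kappa*(mu - X) dt + sigma dW."""
    kappa, mu, sigma = params
    a = np.exp(-kappa * dt)
    mean = mu + (x[:-1] - mu) * a                  # exact transition mean
    var = sigma**2 * (1.0 - a**2) / (2.0 * kappa)  # exact transition variance
    r = x[1:] - mean
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + r**2 / var)


def ou_mle(x, dt):
    """Closed-form MLE: the exact discretization is a Gaussian AR(1), fitted by least squares."""
    x_prev, x_next = x[:-1], x[1:]
    A = np.column_stack([np.ones_like(x_prev), x_prev])
    (c, a), *_ = np.linalg.lstsq(A, x_next, rcond=None)
    s2 = np.mean((x_next - c - a * x_prev) ** 2)
    kappa = -np.log(a) / dt  # requires 0 < a < 1
    return np.array([kappa, c / (1.0 - a), np.sqrt(2.0 * kappa * s2 / (1.0 - a**2))])


def numerical_hessian(f, theta, h=1e-4):
    """Central-difference Hessian of a scalar function f at theta."""
    p = len(theta)
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei, ej = np.eye(p)[i] * h, np.eye(p)[j] * h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h * h)
    return H


# Hypothetical example: simulate an OU path exactly, then refit it.
rng = np.random.default_rng(1)
kappa, mu, sigma, dt, n = 2.0, 1.0, 0.5, 0.1, 20_000
a = np.exp(-kappa * dt)
sd = sigma * np.sqrt((1.0 - a**2) / (2.0 * kappa))
x = np.empty(n + 1)
x[0] = mu
for t in range(n):
    x[t + 1] = mu + (x[t] - mu) * a + sd * rng.standard_normal()

theta_hat = ou_mle(x, dt)
H = numerical_hessian(lambda th: ou_loglik(th, x, dt), theta_hat)
se = np.sqrt(np.diag(np.linalg.inv(-H)))  # asymptotic standard errors from the curvature
for name, est, s in zip(["kappa", "mu", "sigma"], theta_hat, se):
    print(f"{name} = {est:.3f} +/- {1.96 * s:.3f} (approx. 95% CI half-width)")
```

With an approximate transition density (for example an expansion of the Ait-Sahalia type), the closed-form `ou_mle` would be replaced by numerical maximization of the approximate log-likelihood, while the Hessian step would apply unchanged; how well the approximation captures the curvature then governs the quality of the resulting intervals.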
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments are discussed.
Comment: 61 pages
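Exploratory analysis of directional data starts from summaries that respect the wrap-around of angles. The sketch below computes the standard circular mean direction, the mean resultant length R, and the circular variance 1 - R; the function name `circular_summary` and the two example bearings are illustrative.

```python
import numpy as np


def circular_summary(theta):
    """Circular mean direction, mean resultant length R, and circular variance 1 - R.

    theta holds angles in radians.
    """
    C, S = np.mean(np.cos(theta)), np.mean(np.sin(theta))
    R = np.hypot(C, S)
    return np.arctan2(S, C), R, 1.0 - R


# Two bearings just either side of north.
degrees = np.array([350.0, 10.0])
print(degrees.mean())  # 180.0: the naive average points due south
mean_dir, R, circ_var = circular_summary(np.deg2rad(degrees))
print(np.rad2deg(mean_dir), R)  # circular mean ~0 degrees (due north); R = cos(10°) ≈ 0.9848
```

The arithmetic mean of 350° and 10° points in the opposite direction, whereas the circular mean is 0°; R close to 1 indicates tightly concentrated directions.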
Tests based on characterizations, and their efficiencies: a survey
A survey of goodness-of-fit and symmetry tests based on the characterization
properties of distributions is presented. This approach has become popular in
recent years. In most cases, the test statistics are functionals of
U-empirical processes. The limiting distributions and large deviations of the new
statistics under the null hypothesis are described. Their local Bahadur
efficiency for various parametric alternatives is calculated and compared with
each other and with various previously known tests. We also describe new
directions for possible research in this domain.
Comment: Open access in Acta et Commentationes Universitatis Tartuensis de Mathematica
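As an illustration of a characterization-based statistic built on a U-empirical process, the sketch below uses the fact that, for i.i.d. exponential X1 and X2, 2·min(X1, X2) has the same distribution as X1 (a special case of the property behind Desu's characterization of the exponential law). It takes a Kolmogorov-type supremum distance between the U-empirical distribution function of 2·min(Xi, Xj) and the ordinary empirical distribution function. The choice of characterization, the supremum form, and the function name `desu_ks_statistic` are assumptions for illustration; critical values are not computed here and would have to come from simulation under the null or from the limiting distributions the survey describes.

```python
import numpy as np


def desu_ks_statistic(x):
    """Sup-distance between the U-empirical df of 2*min(X_i, X_j) and the empirical df of X.

    For i.i.d. exponential data, 2*min(X1, X2) has the same distribution as X1,
    so both step functions estimate the same CDF and the statistic should be small.
    """
    x = np.sort(np.asarray(x, dtype=float))
    i, j = np.triu_indices(len(x), k=1)                # all pairs i < j
    pair_vals = np.sort(2.0 * np.minimum(x[i], x[j]))  # kernel values of the U-statistic
    grid = np.concatenate([x, pair_vals])              # every jump point of either step function
    F_n = np.searchsorted(x, grid, side="right") / len(x)
    H_n = np.searchsorted(pair_vals, grid, side="right") / len(pair_vals)
    return float(np.max(np.abs(H_n - F_n)))


rng = np.random.default_rng(2)
d_exp = desu_ks_statistic(rng.exponential(size=200))  # null hypothesis holds
d_unif = desu_ks_statistic(rng.uniform(size=200))     # alternative; population value 0.25
print(f"exponential sample: {d_exp:.3f}, uniform sample: {d_unif:.3f}")
```

For uniform data the population version of the statistic is 0.25, attained at t = 1, because P(2·min(U1, U2) ≤ t) = t - t²/4 on [0, 1] while the uniform CDF equals t there.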