Information Splitting for Big Data Analytics
Many statistical models require the estimation of unknown (co)variance
parameter(s). The estimates are usually obtained by maximizing a
log-likelihood that involves log-determinant terms. In principle, one requires
the \emph{observed information}---the negative Hessian matrix, or second
derivative, of the log-likelihood---to obtain an accurate maximum likelihood
estimator via the Newton method. When one instead uses the \emph{Fisher
information}, the expected value of the observed information, a simpler
algorithm than the Newton method is obtained: the Fisher scoring algorithm.
With the advance of high-throughput technologies in the biological sciences,
recommendation systems, and social networks, the sizes of data sets---and the
corresponding statistical models---have increased by several orders of
magnitude. Neither the observed information nor the Fisher information is easy
to obtain for these big data sets. This paper introduces an information
splitting technique to simplify the computation. By splitting the mean of the
observed information and the Fisher information, a simpler approximate Hessian
matrix for the log-likelihood can be obtained. This approximate Hessian matrix
can significantly reduce computation and makes the linear mixed model
applicable to big data sets. Such splitting and the resulting simpler formulas
depend heavily on matrix algebra transforms and are applicable to large-scale
breeding models and genome-wide association analysis.
Comment: arXiv admin note: text overlap with arXiv:1605.0764
Dropout Training as Adaptive Regularization
Dropout and other feature noising schemes control overfitting by artificially
corrupting the training data. For generalized linear models, dropout performs a
form of adaptive regularization. Using this viewpoint, we show that the dropout
regularizer is first-order equivalent to an L2 regularizer applied after
scaling the features by an estimate of the inverse diagonal Fisher information
matrix. We also establish a connection to AdaGrad, an online learning
algorithm, and find that a close relative of AdaGrad operates by repeatedly
solving linear dropout-regularized problems. By casting dropout as
regularization, we develop a natural semi-supervised algorithm that uses
unlabeled data to create a better adaptive regularizer. We apply this idea to
document classification tasks, and show that it consistently boosts the
performance of dropout training, improving on state-of-the-art results on the
IMDB reviews dataset.
Comment: 11 pages. Advances in Neural Information Processing Systems (NIPS), 2013
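The first-order equivalence claimed above can be sketched numerically. This is a minimal illustration under assumed data and a hypothetical dropout rate: for a logistic model, the dropout regularizer is, to first order, a quadratic penalty weighted by the diagonal of the Fisher information matrix.

```python
import numpy as np

# Hypothetical design matrix with deliberately unequal feature scales.
rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p)) * np.array([1.0, 2.0, 0.5, 3.0])
beta = rng.normal(size=p)

# Diagonal of the Fisher information X^T W X for a logistic model.
mu = 1 / (1 + np.exp(-X @ beta))
diag_fisher = ((mu * (1 - mu))[:, None] * X**2).sum(axis=0)

# First-order dropout penalty: an adaptive L2 penalty in which each
# coordinate is weighted by its diagonal Fisher information term.
delta = 0.5  # hypothetical dropout rate
adaptive_penalty = 0.5 * delta / (1 - delta) * np.sum(diag_fisher * beta**2)
```

Rescaling features by the inverse square root of `diag_fisher` turns this adaptive penalty into an ordinary L2 penalty, which is the viewpoint the abstract describes.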
Optimal inputs for system identification
Identification criteria are presented for linear dynamic systems with and without process noise. With process noise, the state equations are replaced by the Kalman filter equations. If the identification performance index is expanded in a Taylor series with respect to the parameters to be identified, then maximizing the weighting factor of the quadratic term with respect to the inputs will ensure that an identification algorithm converges more rapidly and to a more accurate result than with nonoptimal inputs. The expectation of this weighting factor is the Fisher information matrix, and its inverse is a lower bound for the covariance of the parameters. Direct and indirect methods of calculating the information matrix are presented for systems with and without process noise.
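The role of the Fisher information matrix as a covariance lower bound can be illustrated in a deliberately simple static model (an assumption for brevity; the paper treats dynamic systems). A Monte Carlo check shows the least-squares estimator's empirical variance sitting at the Cramér-Rao bound, and making the input more energetic tightens that bound, which is the motivation for optimal input design.

```python
import numpy as np

# Toy model y = a*u + e, e ~ N(0, sigma^2): the Fisher information for 'a'
# is sum(u^2)/sigma^2, and its inverse lower-bounds Var(a_hat).
rng = np.random.default_rng(2)
sigma, a_true = 0.5, 1.3
u = np.linspace(-1, 1, 50)               # hypothetical input sequence
fisher = np.sum(u**2) / sigma**2
crlb = 1 / fisher                        # Cramer-Rao lower bound on Var(a_hat)

estimates = []
for _ in range(2000):
    y = a_true * u + rng.normal(scale=sigma, size=u.size)
    estimates.append(np.sum(u * y) / np.sum(u**2))   # least-squares estimate
var_emp = np.var(estimates)              # empirical variance ~ crlb
```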
On the solution of Stein's equation and Fisher information matrix of an ARMAX process
The main goal of this paper consists in expressing the solution of a Stein equation in terms of the Fisher information matrix (FIM) of a scalar ARMAX process. A condition for expressing the FIM in terms of a solution to a Stein equation is also set forth. Such interconnections can be derived when a companion matrix with eigenvalues equal to the roots of an appropriate polynomial associated with the ARMAX process is inserted in the Stein equation. The case of algebraic multiplicity greater than or equal to one is studied. The FIM and the corresponding solution to Stein’s equation are presented as solutions to systems of linear equations. The interconnections are obtained by using the common particular solution of these systems. The kernels of the structured coefficient matrices are described as well as some right inverses. This enables us to find a solution to the newly obtained linear system of equations.
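A minimal numerical sketch of the type of Stein equation involved, using a companion matrix with stable eigenvalues. The specific coefficients below are hypothetical, not taken from the paper; the point is only to show the equation \(S - A S A^{T} = Q\) being solved with a companion matrix in the role of \(A\).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Companion matrix of a hypothetical stable AR polynomial z^2 - 0.5 z + 0.3
# (roots have modulus sqrt(0.3) < 1, so the Stein equation has a solution).
coeffs = np.array([0.5, -0.3])
A = np.array([[coeffs[0], coeffs[1]],
              [1.0, 0.0]])
Q = np.eye(2)

# The discrete Lyapunov equation is a Stein equation: S - A S A^T = Q.
S = solve_discrete_lyapunov(A, Q)
```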
Delineating Parameter Unidentifiabilities in Complex Models
Scientists use mathematical modelling to understand and predict the
properties of complex physical systems. In highly parameterised models there
often exist relationships between parameters over which model predictions are
identical, or nearly so. These are known as structural or practical
unidentifiabilities, respectively. They are hard to diagnose and make reliable
parameter estimation from data impossible. They furthermore imply the existence
of an underlying model simplification. We describe a scalable method for
detecting unidentifiabilities, and the functional relations defining them, for
generic models. This allows for model simplification, and appreciation of which
parameters (or functions thereof) cannot be estimated from data. Our algorithm
can identify features such as redundant mechanisms and fast timescale
subsystems, as well as the regimes in which such approximations are valid. We
base our algorithm on a novel quantification of regional parametric
sensitivity: multiscale sloppiness. Traditionally, the link between parametric
sensitivity and the conditioning of the parameter estimation problem is made
locally, through the Fisher Information Matrix. This is valid in the regime of
infinitesimal measurement uncertainty. We demonstrate the duality between
multiscale sloppiness and the geometry of confidence regions surrounding
parameter estimates made where measurement uncertainty is non-negligible.
Further theoretical relationships are provided linking multiscale sloppiness to
the Likelihood-ratio test. From this, we show that a local sensitivity analysis
(as typically done) is insufficient for determining the reliability of
parameter estimation, even with simple (non)linear systems. Our algorithm
provides a tractable alternative. We finally apply our methods to a
large-scale, benchmark Systems Biology model of NF-κB, uncovering
previously unknown unidentifiabilities.
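The local, FIM-based analysis that the paper generalizes can be sketched on a toy model. This is a hypothetical example, not from the paper: an exponential decay \(y(t) = e^{-(k_1 + k_2)t}\) in which only the sum \(k_1 + k_2\) is identifiable, so the Fisher information matrix is singular along the direction \((1, -1)\), and the structural unidentifiability shows up directly in its eigenvalue spectrum.

```python
import numpy as np

# Toy model y(t) = exp(-(k1 + k2) * t) observed with unit Gaussian noise.
t = np.linspace(0.1, 5, 40)
k1, k2 = 0.4, 0.6
dy_dk1 = -t * np.exp(-(k1 + k2) * t)    # sensitivity wrt k1
dy_dk2 = -t * np.exp(-(k1 + k2) * t)    # identical sensitivity wrt k2

J = np.stack([dy_dk1, dy_dk2], axis=1)  # sensitivity (Jacobian) matrix
fim = J.T @ J                            # Fisher information, unit noise
eigvals = np.linalg.eigvalsh(fim)        # ascending eigenvalues
```

The zero eigenvalue flags the unidentifiable combination; the abstract's multiscale sloppiness extends this diagnosis beyond the infinitesimal-noise regime where the FIM argument is valid.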
Quantum speed limits on operator flows and correlation functions
Quantum speed limits (QSLs) identify fundamental time scales of physical
processes by providing lower bounds on the rate of change of a quantum state or
the expectation value of an observable. We introduce a generalization of QSL
for unitary operator flows, which are ubiquitous in physics and relevant for
applications in both the quantum and classical domains. We derive two types of
QSLs and assess the existence of a crossover between them, that we illustrate
with a qubit and a random matrix Hamiltonian, as canonical examples. We further
apply our results to the time evolution of autocorrelation functions, obtaining
computable constraints on the linear dynamical response of quantum systems out
of equilibrium and the quantum Fisher information governing the precision in
quantum parameter estimation.
Comment: 14 pages, 4 figures
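The quantum Fisher information mentioned at the end has a simple closed form for pure states under unitary encoding: for \(e^{-i\theta H}\lvert\psi\rangle\), \(F_Q = 4(\langle H^2\rangle - \langle H\rangle^2)\), and \(1/F_Q\) lower-bounds the variance of any estimate of \(\theta\). A minimal qubit sketch, with Hamiltonian and state chosen for illustration (not from the paper):

```python
import numpy as np

# Qubit Hamiltonian sigma_z / 2 and the equal superposition state |+>.
H = np.array([[1.0, 0.0], [0.0, -1.0]]) / 2
psi = np.array([1.0, 1.0]) / np.sqrt(2)

# Quantum Fisher information of a pure state under exp(-i*theta*H):
# F_Q = 4 * Var(H), maximal for |+> with this Hamiltonian.
mean_H = psi.conj() @ H @ psi
mean_H2 = psi.conj() @ (H @ H) @ psi
qfi = 4 * (mean_H2 - mean_H**2)
```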
Fisher information matrix for single molecules with stochastic trajectories
Tracking of objects in cellular environments has become a vital tool in
molecular cell biology. A particularly important example is single molecule
tracking, which enables the study of the motion of individual molecules in
cellular environments and provides quantitative information on their behavior
that was not previously available through bulk studies. Here, we consider a
dynamical system where the motion of
an object is modeled by stochastic differential equations (SDEs), and
measurements are the detected photons emitted by the moving fluorescently
labeled object, which occur at discrete time points, corresponding to the
arrival times of a Poisson process, in contrast to uniform time points which
have been commonly used in similar dynamical systems. The measurements are
distributed according to optical diffraction theory and are therefore modeled
by different distributions, e.g., a Born and Wolf profile for an
out-of-focus molecule. For some special circumstances, Gaussian image models
have been proposed. In this paper, we introduce a stochastic framework in which
we calculate the maximum likelihood estimates of the biophysical parameters of
the molecular interactions, e.g., diffusion and drift coefficients. More
importantly, we develop a general framework to calculate the Cram\'er-Rao lower
bound (CRLB), given by the inverse of the Fisher information matrix, for the
estimation of unknown parameters and use it as a benchmark in the evaluation of
the standard deviation of the estimates. There exists no established method,
even for Gaussian measurements, to systematically calculate the CRLB for the
general motion model that we consider in this paper. We apply the developed
methodology to simulated data of a molecule with linear trajectories and show
that the standard deviation of the estimates matches well with the square root
of the CRLB.
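The CRLB-versus-estimate comparison at the end can be sketched in a far simpler setting than the paper's microscopy model (an assumption for brevity): pure diffusion observed at the arrival times of a Poisson process, where the Fisher information for the diffusion coefficient \(D\) from \(N\) increments is \(N/(2D^2)\), giving a CRLB of \(2D^2/N\).

```python
import numpy as np

# Pure diffusion: increments dx ~ N(0, 2*D*dt) at Poisson arrival times.
rng = np.random.default_rng(3)
D_true, N, rate = 1.0, 400, 10.0
crlb = 2 * D_true**2 / N                 # Cramer-Rao bound on Var(D_hat)

estimates = []
for _ in range(1000):
    dt = rng.exponential(1 / rate, size=N)           # Poisson inter-arrival times
    dx = rng.normal(scale=np.sqrt(2 * D_true * dt))  # diffusion increments
    estimates.append(np.mean(dx**2 / (2 * dt)))      # MLE of D
std_emp = np.std(estimates)              # should match sqrt(crlb)
```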
Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Natural gradient descent, which preconditions a gradient descent update with
the Fisher information matrix of the underlying statistical model, is a way to
capture partial second-order information. Several highly visible works have
advocated an approximation known as the empirical Fisher, drawing connections
between approximate second-order methods and heuristics like Adam. We dispute
this argument by showing that the empirical Fisher---unlike the Fisher---does
not generally capture second-order information. We further argue that the
conditions under which the empirical Fisher approaches the Fisher (and the
Hessian) are unlikely to be met in practice, and that, even on simple
optimization problems, the pathologies of the empirical Fisher can have
undesirable effects.
Comment: V3: Minor corrections (typographic errors)
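The gap between the empirical Fisher and the Fisher can be seen in a one-parameter least-squares toy problem (a hypothetical example, not from the paper): with unit noise, the true Fisher equals the constant Hessian \(\sum_i x_i^2\), while the empirical Fisher \(\sum_i (x_i r_i)^2\) depends on the residuals \(r_i\) and blows up far from the optimum, so it carries no reliable second-order information there.

```python
import numpy as np

# Toy regression y = theta * x + noise with unit noise variance.
rng = np.random.default_rng(4)
x = rng.normal(size=100)
theta_true = 2.0
y = theta_true * x + rng.normal(size=100)

fisher = np.sum(x**2)                    # true Fisher = Hessian, theta-independent

def empirical_fisher(theta):
    r = y - theta * x                    # residuals at theta
    return np.sum((x * r)**2)            # sum of squared per-example gradients

ef_far = empirical_fisher(theta=-10.0)   # inflated far from the optimum
ef_near = empirical_fisher(theta=2.0)    # modest near the optimum
```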