Time Dynamic Topic Models
Information extraction from large corpora can be a useful tool for many applications in industry and academia. For instance, political communication science has only recently begun to exploit the opportunities offered by the massive amounts of information available through the Internet and the computational tools that natural language processing provides. We give a linguistically motivated interpretation of topic modeling, a state-of-the-art algorithm for extracting latent semantic sets of words from large text corpora, and extend this interpretation to cover issues and issue-cycles as theoretical constructs from political communication science. We build on a dynamic topic model, in which the semantic sets of words are allowed to evolve over time governed by a Brownian motion stochastic process, and apply a new form of analysis to its results. This analysis is based on the notion of volatility, the rate of change familiar from econometrics for stocks and derivatives. We claim that the rate of change of sets of semantically related words can be interpreted as issue-cycles, with the word sets describing the underlying issues. Generalizing over existing work, we introduce dynamic topic models driven by general Gaussian processes (Brownian motion being a special case of our model), a family of stochastic processes defined by the function that determines their covariance structure. Under this assumption we apply a certain class of covariance functions that allows an appropriate rate of change in word sets while preserving the semantic relatedness among words. Applying our findings to a large newspaper data set, the New York Times Annotated Corpus (all articles between 1987 and 2007), we are able to identify sub-topics in time, "time-localized topics", and find patterns in their behavior over time. However, we have to drop the assumption of semantic relatedness over all available time for any one topic. Time-localized topics are consistent in themselves but do not necessarily share semantic meaning with one another. They can, however, be interpreted as capturing the notion of issues, and their behavior that of issue-cycles.
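The covariance-function view described above can be sketched in a few lines. The fragment below is illustrative only (function and variable names are my own, not from the paper): it contrasts the Brownian-motion covariance, k(s, t) = min(s, t), with a smoother stationary alternative whose length scale tunes the rate of change, and draws one latent trajectory from each.

```python
import numpy as np

def brownian_cov(s, t):
    """Covariance of standard Brownian motion: k(s, t) = min(s, t)."""
    return np.minimum(s, t)

def squared_exp_cov(s, t, length_scale=2.0):
    """A smooth, stationary alternative with a tunable rate of change."""
    return np.exp(-0.5 * ((s - t) / length_scale) ** 2)

times = np.arange(1.0, 11.0)              # e.g. ten yearly time slices
S, T = np.meshgrid(times, times, indexing="ij")

K_bm = brownian_cov(S, T)                 # 10 x 10 covariance matrices
K_se = squared_exp_cov(S, T)

# One sampled trajectory per process: a latent word weight evolving over time.
rng = np.random.default_rng(0)
path_bm = rng.multivariate_normal(np.zeros(10), K_bm)
path_se = rng.multivariate_normal(np.zeros(10), K_se)
```

Choosing the covariance function is the modeling decision the abstract refers to: the Brownian kernel accumulates variance without bound, while the squared-exponential kernel keeps nearby time slices strongly correlated.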
Some contributions to filtering theory with applications in financial modelling
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Two main groups of filtering algorithms are characterised and developed, and their applicability is demonstrated using actuarial and financial time series data. The first group of algorithms involves hidden Markov models (HMM), where the parameters of an asset price model switch between regimes in accordance with the dynamics of a Markov chain. We start with the known HMM filtering set-up and extend the framework to the case where the drift and volatility have independent probabilistic behaviour. In addition, a non-normal noise term is considered, and recursive formulae for the online re-estimation of model parameters are derived for the case of Student's t-distributed noise. Change of reference probability is employed in the construction of the filters. Both extensions are then tested on financial and actuarial data. The second group of filtering algorithms deals with sigma point filtering techniques. We propose a method to generate sigma points from symmetric multivariate distributions. The algorithm matches the first three moments exactly and the fourth moment approximately; a semidefinite programming approach minimises the worst-case mismatch. The sigma point generation procedure is in turn applied to construct algorithms for the latent state estimation of nonlinear time series models; a numerical demonstration of the procedure's effectiveness is given. Finally, we propose a partially linearised sigma point filter, an alternative technique for the optimal state estimation of a wide class of nonlinear time series models. In particular, sigma points are employed to generate samples of possible state values, and a linear programming-based procedure is then utilised in the update step of the state simulation. The performance of the filtering technique is assessed on a simulated, highly nonlinear multivariate interest rate process and is shown to perform significantly better than the extended Kalman filter in terms of computational time.
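As background for sigma-point generation in general — not the thesis's semidefinite-programming construction, which additionally matches third and fourth moments — the standard unscented sigma-point set below matches the mean and covariance of a given distribution exactly (all names and parameter values here are my own, for illustration):

```python
import numpy as np

def unscented_sigma_points(mean, cov, kappa=1.0):
    """Standard 2n+1 unscented sigma points matching mean and covariance."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)   # square root of scaled covariance
    points = [mean]
    for i in range(n):
        points.append(mean + L[:, i])
        points.append(mean - L[:, i])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return np.array(points), weights

mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
pts, w = unscented_sigma_points(mean, cov)

# The weighted empirical moments of the sigma points recover the inputs exactly.
emp_mean = w @ pts
emp_cov = (pts - emp_mean).T @ np.diag(w) @ (pts - emp_mean)
```

Because the points are placed symmetrically about the mean, odd moments vanish by construction; controlling the fourth moment, as the thesis does, requires the more elaborate optimisation it describes.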
Nonlinear stochastic modeling with Langevin regression
Many physical systems characterized by nonlinear multiscale interactions can be effectively modeled by treating unresolved degrees of freedom as random fluctuations. However, even when the microscopic governing equations and qualitative macroscopic behavior are known, it is often difficult to derive a stochastic model that is consistent with observations. This is especially true for systems such as turbulence, where the perturbations do not behave like Gaussian white noise, introducing non-Markovian behavior to the dynamics. We address these challenges with a framework for identifying interpretable stochastic nonlinear dynamics from experimental data, using both forward and adjoint Fokker-Planck equations to enforce statistical consistency. If the form of the Langevin equation is unknown, a simple sparsifying procedure can provide an appropriate functional form. We demonstrate that this method can effectively learn stochastic models in two artificial examples: recovering a nonlinear Langevin equation forced by colored noise and approximating the second-order dynamics of a particle in a double-well potential with the corresponding first-order bifurcation normal form. Finally, we apply the proposed method to experimental measurements of a turbulent bluff body wake and show that the statistical behavior of the center of pressure can be described by the dynamics of the corresponding laminar flow driven by nonlinear state-dependent noise.
Comment: 30 pages, 13 figures
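Setting the paper's adjoint-Fokker-Planck machinery aside, the basic idea of identifying a drift function from time series data can be sketched with binned conditional moments (Kramers-Moyal estimates) on a simulated Ornstein-Uhlenbeck process. This is a minimal illustration with made-up parameters, not the authors' method:

```python
import numpy as np

# Simulate an Ornstein-Uhlenbeck process dx = -a*x dt + s dW, then recover the
# drift from the data alone via binned conditional moments.
rng = np.random.default_rng(1)
a, s, dt, n = 1.0, 0.5, 1e-3, 500_000
x = np.empty(n)
x[0] = 0.0
noise = rng.normal(0.0, np.sqrt(dt), n - 1)
for i in range(n - 1):
    x[i + 1] = x[i] - a * x[i] * dt + s * noise[i]

dx = np.diff(x)
# E[dx | x] ~ f(x) dt, so binned means of dx/dt estimate the drift f(x).
edges = np.linspace(-1.0, 1.0, 21)
idx = np.digitize(x[:-1], edges)
centers, drift = [], []
for b in range(1, len(edges)):
    mask = idx == b
    if mask.sum() > 1000:                  # skip sparsely populated bins
        centers.append(0.5 * (edges[b - 1] + edges[b]))
        drift.append(dx[mask].mean() / dt)
# A linear fit recovers the drift coefficient; the slope should be near -a.
slope = np.polyfit(centers, drift, 1)[0]
```

For linear drift and white noise this direct estimate suffices; the paper's contribution addresses precisely the cases where it breaks down, such as colored forcing and non-Markovian statistics.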
Unraveling the Thousand Word Picture: An Introduction to Super-Resolution Data Analysis
Super-resolution microscopy provides direct insight into fundamental biological processes occurring at length scales smaller than light's diffraction limit. The analysis of data at such scales has brought statistical and machine learning methods into the mainstream. Here we provide a survey of data analysis methods, starting from an overview of basic statistical techniques underlying the analysis of super-resolution and, more broadly, imaging data. We subsequently break down the analysis of super-resolution data into four problems: the localization problem, the counting problem, the linking problem, and what we've termed the interpretation problem.
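As a toy instance of the localization problem mentioned above, the sketch below simulates a diffraction-limited spot and estimates its sub-pixel position with a photon-weighted centroid; practical pipelines typically use maximum-likelihood Gaussian fits instead, and all values here are invented for illustration:

```python
import numpy as np

# A 2-D Gaussian PSF spot on a 16x16 pixel grid with Poisson shot noise;
# no background photons are simulated, so the plain centroid is unbiased.
rng = np.random.default_rng(7)
true_x, true_y = 7.3, 8.6          # sub-pixel emitter position
sigma, photons = 1.5, 5000.0       # PSF width (pixels) and expected photon count
yy, xx = np.mgrid[0:16, 0:16].astype(float)
psf = np.exp(-((xx - true_x) ** 2 + (yy - true_y) ** 2) / (2.0 * sigma ** 2))
image = rng.poisson(photons * psf / psf.sum())

# Photon-weighted centroid: localization well below the pixel size.
est_x = (image * xx).sum() / image.sum()
est_y = (image * yy).sum() / image.sum()
```

With thousands of detected photons the centroid lands within a small fraction of a pixel of the true position, which is the sense in which localization beats the diffraction limit; background light and pixelation are what make the real problem statistically interesting.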
Inferring Latent States and Refining Force Estimates via Hierarchical Dirichlet Process Modeling in Single Particle Tracking Experiments
Optical microscopy provides rich spatio-temporal information characterizing in vivo molecular motion. However, effective forces and other parameters used to summarize molecular motion change over time in live cells due to latent state changes, e.g., changes induced by dynamic micro-environments, photobleaching, and other heterogeneity inherent in biological processes. This study focuses on techniques for analyzing Single Particle Tracking (SPT) data experiencing abrupt state changes. We demonstrate the approach on GFP-tagged chromatids experiencing metaphase in yeast cells and probe the effective forces resulting from dynamic interactions that reflect the sum of a number of physical phenomena. State changes are induced by factors such as microtubule dynamics exerting force through the centromere, thermal polymer fluctuations, etc. Simulations are used to demonstrate the relevance of the approach in more general SPT data analyses. Refined force estimates are obtained by adopting and modifying a nonparametric Bayesian modeling technique, the Hierarchical Dirichlet Process Switching Linear Dynamical System (HDP-SLDS), for SPT applications. The HDP-SLDS method shows promise in systematically identifying dynamical regime changes induced by unobserved state changes when the number of underlying states is unknown in advance (a common problem in SPT applications). We expand on the relevance of the HDP-SLDS approach, review the relevant background of Hierarchical Dirichlet Processes, show how to map discrete-time HDP-SLDS models to classic SPT models, and discuss limitations of the approach. In addition, we demonstrate new computational techniques for tuning hyperparameters and for checking the statistical consistency of model assumptions directly against individual experimental trajectories; the techniques circumvent the need for "ground-truth" and subjective information.
Comment: 25 pages, 6 figures. Differs only typographically from the PLoS One publication, available freely as an open-access article at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.013763
EM algorithm for Markov chains observed via Gaussian noise and point process information: Theory and case studies
In this paper we study parameter estimation via the Expectation Maximization (EM) algorithm for a continuous-time hidden Markov model with diffusion and point process observation. Inference problems of this type arise, for instance, in credit risk modelling. A key step in the application of the EM algorithm is the derivation of finite-dimensional filters for the quantities that are needed in the E-Step of the algorithm. In this context we obtain exact, unnormalized and robust filters, and we discuss their numerical implementation. Moreover, we propose several goodness-of-fit tests for hidden Markov models with Gaussian noise and point process observation. We run an extensive simulation study to test the speed and accuracy of our methodology. The paper closes with an application to credit risk: we estimate the parameters of a hidden Markov model for credit quality where the observations consist of rating transitions and credit spreads for US corporations.
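The paper works in continuous time with diffusion and point-process observations; as a much simpler discrete-time analogue of EM for a hidden Markov model, the sketch below runs a standard Baum-Welch loop on a two-state Gaussian-emission toy series. All names and parameters are my own, and the emission variance is held fixed for brevity:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_hmm(obs, A, mu, var, pi, n_iter=50):
    """Discrete-time Baum-Welch for a Gaussian-emission HMM (variance fixed)."""
    K, T = len(pi), len(obs)
    for _ in range(n_iter):
        B = gaussian_pdf(obs[:, None], mu[None, :], var)   # (T, K) likelihoods
        # E-step: scaled forward-backward recursions.
        alpha = np.zeros((T, K)); beta = np.zeros((T, K)); c = np.zeros(T)
        alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                               # state posteriors
        xi = (alpha[:-1, :, None] * A[None] *
              (B[1:] * beta[1:])[:, None, :] / c[1:, None, None])
        # M-step: re-estimate transitions, initial law, and emission means.
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        mu = (gamma * obs[:, None]).sum(axis=0) / gamma.sum(axis=0)
        pi = gamma[0]
    return A, mu

# Toy regime-switching data: sticky two-state chain, emission means -2 and +2.
rng = np.random.default_rng(3)
states = np.zeros(2000, dtype=int)
for t in range(1, 2000):
    states[t] = states[t - 1] if rng.random() < 0.95 else 1 - states[t - 1]
obs = np.where(states == 0, -2.0, 2.0) + rng.normal(0.0, 1.0, 2000)

A_hat, mu_hat = em_hmm(obs, np.array([[0.8, 0.2], [0.2, 0.8]]),
                       np.array([-1.0, 1.0]), 1.0, np.array([0.5, 0.5]))
```

The forward-backward pass plays the role of the paper's E-step filters; the continuous-time setting replaces these sums with stochastic filtering equations, which is where the finite-dimensional, unnormalized, and robust filters come in.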