A Metric Space for Point Process Excitations
A multivariate Hawkes process enables self- and cross-excitations through a
triggering matrix that behaves like an asymmetrical covariance structure,
characterizing pairwise interactions between the event types. Full-rank
estimation of all interactions is often infeasible in empirical settings.
Models specialized to spatiotemporal applications alleviate this obstacle
by exploiting spatial locality, allowing the dyadic relationships between
events to depend only on separation in time and relative distances in real
Euclidean space. Here we generalize this framework to any multivariate Hawkes
process, and harness it as a vessel for embedding arbitrary event types in a
hidden metric space. Specifically, we propose a Hidden Hawkes Geometry (HHG)
model to uncover the hidden geometry between event excitations in a
multivariate point process. The low dimensionality of the embedding regularizes
the structure of the inferred interactions. We develop a number of estimators
and validate the model by conducting several experiments. In particular, we
investigate regional infectivity dynamics of COVID-19 in an early South Korean
record and recent Los Angeles confirmed cases. By additionally performing
synthetic experiments on short records as well as explorations into options
markets and the Ebola epidemic, we demonstrate that learning the embedding
alongside a point process uncovers salient interactions in a broad range of
applications.
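As a rough illustration of the idea (not the paper's actual HHG estimator), the sketch below induces a Hawkes triggering matrix from pairwise distances in a latent embedding. The Gaussian kernel, exponential time decay, and all parameter values are assumptions for the example; note the kernel here is symmetric, whereas the triggering matrix in general can be asymmetric.

```python
import numpy as np

def triggering_matrix(Z, alpha=1.0, sigma=1.0):
    """Embedding-induced excitation: K[u, v] = alpha * exp(-||z_u - z_v||^2 / (2 sigma^2))."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return alpha * np.exp(-d2 / (2.0 * sigma ** 2))

def intensity(t, u, events, Z, mu, beta=1.0):
    """Conditional intensity of event type u at time t, given past (t_i, u_i) pairs."""
    K = triggering_matrix(Z)
    lam = mu[u]                                      # baseline rate of type u
    for t_i, u_i in events:
        if t_i < t:
            lam += K[u, u_i] * beta * np.exp(-beta * (t - t_i))  # exponential decay
    return lam

rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 2))       # 3 event types embedded in a 2-D latent space
mu = np.array([0.1, 0.2, 0.3])    # baseline rates per type
events = [(0.5, 0), (1.0, 2)]     # observed (time, type) pairs
lam = intensity(1.5, 1, events, Z, mu)
```

A low-dimensional `Z` constrains all pairwise interactions through a handful of coordinates, which is the regularization effect the abstract describes.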
Learning Temporal Point Processes via Reinforcement Learning
Social goods, such as healthcare, smart cities, and information networks, often
produce ordered event data in continuous time. The generative processes of
these event data can be very complex, requiring flexible models to capture
their dynamics. Temporal point processes offer an elegant framework for
modeling event data without discretizing the time. However, the existing
maximum-likelihood-estimation (MLE) learning paradigm requires hand-crafting
the intensity function beforehand and cannot directly monitor the
goodness-of-fit of the estimated model in the process of training. To alleviate
the risk of model-misspecification in MLE, we propose to generate samples from
the generative model and monitor the quality of the samples in the process of
training until the samples and the real data are indistinguishable. We take
inspiration from reinforcement learning (RL) and treat the generation of each
event as the action taken by a stochastic policy. We parameterize the policy as
a flexible recurrent neural network and gradually improve the policy to mimic
the observed event distribution. Since the reward function is unknown in this
setting, we uncover an analytic and nonparametric form of the reward function
using an inverse reinforcement learning formulation. This new RL framework
allows us to derive an efficient policy gradient algorithm for learning
flexible point process models, and we show that it performs well in both
synthetic and real data.
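To make the "generation of each event as an action" framing concrete, here is a toy rollout of such a policy: a recurrent state emits a rate, and each inter-event time is a stochastic action drawn from the corresponding exponential. The scalar recurrence and exponential link are assumptions for illustration, not the paper's network.

```python
import numpy as np

def sample_sequence(W, b, T, rng):
    """Roll out a stochastic policy: each inter-event time is an 'action'
    sampled from an exponential whose rate depends on a recurrent state h."""
    h, t, times = 0.0, 0.0, []
    while True:
        rate = np.exp(W * h + b)           # positive rate via exponential link
        dt = rng.exponential(1.0 / rate)   # stochastic action: next waiting time
        t += dt
        if t > T:
            break
        times.append(t)
        h = np.tanh(h + dt)                # toy recurrent state update
    return times

seq = sample_sequence(W=0.5, b=0.0, T=10.0, rng=np.random.default_rng(1))
```

Training would then adjust `W, b` by policy gradient so that sequences like `seq` become indistinguishable from the observed data, with the reward recovered via inverse RL as the abstract states.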
Steering Social Activity: A Stochastic Optimal Control Point Of View
User engagement in online social networking depends critically on the level
of social activity in the corresponding platform--the number of online actions,
such as posts, shares or replies, taken by its users. Can we design
data-driven algorithms to increase social activity? At a user level, such
algorithms may increase activity by helping users decide when to take an action
to be more likely to be noticed by their peers. At a network level, they may
increase activity by incentivizing a few influential users to take more
actions, which in turn will trigger additional actions by other users. In this
paper, we model social activity using the framework of marked temporal point
processes, derive an alternate representation of these processes using
stochastic differential equations (SDEs) with jumps and, exploiting this
alternate representation, develop two efficient online algorithms with provable
guarantees to steer social activity both at a user and at a network level. In
doing so, we establish a previously unexplored connection between optimal
control of jump SDEs and doubly stochastic marked temporal point processes,
which is of independent interest. Finally, we experiment both with synthetic
and real data gathered from Twitter and show that our algorithms consistently
steer social activity more effectively than the state of the art.
To appear in JMLR 2018.
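The jump-SDE representation mentioned above can be illustrated with a toy unmarked example: a self-exciting intensity obeying d&lambda; = &beta;(&mu; &minus; &lambda;) dt + &alpha; dN(t), simulated by thinning. The dynamics and parameters are assumptions for the sketch, not the controlled processes from the paper.

```python
import numpy as np

def simulate_jump_sde(mu, alpha, beta, T, rng):
    """Simulate lambda(t) with d lambda = beta*(mu - lambda) dt + alpha dN(t)
    via Ogata-style thinning; lambda only decays between jumps, so its
    current value is a valid upper bound for the next candidate."""
    lam, t, events = mu, 0.0, []
    while t < T:
        lam_bar = lam
        dt = rng.exponential(1.0 / lam_bar)          # candidate waiting time
        t += dt
        lam = mu + (lam - mu) * np.exp(-beta * dt)   # mean-reverting decay
        if t < T and rng.uniform() < lam / lam_bar:  # thinning acceptance
            events.append(t)
            lam += alpha                             # the alpha * dN jump
    return events

events = simulate_jump_sde(mu=0.5, alpha=0.8, beta=1.0, T=20.0,
                           rng=np.random.default_rng(7))
```

A steering algorithm in this view acts on the drift or jump terms of such an SDE rather than on the event sequence directly.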
Neural Jump Stochastic Differential Equations
Many time series are effectively generated by a combination of deterministic
continuous flows along with discrete jumps sparked by stochastic events.
However, we usually do not have the equation of motion describing the flows, or
how they are affected by jumps. To this end, we introduce Neural Jump
Stochastic Differential Equations that provide a data-driven approach to learn
continuous and discrete dynamic behavior, i.e., hybrid systems that both flow
and jump. Our approach extends the framework of Neural Ordinary Differential
Equations with a stochastic process term that models discrete events. We then
model temporal point processes with a piecewise-continuous latent trajectory,
where the discontinuities are caused by stochastic events whose conditional
intensity depends on the latent state. We demonstrate the predictive
capabilities of our model on a range of synthetic and real-world marked point
process datasets, including classical point processes (such as Hawkes
processes), awards on Stack Overflow, medical records, and earthquake
monitoring.
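A minimal numerical sketch of such a hybrid system (with hand-picked linear dynamics standing in for the learned neural drift and jump functions): the latent state flows continuously between events and jumps at each event time.

```python
import numpy as np

def latent_trajectory(z0, event_times, T, dt=0.01, a=-0.5, w=1.0):
    """Toy hybrid system: continuous flow dz/dt = a*z between events and a
    discrete jump z -> z + w at each event time (forward-Euler integration)."""
    ts = np.arange(0.0, T, dt)
    zs = np.empty_like(ts)
    z, k = float(z0), 0
    events = sorted(event_times)
    for i, t in enumerate(ts):
        while k < len(events) and events[k] <= t:
            z += w                 # discontinuity sparked by a stochastic event
            k += 1
        zs[i] = z
        z += a * z * dt            # one Euler step of the continuous flow
    return ts, zs

ts, zs = latent_trajectory(z0=1.0, event_times=[0.5], T=1.0)
```

In the paper's setting both the flow and the jump are neural networks, and the conditional intensity of future events is itself a function of this piecewise-continuous latent state.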
A mathematical motivation for complex-valued convolutional networks
A complex-valued convolutional network (convnet) implements the repeated
application of the following composition of three operations, recursively
applying the composition to an input vector of nonnegative real numbers: (1)
convolution with complex-valued vectors followed by (2) taking the absolute
value of every entry of the resulting vectors followed by (3) local averaging.
For processing real-valued random vectors, complex-valued convnets can be
viewed as "data-driven multiscale windowed power spectra," "data-driven
multiscale windowed absolute spectra," "data-driven multiwavelet absolute
values," or (in their most general configuration) "data-driven nonlinear
multiwavelet packets." Indeed, complex-valued convnets can calculate multiscale
windowed spectra when the convnet filters are windowed complex-valued
exponentials. Standard real-valued convnets, using rectified linear units
(ReLUs), sigmoidal (for example, logistic or tanh) nonlinearities, max
pooling, etc., do not obviously exhibit the same exact correspondence with
data-driven wavelets (whereas for complex-valued convnets, the correspondence
is much more than just a vague analogy). Courtesy of the exact correspondence,
the remarkably rich and rigorous body of mathematical analysis for wavelets
applies directly to (complex-valued) convnets.
Retitled version submitted to the journal Neural Computation.
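The three-operation composition is simple enough to write out directly. The sketch below implements one such layer and applies it with a windowed complex exponential filter, so the output behaves like a windowed-spectrum feature at that frequency; the Hann window, filter length, and pooling width are arbitrary choices for the example.

```python
import numpy as np

def complex_convnet_layer(x, filt, pool=2):
    """One layer of the composition: (1) convolution with a complex-valued
    filter, (2) entrywise absolute value, (3) local averaging."""
    y = np.convolve(x, filt, mode="valid")   # (1) complex convolution
    y = np.abs(y)                            # (2) modulus -> nonnegative reals
    n = len(y) // pool
    return y[: n * pool].reshape(n, pool).mean(axis=1)  # (3) local averaging

# A windowed complex exponential makes the layer compute a windowed
# spectrum-like feature at that frequency.
t = np.arange(64)
x = np.cos(2 * np.pi * t / 8.0)                          # input with period 8
filt = np.hanning(8) * np.exp(2j * np.pi * np.arange(8) / 8.0)
out = complex_convnet_layer(x, filt)
```

Stacking such layers, with learned rather than fixed filters, yields the "data-driven nonlinear multiwavelet packets" of the abstract.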
Correlated Random Measures
We develop correlated random measures, random measures where the atom weights
can exhibit a flexible pattern of dependence, and use them to develop powerful
hierarchical Bayesian nonparametric models. Hierarchical Bayesian nonparametric
models are usually built from completely random measures, a Poisson-process
based construction in which the atom weights are independent. Completely random
measures imply strong independence assumptions in the corresponding
hierarchical model, and these assumptions are often misplaced in real-world
settings. Correlated random measures address this limitation. They model
correlation within the measure by using a Gaussian process in concert with the
Poisson process. With correlated random measures, for example, we can develop a
latent feature model for which we can infer both the properties of the latent
features and their dependency pattern. We develop several other examples as
well. We study a correlated random measure model of pairwise count data. We
derive an efficient variational inference algorithm and show improved
predictive performance on large data sets of documents, web clicks, and
electronic health records.
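One assumed-for-illustration instance of the construction: atoms placed by a Poisson process, with log-weights drawn from a Gaussian process over the atom locations, so nearby atoms receive correlated weights instead of the independent weights of a completely random measure.

```python
import numpy as np

rng = np.random.default_rng(2)

n = rng.poisson(20)                         # number of atoms (Poisson process)
locs = np.sort(rng.uniform(0.0, 1.0, n))    # atom locations on [0, 1]

def rbf(a, b, ell=0.1):
    """Squared-exponential GP covariance between location vectors."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell ** 2))

K = rbf(locs, locs) + 1e-6 * np.eye(n)      # GP covariance over atom locations
f = rng.multivariate_normal(np.zeros(n), K) # correlated latent function values
weights = np.exp(f)                         # positive, correlated atom weights
```

Replacing `weights` with independent draws recovers the completely-random-measure behavior the paper is relaxing.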
An Analytically Tractable Bayesian Approximation to Optimal Point Process Filtering
The process of dynamic state estimation (filtering) based on point process
observations is in general intractable. Numerical sampling techniques are often
practically useful, but lead to limited conceptual insight about optimal
encoding/decoding strategies, which are of significant relevance to
Computational Neuroscience. We develop an analytically tractable Bayesian
approximation to optimal filtering based on point process observations, which
allows us to introduce distributional assumptions about sensory cell
properties and greatly facilitates the analysis of optimal encoding in
situations deviating from common assumptions of uniform coding. The analytic
framework leads to insights which are difficult to obtain from numerical
algorithms, and is consistent with experiments about the distribution of tuning
curve centers. Interestingly, we find that the information gained from the
absence of spikes may be crucial to performance.
Overdispersed Black-Box Variational Inference
We introduce overdispersed black-box variational inference, a method to
reduce the variance of the Monte Carlo estimator of the gradient in black-box
variational inference. Instead of taking samples from the variational
distribution, we use importance sampling to take samples from an overdispersed
distribution in the same exponential family as the variational approximation.
Our approach is general since it can be readily applied to any exponential
family distribution, which is the typical choice for the variational
approximation. We run experiments on two non-conjugate probabilistic models to
show that our method effectively reduces the variance, and the overhead
introduced by the computation of the proposal parameters and the importance
weights is negligible. We find that our overdispersed importance sampling
scheme provides lower variance than black-box variational inference, even when
the latter uses twice the number of samples. This results in faster convergence
of the black-box inference procedure.
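The mechanics can be sketched on a toy one-dimensional problem: draw samples from a same-family Gaussian proposal with inflated variance, reweight them by importance weights, and form the score-function gradient estimator. The target model, dispersion coefficient, and sample size here are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

mu, sigma = 0.0, 1.0   # variational parameters of q = N(mu, sigma^2)
tau = 2.0              # dispersion coefficient of the overdispersed proposal

def log_q(z):
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def log_p(z):
    return -0.5 * ((z - 1.0) / 0.5) ** 2   # unnormalized toy model

def score_mu(z):
    return (z - mu) / sigma ** 2           # d/d mu of log q(z)

s = np.sqrt(tau) * sigma                   # proposal std: same family, wider
z = rng.normal(mu, s, size=5000)           # samples from the proposal
log_r = log_q(z) - (-0.5 * ((z - mu) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi))
w = np.exp(log_r)                          # importance weights q / proposal
grad = np.mean(w * score_mu(z) * (log_p(z) - log_q(z)))
```

Because the proposal has heavier tails than `q`, the weights stay bounded and the reweighted estimator typically has lower variance than sampling from `q` directly, which is the effect the abstract reports.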
Efficient Inference in Multi-task Cox Process Models
We generalize the log Gaussian Cox process (LGCP) framework to model multiple
correlated point data jointly. The observations are treated as realizations of
multiple LGCPs, whose log intensities are given by linear combinations of
latent functions drawn from Gaussian process priors. The combination
coefficients are also drawn from Gaussian processes and can incorporate
additional dependencies. We derive closed-form expressions for the moments of
the intensity functions and develop an efficient variational inference
algorithm that is orders of magnitude faster than competing deterministic and
stochastic approximations of multivariate LGCP, coregionalization models, and
multi-task permanental processes. Our approach outperforms these benchmarks in
multiple problems, offering the current state of the art in modeling
multivariate point processes.
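The generative core of the model can be sketched as follows: latent functions drawn from GP priors on a grid, mixed linearly into per-task log-intensities. Fixed mixing coefficients are an assumption here; the paper also places GP priors on them.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 50)[:, None]     # input grid

def rbf(a, b, ell=0.2):
    return np.exp(-((a - b.T) ** 2) / (2 * ell ** 2))

K = rbf(x, x) + 1e-6 * np.eye(50)          # GP covariance with jitter
L = np.linalg.cholesky(K)
F = L @ rng.normal(size=(50, 2))           # two latent GP functions on the grid
W = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])                 # mixing coefficients for 3 tasks
log_lambda = F @ W.T                       # (50, 3): one log-intensity per task
lam = np.exp(log_lambda)                   # correlated intensity functions
```

The middle task's intensity shares latent structure with both of the others, which is how the model borrows statistical strength across correlated point datasets.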
On MCMC for variationally sparse Gaussian processes: A pseudo-marginal approach
Gaussian processes (GPs) are frequently used in machine learning and
statistics to construct powerful models. However, when employing GPs in
practice, important considerations must be made, regarding the high
computational burden, approximation of the posterior, choice of the covariance
function and inference of its hyperparameters. To address these issues, Hensman
et al. (2015) combine variationally sparse GPs with Markov chain Monte Carlo
(MCMC) to derive a scalable, flexible and general framework for GP models.
Nevertheless, the resulting approach requires intractable likelihood
evaluations for many observation models. To bypass this problem, we propose a
pseudo-marginal (PM) scheme that offers asymptotically exact inference as well
as computational gains through doubly stochastic estimators for the intractable
likelihood and large datasets. In complex models, the advantages of the PM
scheme are particularly evident, and we demonstrate this on a two-level GP
regression model with a nonparametric covariance function to capture
non-stationarity.
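The pseudo-marginal idea itself fits in a few lines: run Metropolis-Hastings, but replace the intractable likelihood with an unbiased noisy estimate and carry the estimate along with the state. The toy model below (a Gaussian with one latent variable, estimated by importance sampling) is an assumption for illustration, not the GP models from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def lik_estimate(theta, n_particles=30):
    """Unbiased estimate of p(y | theta) = E_z[ N(y; theta + z, 1) ], z ~ N(0, 1)."""
    y = 1.5
    z = rng.normal(size=n_particles)
    return np.mean(np.exp(-0.5 * (y - theta - z) ** 2) / np.sqrt(2 * np.pi))

def log_prior(theta):
    return -0.5 * theta ** 2               # standard normal prior (up to a constant)

theta, lhat = 0.0, lik_estimate(0.0)
samples = []
for _ in range(2000):
    prop = theta + rng.normal(0.0, 0.5)    # random-walk proposal
    lhat_prop = lik_estimate(prop)
    log_acc = (np.log(lhat_prop) - np.log(lhat)
               + log_prior(prop) - log_prior(theta))
    if np.log(rng.uniform()) < log_acc:
        theta, lhat = prop, lhat_prop      # keep the estimate with the state
    samples.append(theta)
```

Crucially, `lhat` is reused until the next acceptance; recomputing it each iteration would break the exactness guarantee, which is the property that lets the PM scheme use doubly stochastic estimators while still targeting the true posterior.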