Time-Dependent Representation for Neural Event Sequence Prediction
Existing sequence prediction methods are mostly concerned with
time-independent sequences, in which the actual time span between events is
irrelevant and the distance between events is simply the difference between
their order positions in the sequence. While this time-independent view of
sequences is applicable to data such as natural language, e.g., words in a
sentence, it is inappropriate and inefficient for many real-world events that
are observed and collected at unequally spaced points in time as
they naturally arise, e.g., when a person goes to a grocery store or makes a
phone call. The time span between events can carry important information about
the sequence dependence of human behaviors. In this work, we propose a set of
methods for using time in sequence prediction. Because neural sequence models
such as RNNs are more amenable to token-like inputs, we propose two methods
for time-dependent event representation, based on intuitions about how time is
tokenized in everyday life and on previous work on embedding
contextualization. We also introduce two methods for using next event duration
as regularization for training a sequence prediction model. We discuss these
methods based on recurrent neural nets. We evaluate these methods as well as
baseline models on five datasets that resemble a variety of sequence prediction
tasks. The experiments revealed that the proposed methods offer accuracy gains
over baseline models in a range of settings.
Comment: 9 pages and 2 pages of references
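As a hedged illustration of the time-tokenization idea, the sketch below embeds the log-scaled gap between events as a soft one-hot over learned time buckets, projected to a dense vector; the bucket parameterization and log scaling are assumptions for illustration, not the paper's exact method.

```python
import torch
import torch.nn as nn

class SoftTimeTokenEmbedding(nn.Module):
    """Embed the gap between events as a soft one-hot over learned
    log-scale time buckets, then project to a dense vector that can be
    concatenated with the event embedding before the RNN."""
    def __init__(self, num_buckets=16, dim=32):
        super().__init__()
        # Bucket centers span roughly 1 second to 1 week in log-seconds (assumption).
        self.centers = nn.Parameter(torch.linspace(0.0, 13.0, num_buckets))
        self.widths = nn.Parameter(torch.ones(num_buckets))
        self.proj = nn.Linear(num_buckets, dim)

    def forward(self, dt):                       # dt: (batch, seq) gaps in seconds
        log_dt = torch.log1p(dt).unsqueeze(-1)   # (batch, seq, 1)
        logits = -self.widths.abs() * (log_dt - self.centers) ** 2
        soft_one_hot = torch.softmax(logits, dim=-1)
        return self.proj(soft_one_hot)           # (batch, seq, dim)
```

In use, the time embedding would simply be concatenated with the event embedding, e.g. `rnn_input = torch.cat([event_emb, time_emb], dim=-1)`.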
ODEVAE: Deep generative second order ODEs with Bayesian neural networks
We present Ordinary Differential Equation Variational Auto-Encoder
(ODEVAE), a latent second order ODE model for high-dimensional sequential
data. Leveraging the advances in deep generative models, ODEVAE can
simultaneously learn the embedding of high dimensional trajectories and infer
arbitrarily complex continuous-time latent dynamics. Our model explicitly
decomposes the latent space into momentum and position components and solves a
second order ODE system, which is in contrast to recurrent neural network (RNN)
based time series models and recently proposed black-box ODE techniques. In
order to account for uncertainty, we propose probabilistic latent ODE dynamics
parameterized by deep Bayesian neural networks. We demonstrate our approach on
motion capture, image rotation, and bouncing balls datasets. We achieve
state-of-the-art performance in long-term motion prediction and imputation
tasks.
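A minimal sketch of the position/momentum decomposition is given below, with plain Euler steps standing in for a real ODE solver and a deterministic MLP standing in for the paper's Bayesian neural network; both substitutions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SecondOrderLatentODE(nn.Module):
    """Latent state split into position s and momentum v with
    ds/dt = v and dv/dt = f(s, v). f is a deterministic MLP here,
    in place of the paper's Bayesian neural network."""
    def __init__(self, latent_dim=8, hidden=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, s0, v0, t_grid):             # s0, v0: (batch, latent_dim)
        s, v, path = s0, v0, [s0]
        for dt in (t_grid[1:] - t_grid[:-1]):      # crude fixed Euler steps
            a = self.f(torch.cat([s, v], dim=-1))  # acceleration dv/dt
            s, v = s + dt * v, v + dt * a
            path.append(s)
        return torch.stack(path, dim=1)            # (batch, time, latent_dim)
```

In the full model, the returned latent positions would be decoded back to high-dimensional observations by the VAE decoder.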
Time Masking: Leveraging Temporal Information in Spoken Dialogue Systems
In a spoken dialogue system, dialogue state tracker (DST) components track
the state of the conversation by updating a distribution of values associated
with each of the slots being tracked for the current user turn, using the
interactions until then. Much previous work has relied on modeling the
natural order of the conversation, using distance-based offsets as an
approximation of time. In this work, we hypothesize that leveraging the
wall-clock temporal difference between turns is crucial for finer-grained
control of dialogue scenarios. We develop a novel approach that applies a {\it
time mask}, based on the wall-clock time difference, to the associated slot
embeddings and empirically demonstrate that our proposed approach outperforms
existing approaches that leverage distance offsets, on both an internal
benchmark dataset and on DSTC2.
Comment: SIGDIAL 2019
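A minimal sketch of the time-mask idea, assuming a sigmoid gate over the log-scaled wall-clock gap (the exact gating network in the paper may differ):

```python
import torch
import torch.nn as nn

class TimeMask(nn.Module):
    """A gate in [0, 1], computed from the log-scaled wall-clock gap
    between turns, applied elementwise to a slot embedding."""
    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Linear(1, dim)

    def forward(self, slot_emb, dt_seconds):   # slot_emb: (batch, dim), dt: (batch,)
        log_dt = torch.log1p(dt_seconds).unsqueeze(-1)
        return slot_emb * torch.sigmoid(self.gate(log_dt))
```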
Hybrid Model with Time Modeling for Sequential Recommender Systems
Deep learning-based methods have been used successfully in recommender system
problems. Approaches using recurrent neural networks, transformers, and
attention mechanisms are useful to model users' long- and short-term
preferences in sequential interactions. To explore different session-based
recommendation solutions, Booking.com recently organized the WSDM WebTour 2021
Challenge, which aims to benchmark models to recommend the final city in a
trip. This study presents our approach to this challenge. We conducted several
experiments to test different state-of-the-art deep learning architectures for
recommender systems. Further, we proposed some changes to Neural Attentive
Recommendation Machine (NARM), adapted its architecture for the challenge
objective, and implemented training approaches that can be used in any
session-based model to improve accuracy. Our experimental results show that the
improved NARM outperforms all other state-of-the-art benchmark methods.
Comment: 5 pages, 2 figures, WSDM Workshop on Web Tourism 2021
Self-attention with Functional Time Representation Learning
Sequential modelling with self-attention has achieved cutting-edge
performance in natural language processing. With advantages in model
flexibility, computational complexity, and interpretability, self-attention is
gradually becoming a key component in event sequence models. However, like most
other sequence models, self-attention does not account for the time span
between events and thus captures sequential signals rather than temporal
patterns. Without relying on recurrent network structures, self-attention
recognizes event orderings via positional encoding. To bridge the gap between
modelling time-independent and time-dependent event sequences, we introduce a
functional feature map that embeds time spans into a high-dimensional space. By
constructing the associated translation-invariant time kernel function, we
reveal the functional forms of the feature map under classic functional
analysis results, namely Bochner's Theorem and Mercer's Theorem. We
propose several models to learn the functional time representation and the
interactions with event representation. These methods are evaluated on
real-world datasets under various continuous-time event sequence prediction
tasks. The experiments reveal that the proposed methods compare favorably to
baseline models while also capturing useful time-event interactions.
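The Bochner-style construction can be sketched as a Fourier feature map with learnable frequencies, so that the inner product of two time embeddings approximates a translation-invariant kernel of the time difference; learnable (rather than sampled) frequencies are an assumption consistent with, but not identical to, the paper's variants.

```python
import torch
import torch.nn as nn

class BochnerTimeEncoding(nn.Module):
    """Fourier feature map phi(t) = [cos(w_i t), sin(w_i t)] / sqrt(d/2),
    so phi(t1) . phi(t2) approximates a translation-invariant kernel
    k(t1 - t2), per Bochner's Theorem."""
    def __init__(self, dim=64):
        super().__init__()
        assert dim % 2 == 0
        self.freqs = nn.Parameter(torch.randn(dim // 2))  # learnable frequencies

    def forward(self, t):                      # t: (batch, seq) timestamps or spans
        phase = t.unsqueeze(-1) * self.freqs   # (batch, seq, dim/2)
        phi = torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)
        return phi / (self.freqs.numel() ** 0.5)
```

These time features can then be concatenated or summed with event embeddings before the self-attention layers, replacing the usual positional encoding.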
Neural Ordinary Differential Equations
We introduce a new family of deep neural network models. Instead of
specifying a discrete sequence of hidden layers, we parameterize the derivative
of the hidden state using a neural network. The output of the network is
computed using a black-box differential equation solver. These continuous-depth
models have constant memory cost, adapt their evaluation strategy to each
input, and can explicitly trade numerical precision for speed. We demonstrate
these properties in continuous-depth residual networks and continuous-time
latent variable models. We also construct continuous normalizing flows, a
generative model that can be trained by maximum likelihood without partitioning
or ordering the data dimensions. For training, we show how to scalably
backpropagate through any ODE solver, without access to its internal
operations. This allows end-to-end training of ODEs within larger models.
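The core idea, parameterizing dh/dt with a network and integrating it, can be sketched with a fixed-step RK4 loop; in practice an adaptive black-box solver with adjoint-based backpropagation (e.g., the torchdiffeq package) would replace this hand-rolled integrator.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Parameterizes the derivative dh/dt = f(h, t) with a small MLP."""
    def __init__(self, dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, h, t):                   # h: (batch, dim), t: (1, 1)
        return self.net(torch.cat([h, t.expand(h.size(0), 1)], dim=-1))

def rk4_solve(f, h0, t0=0.0, t1=1.0, steps=10):
    """Fixed-step RK4 integration of dh/dt = f(h, t) from t0 to t1."""
    h, t = h0, torch.full((1, 1), t0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = f(h + dt * k3, t + dt)
        h = h + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + dt
    return h                                   # "continuous-depth" output at t1
```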
Uncertainty on Asynchronous Time Event Prediction
Asynchronous event sequences are the basis of many applications throughout
different industries. In this work, we tackle the task of predicting the next
event (given a history), and how this prediction changes with the passage of
time. Since at some time points (e.g. predictions far into the future) we might
not be able to predict anything with confidence, capturing uncertainty in the
predictions is crucial. We present two new architectures, WGP-LN and FD-Dir,
modelling the evolution of the distribution on the probability simplex with
time-dependent logistic normal and Dirichlet distributions. In both cases, the
combination of RNNs with either Gaussian processes or function decomposition
allows us to express rich temporal evolution of the distribution parameters and
naturally captures uncertainty. Experiments on class prediction, time
prediction, and anomaly detection demonstrate the strong performance of our
models on various datasets compared to other approaches.
Comment: NeurIPS 2019 (Spotlight)
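A sketch in the spirit of the function-decomposition variant: an RNN hidden state weighs a small set of basis functions of elapsed time to produce time-varying Dirichlet concentrations. The Gaussian-bump basis and linear weighting head are illustrative assumptions, not the paper's exact decomposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeDependentDirichlet(nn.Module):
    """RNN hidden state h produces per-class weights over basis functions
    of elapsed time tau; their weighted sum gives time-varying Dirichlet
    concentrations alpha(tau)."""
    def __init__(self, hidden_dim=32, num_classes=5, num_basis=8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(0.0, 10.0, num_basis))
        self.weights = nn.Linear(hidden_dim, num_classes * num_basis)
        self.num_classes, self.num_basis = num_classes, num_basis

    def forward(self, h, tau):                 # h: (batch, hidden), tau: (batch,)
        basis = torch.exp(-(tau.unsqueeze(-1) - self.centers) ** 2)  # (batch, M)
        w = self.weights(h).view(-1, self.num_classes, self.num_basis)
        alpha = F.softplus((w * basis.unsqueeze(1)).sum(-1)) + 1e-3  # (batch, K)
        return alpha  # low total concentration = high predictive uncertainty
```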
Spatio-Temporal Convolutional LSTMs for Tumor Growth Prediction by Learning 4D Longitudinal Patient Data
Prognostic tumor growth modeling via volumetric medical imaging observations
can potentially lead to better outcomes of tumor treatment and surgical
planning. Recent advances of convolutional networks have demonstrated higher
accuracy than traditional mathematical models in predicting future tumor
volumes. This indicates that deep learning-based techniques may have great
potential for addressing this problem. However, current 2D patch-based modeling
approaches cannot make full use of the spatio-temporal imaging context of the
tumor's longitudinal 4D (3D + time) data. Moreover, they are incapable of
predicting clinically relevant tumor properties other than volume. In this
paper, we formulate the tumor growth process with a convolutional
Long Short-Term Memory (ConvLSTM) network that extracts the tumor's static
imaging appearance and captures its temporal dynamic changes within a single network.
We extend ConvLSTM into the spatio-temporal domain (ST-ConvLSTM) by jointly
learning the inter-slice 3D contexts and the longitudinal or temporal dynamics
from multiple patient studies. Our approach can incorporate other non-imaging
patient information in an end-to-end trainable manner. Experiments are
conducted on the largest 4D longitudinal tumor dataset of 33 patients to date.
Results validate that ST-ConvLSTM produces a Dice score of 83.2% ± 5.1% and
an RVD of 11.2% ± 10.8%, both significantly outperforming (p < 0.05) other compared
methods of linear model, ConvLSTM, and generative adversarial network (GAN)
under the metric of predicting future tumor volumes. Additionally, our new
method enables the prediction of both cell density and CT intensity numbers.
Last, we demonstrate the generalizability of ST-ConvLSTM by employing it in a
4D medical image segmentation task, where it achieves an average Dice score of
86.3% ± 1.2% for left-ventricle segmentation in 4D ultrasound at 3 seconds per
patient.
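The building block is the standard ConvLSTM cell, sketched below; the paper's ST-ConvLSTM additionally shares context across adjacent 3D slices, an extension omitted here.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """LSTM gates computed with 2D convolutions so hidden and cell states
    keep a spatial layout; one step processes one time point."""
    def __init__(self, in_ch=1, hid_ch=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):               # x: (batch, in_ch, H, W)
        h, c = state                           # each (batch, hid_ch, H, W)
        i, f, o, g = self.conv(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```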
Representation Learning for Dynamic Graphs: A Survey
Graphs arise naturally in many real-world applications including social
networks, recommender systems, ontologies, biology, and computational finance.
Traditionally, machine learning models for graphs have been mostly designed for
static graphs. However, many applications involve evolving graphs. This
introduces important challenges for learning and inference since nodes,
attributes, and edges change over time. In this survey, we review the recent
advances in representation learning for dynamic graphs, including dynamic
knowledge graphs. We describe existing models from an encoder-decoder
perspective, categorize these encoders and decoders based on the techniques
they employ, and analyze the approaches in each category. We also review
several prominent applications and widely used datasets and highlight
directions for future research.
Comment: Accepted at JMLR, 73 pages, 2 figures
Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation
Predicting users' preferences based on their sequential behaviors in history
is challenging and crucial for modern recommender systems. Most existing
sequential recommendation algorithms focus on the transitional structure among
sequential actions but largely ignore the temporal and context information
when modeling the influence of a historical event on the current prediction.
In this paper, we argue that the influence of past events on a user's
current action should vary over the course of time and under different contexts.
Thus, we propose a Contextualized Temporal Attention Mechanism that learns to
weight the influence of a historical action not only by what the action was,
but also by when and how it took place. More specifically, to dynamically
calibrate the relative input dependence from the self-attention mechanism, we
deploy multiple parameterized kernel functions to learn various temporal
dynamics, and then use the context information to determine which of these
reweighting kernels to follow for each input. In empirical evaluations on two large public
recommendation datasets, our model consistently outperformed an extensive set
of state-of-the-art sequential recommendation methods.
Comment: Keywords: Sequential Recommendation, Self-attention Mechanism, Temporal Recommendation
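A hedged sketch of the core mechanism: several temporal kernels score the time gap between actions, a context vector selects a soft mixture of them, and the mixture rescales the self-attention weights. The exponential-decay kernels and linear gating head are illustrative assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class ContextualTemporalReweighting(nn.Module):
    """Mixture of temporal kernels, gated by context, that rescales
    self-attention weights by the time gap between actions."""
    def __init__(self, ctx_dim=32, num_kernels=4):
        super().__init__()
        self.rates = nn.Parameter(torch.rand(num_kernels))  # one decay per kernel
        self.gate = nn.Linear(ctx_dim, num_kernels)

    def forward(self, attn, dt, ctx):
        # attn, dt: (batch, L, L); ctx: (batch, ctx_dim)
        kernels = torch.exp(-self.rates.abs() * dt.unsqueeze(-1))  # (batch, L, L, M)
        mix = torch.softmax(self.gate(ctx), dim=-1)                # (batch, M)
        scale = (kernels * mix[:, None, None, :]).sum(-1)          # (batch, L, L)
        w = attn * scale
        return w / w.sum(dim=-1, keepdim=True).clamp_min(1e-9)     # renormalize rows
```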