7,165 research outputs found
Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data
Similarity-based approaches represent a promising direction for time series
analysis. However, many such methods rely on parameter tuning, and some have
shortcomings if the time series are multivariate (MTS), due to dependencies
between attributes, or the time series contain missing data. In this paper, we
address these challenges within the powerful context of kernel methods by
proposing the robust \emph{time series cluster kernel} (TCK). The approach
taken leverages the missing data handling properties of Gaussian mixture models
(GMM) augmented with informative prior distributions. An ensemble learning
approach is exploited to ensure robustness to parameters by combining the
clustering results of many GMM to form the final kernel.
We evaluate the TCK on synthetic and real data and compare to other
state-of-the-art techniques. The experimental results demonstrate that the TCK
is robust to parameter choices, provides competitive results for MTS without
missing data and outstanding results for missing data.Comment: 23 pages, 6 figure
Modeling Interdependent and Periodic Real-World Action Sequences
Mobile health applications, including those that track activities such as
exercise, sleep, and diet, are becoming widely used. Accurately predicting
human actions is essential for targeted recommendations that could improve our
health and for personalization of these applications. However, making such
predictions is extremely difficult due to the complexities of human behavior,
which consists of a large number of potential actions that vary over time,
depend on each other, and are periodic. Previous work has not jointly modeled
these dynamics and has largely focused on item consumption patterns instead of
broader types of behaviors such as eating, commuting or exercising. In this
work, we develop a novel statistical model for Time-varying, Interdependent,
and Periodic Action Sequences. Our approach is based on personalized,
multivariate temporal point processes that model time-varying action
propensities through a mixture of Gaussian intensities. Our model captures
short-term and long-term periodic interdependencies between actions through
Hawkes process-based self-excitations. We evaluate our approach on two activity
logging datasets comprising 12 million actions taken by 20 thousand users over
17 months. We demonstrate that our approach allows us to make successful
predictions of future user actions and their timing. Specifically, our model
improves predictions of actions, and their timing, over existing methods across
multiple datasets by up to 156%, and up to 37%, respectively. Performance
improvements are particularly large for relatively rare and periodic actions
such as walking and biking, improving over baselines by up to 256%. This
demonstrates that explicit modeling of dependencies and periodicities in
real-world behavior enables successful predictions of future actions, with
implications for modeling human behavior, app personalization, and targeting of
health interventions.Comment: Accepted at WWW 201
Markov Models for Network-Behavior Modeling and Anonymization
Modern network security research has demonstrated a clear need for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenge of removing sensitive content beforehand. Network Data Anonymization (NDA) is emerging as a field dedicated to this problem, with its main direction focusing on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks -- particularly, ignoring the behavioral idiosyncrasies of network protocols, applications, and users. Recent anonymization results have shown that the extent to which utility and privacy can be obtained is mainly a function of the information in the data that one is aware and not aware of. This paper leverages the predictability of network behavior in our favor to augment existing frameworks through a new machine-learning-driven anonymization technique. Our approach uses the substitution of individual identities with group identities where members are divided based on behavioral similarities, essentially providing anonymity-by-crowds in a statistical mix-net. We derive time-series models for network traffic behavior which quantifiably models the discriminative features of network "behavior" and introduce a kernel-based framework for anonymity which fits together naturally with network-data modeling
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have a substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims for assisting the readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services as well as scenarios of future wireless
networks.Comment: 46 pages, 22 fig
Joint segmentation of multivariate time series with hidden process regression for human activity recognition
The problem of human activity recognition is central for understanding and
predicting the human behavior, in particular in a prospective of assistive
services to humans, such as health monitoring, well being, security, etc. There
is therefore a growing need to build accurate models which can take into
account the variability of the human activities over time (dynamic models)
rather than static ones which can have some limitations in such a dynamic
context. In this paper, the problem of activity recognition is analyzed through
the segmentation of the multidimensional time series of the acceleration data
measured in the 3-d space using body-worn accelerometers. The proposed model
for automatic temporal segmentation is a specific statistical latent process
model which assumes that the observed acceleration sequence is governed by
sequence of hidden (unobserved) activities. More specifically, the proposed
approach is based on a specific multiple regression model incorporating a
hidden discrete logistic process which governs the switching from one activity
to another over time. The model is learned in an unsupervised context by
maximizing the observed-data log-likelihood via a dedicated
expectation-maximization (EM) algorithm. We applied it on a real-world
automatic human activity recognition problem and its performance was assessed
by performing comparisons with alternative approaches, including well-known
supervised static classifiers and the standard hidden Markov model (HMM). The
obtained results are very encouraging and show that the proposed approach is
quite competitive even it works in an entirely unsupervised way and does not
requires a feature extraction preprocessing step
- …