3,819 research outputs found
Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching
Personalization in marketing aims at improving the shopping experience of
customers by tailoring services to individuals. In order to achieve this,
businesses must be able to make personalized predictions regarding the next
purchase. That is, one must forecast the exact list of items that will comprise
the next purchase, i.e., the so-called market basket. Despite its relevance to
firm operations, this problem has received surprisingly little attention in
prior research, largely due to its inherent complexity. In fact,
state-of-the-art approaches are limited to intuitive decision rules for pattern
extraction. However, the simplicity of the pre-coded rules impedes performance,
since decision rules operate in an autoregressive fashion: the rules can only
make inferences from past purchases of a single customer without taking into
account the knowledge transfer that takes place between customers. In contrast,
our research overcomes the limitations of pre-set rules by contributing a novel
predictor of market baskets from sequential purchase histories: our predictions
are based on similarity matching in order to identify similar purchase habits
among the complete shopping histories of all customers. Our contributions are
as follows: (1) We propose similarity matching based on subsequential dynamic
time warping (SDTW) as a novel predictor of market baskets. Thereby, we can
effectively identify cross-customer patterns. (2) We leverage the Wasserstein
distance for measuring the similarity among embedded purchase histories. (3) We
develop a fast approximation algorithm for computing a lower bound of the
Wasserstein distance in our setting. An extensive series of computational
experiments demonstrates the effectiveness of our approach. The accuracy of
identifying the exact market baskets based on state-of-the-art decision rules
from the literature is outperformed by a factor of 4.0.Comment: Accepted for oral presentation at 25th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD 2019
Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks
Session-based recommendations are highly relevant in many modern on-line
services (e.g. e-commerce, video streaming) and recommendation settings.
Recently, Recurrent Neural Networks have been shown to perform very well in
session-based settings. While in many session-based recommendation domains user
identifiers are hard to come by, there are also domains in which user profiles
are readily available. We propose a seamless way to personalize RNN models with
cross-session information transfer and devise a Hierarchical RNN model that
relays end evolves latent hidden states of the RNNs across user sessions.
Results on two industry datasets show large improvements over the session-only
RNNs
Recommender systems in industrial contexts
This thesis consists of four parts: - An analysis of the core functions and
the prerequisites for recommender systems in an industrial context: we identify
four core functions for recommendation systems: Help do Decide, Help to
Compare, Help to Explore, Help to Discover. The implementation of these
functions has implications for the choices at the heart of algorithmic
recommender systems. - A state of the art, which deals with the main techniques
used in automated recommendation system: the two most commonly used algorithmic
methods, the K-Nearest-Neighbor methods (KNN) and the fast factorization
methods are detailed. The state of the art presents also purely content-based
methods, hybridization techniques, and the classical performance metrics used
to evaluate the recommender systems. This state of the art then gives an
overview of several systems, both from academia and industry (Amazon, Google
...). - An analysis of the performances and implications of a recommendation
system developed during this thesis: this system, Reperio, is a hybrid
recommender engine using KNN methods. We study the performance of the KNN
methods, including the impact of similarity functions used. Then we study the
performance of the KNN method in critical uses cases in cold start situation. -
A methodology for analyzing the performance of recommender systems in
industrial context: this methodology assesses the added value of algorithmic
strategies and recommendation systems according to its core functions.Comment: version 3.30, May 201
A framework for automated anomaly detection in high frequency water-quality data from in situ sensors
River water-quality monitoring is increasingly conducted using automated in
situ sensors, enabling timelier identification of unexpected values. However,
anomalies caused by technical issues confound these data, while the volume and
velocity of data prevent manual detection. We present a framework for automated
anomaly detection in high-frequency water-quality data from in situ sensors,
using turbidity, conductivity and river level data. After identifying end-user
needs and defining anomalies, we ranked their importance and selected suitable
detection methods. High priority anomalies included sudden isolated spikes and
level shifts, most of which were classified correctly by regression-based
methods such as autoregressive integrated moving average models. However, using
other water-quality variables as covariates reduced performance due to complex
relationships among variables. Classification of drift and periods of
anomalously low or high variability improved when we applied replaced anomalous
measurements with forecasts, but this inflated false positive rates.
Feature-based methods also performed well on high priority anomalies, but were
also less proficient at detecting lower priority anomalies, resulting in high
false negative rates. Unlike regression-based methods, all feature-based
methods produced low false positive rates, but did not and require training or
optimization. Rule-based methods successfully detected impossible values and
missing observations. Thus, we recommend using a combination of methods to
improve anomaly detection performance, whilst minimizing false detection rates.
Furthermore, our framework emphasizes the importance of communication between
end-users and analysts for optimal outcomes with respect to both detection
performance and end-user needs. Our framework is applicable to other types of
high frequency time-series data and anomaly detection applications
- …