Web usage mining. Structuring semantically enriched clickstream data
Web servers worldwide generate a vast amount of information on web users' browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. Clickstream data can be enriched with information about the content of visited pages and the origin (e.g., geographic, organizational) of the requests. The goal of this project is to analyse user behaviour by mining enriched web access log data. We discuss the techniques and processes required for preparing, structuring and enriching web access logs. Furthermore, we present several web usage mining methods for extracting useful features. Finally, we employ all these techniques to cluster the users of the domain www.cs.vu.nl and to study their behaviours comprehensively. The contributions of this thesis are a data enrichment that is content and origin based and a tree-like visualization of frequent navigational sequences. This visualization presents patterns in an easily interpretable form, with relevant information highlighted. The results of this project can be applied to diverse purposes, including marketing, web conten
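As a rough illustration of the preparation and mining steps mentioned in the abstract above, the following Python sketch splits access-log events into sessions and counts frequent navigational sequences. It is not the thesis's method: the 30-minute inactivity gap, the `(user, timestamp, page)` event layout, and the function names `sessionize` and `frequent_sequences` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Assumed inactivity threshold for ending a session; not taken from the thesis.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Split (user, timestamp_seconds, page) events into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, page in events:
        by_user[user].append((ts, page))

    sessions = []
    for visits in by_user.values():
        visits.sort()  # order each user's visits by timestamp
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > gap:
                # A long pause closes the current session.
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def frequent_sequences(sessions, length=2, min_support=2):
    """Count contiguous page sequences once per session; keep the frequent ones."""
    counts = Counter()
    for pages in sessions:
        seen = {tuple(pages[i:i + length]) for i in range(len(pages) - length + 1)}
        counts.update(seen)
    return {seq: n for seq, n in counts.items() if n >= min_support}


# Hypothetical toy log; the page paths are made up.
events = [
    ("a", 0, "/home"), ("a", 60, "/research"), ("a", 120, "/staff"),
    ("a", 10000, "/home"), ("a", 10060, "/research"),
    ("b", 5, "/home"), ("b", 50, "/research"),
]
sessions = sessionize(events)
patterns = frequent_sequences(sessions, length=2, min_support=2)
```

Counting each sequence once per session means the support value is the number of sessions containing a pattern, which is one common convention; counting every occurrence would be an equally valid choice.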
Early Detection of User Exits from Clickstream Data: A Markov Modulated Marked Point Process Model
Most users leave e-commerce websites with no purchase. Hence, it is important for website owners to detect users at risk of exiting and intervene early (e.g., adapting website content or offering price promotions). Prior approaches make widespread use of clickstream data; however, state-of-the-art algorithms only model the sequence of web pages visited and not the time spent on them.
In this paper, we develop a novel Markov modulated marked point process (M3PP) model for detecting users at risk of exiting with no purchase from clickstream data. It accommodates clickstream data in a holistic manner: our proposed M3PP models both the sequence of pages visited and the temporal dynamics between them, i.e., the time spent on pages. This is achieved by a continuous-time marked point process. Different from previous Markovian clickstream models, our M3PP is the first model in which the continuous nature of time is considered. The marked point process is modulated by a continuous-time Markov process in order to account for different latent shopping phases. As a secondary contribution, we suggest a risk assessment framework. Rather than predicting future page visits, we compute a user’s risk of exiting with no purchase. For this purpose, we build upon sequential hypothesis testing in order to suggest a risk score for user exits.
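To make the idea of a risk score built on sequential hypothesis testing more concrete, here is a minimal Python sketch of Wald's sequential probability ratio test applied to page dwell times. It is not the M3PP model or the paper's risk framework: the exponential dwell-time assumption, the two rates `rate_exit` and `rate_stay`, the error levels, and the function name `sprt_risk` are all hypothetical choices for illustration.

```python
import math


def sprt_risk(dwell_times, rate_exit=1.0, rate_stay=0.2, alpha=0.05, beta=0.05):
    """Accumulate a log-likelihood ratio over dwell times and stop at a Wald boundary.

    Assumption (not from the paper): dwell times are exponential, with rate
    `rate_exit` for users about to leave and `rate_stay` for all other users.
    Returns (decision, number of observations used, final log-likelihood ratio).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it supports "at risk"
    lower = math.log(beta / (1 - alpha))   # crossing it supports "not at risk"

    llr = 0.0  # running log-likelihood ratio, usable as a risk score
    for n, t in enumerate(dwell_times, start=1):
        log_p_exit = math.log(rate_exit) - rate_exit * t
        log_p_stay = math.log(rate_stay) - rate_stay * t
        llr += log_p_exit - log_p_stay
        if llr >= upper:
            return "at risk", n, llr
        if llr <= lower:
            return "not at risk", n, llr
    return "undecided", len(dwell_times), llr


# Hypothetical dwell times in minutes.
quick_clicks = sprt_risk([0.1, 0.1, 0.1, 0.1])
long_reads = sprt_risk([10.0, 10.0])
```

The appeal of a sequential test in this setting is that it can stop as soon as the evidence is strong enough, so a warning may be raised after only a few page views instead of at the end of the session.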
Our computational experiments draw upon real-world clickstream data provided by a large online retailer. Based on this, we find that state-of-the-art algorithms are consistently outperformed by our M3PP model in terms of both AUROC (+6.24 percentage points) and so-called time of early warning (+12.93%). Accordingly, our M3PP model allows for timely detections of user exits and thus provides sufficient time for e-commerce website owners to trigger dynamic online interventions
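For readers unfamiliar with the AUROC metric reported above, the short Python sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive case (here, an exit without purchase) receives a higher risk score than a randomly chosen negative case. The labels and scores are made-up toy values, not results from the paper, and the function name `auroc` is an assumption of this example.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = user exited without purchase, 0 = user purchased.
toy_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Because AUROC is itself a proportion, improvements to it are naturally stated in percentage points, whereas a relative change such as the one given for time of early warning is stated in percent.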