Automated user modeling for personalized digital libraries
Digital libraries (DL) have become one of the most common ways of accessing any kind of digitized information. Because of this key role, users welcome any improvement in the services they receive from digital libraries. One trend for improving digital services is personalization. Up to now, the most common approach to personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, automatically. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to building digital libraries that satisfy users' need for information: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information.
Imputing or Smoothing? Modelling the Missing Online Customer Journey Transitions for Purchase Prediction
Online customer journeys are at the core of e-commerce systems, and it is therefore important to model and understand this online customer behaviour. Clickstream data from online journeys can be modelled using Markov chains. This study investigates two different approaches to handling missing transition probabilities when constructing Markov chain models for purchase prediction. Imputing the transition probabilities using the Chapman-Kolmogorov (CK) equation addresses this issue and achieves high prediction accuracy by approximating them with one-step-ahead probabilities. However, it comes with a high computational burden, and some probabilities remain zero after imputation. An alternative approach is to smooth the transition probabilities using Bayesian techniques. This ensures non-zero probabilities, but the approach has been criticized as less accurate than the CK method, though this has not been fully evaluated in the literature using realistic commercial data. We compare the purchase-prediction accuracy of the CK and Bayesian methods, and evaluate them on commercial web server data from a major European airline.
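The two approaches the abstract contrasts can be sketched on a toy transition matrix. This is a minimal illustration, not the paper's implementation; the state space, counts and function names are all hypothetical:

```python
import numpy as np

def ck_impute(P, steps=2):
    """Fill zero one-step transition probabilities with the
    Chapman-Kolmogorov k-step probabilities (here the entries of P^2),
    then renormalise each row to a probability distribution."""
    Pk = np.linalg.matrix_power(P, steps)
    filled = np.where(P == 0, Pk, P)
    return filled / filled.sum(axis=1, keepdims=True)

def laplace_smooth(counts, alpha=1.0):
    """Bayesian (Dirichlet/Laplace) smoothing: add pseudo-counts so
    every transition probability is strictly positive."""
    smoothed = counts + alpha
    return smoothed / smoothed.sum(axis=1, keepdims=True)

# toy transition counts over 3 page states;
# the (state 0 -> state 2) transition was never observed
counts = np.array([[5., 3., 0.],
                   [2., 0., 6.],
                   [4., 4., 4.]])
P = counts / counts.sum(axis=1, keepdims=True)

P_ck = ck_impute(P)          # CK imputation of the missing entry
P_bayes = laplace_smooth(counts)  # smoothing: all entries positive
```

Note that CK imputation can still leave an entry at zero when the corresponding multi-step probability is also zero, which is exactly the limitation the abstract mentions; smoothing guarantees positivity by construction.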
Clickstream Data Analysis: A Clustering Approach Based on Mixture Hidden Markov Models
Nowadays, the availability of devices such as laptops and cell phones enables one to browse the web at any time and place. As a consequence, a company needs a website in order to maintain or increase customer loyalty and to reach potential new customers. Besides acting as a virtual point of sale, the company portal allows it to obtain insights into potential customers through clickstream data: web-generated data that track users' accesses and activities on websites. However, these data are not easy to handle, as they are complex, unstructured and limited by a lack of clear information about user intentions and goals. Clickstream data analysis is a suitable tool for managing the complexity of these datasets, yielding a cleaned and processed sequential dataframe ready for identifying and analysing patterns.
Analysing clickstream data is important for companies, as it enables them to understand differences in web users' behaviour while they explore websites, how they move from one page to another and what they select, in order to define business strategies targeting specific types of potential customers. To obtain this level of insight, it is pivotal to understand how to exploit the hidden information in clickstream data.
This work presents the cleaning and pre-processing procedures needed to obtain a structured sequential dataset from clickstream data, and analyses these sequences by applying mixtures of discrete-time hidden Markov models (MHMMs), a statistical tool suitable for clickstream data analysis and profile identification that has not been widely used in this context. Specifically, the hidden Markov process accounts for a time-varying latent variable to handle uncertainty and groups observed states by unobserved similarity; fitting the model entails identifying both the number of mixture components, relating to the subpopulations, and the number of latent states for each latent Markov chain.
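The model structure described above can be written compactly. This is one standard formulation of the MHMM likelihood, with notation assumed rather than taken from the thesis: $\pi_k$ are the mixture weights and each component $k$ is an HMM with initial distribution $\boldsymbol{\delta}^{(k)}$, transition matrix $\boldsymbol{\Gamma}^{(k)}$ and diagonal emission matrix $\mathbf{B}^{(k)}(y)$:

```latex
P(\mathbf{y}_i) \;=\; \sum_{k=1}^{K} \pi_k \,
  \boldsymbol{\delta}^{(k)\top} \mathbf{B}^{(k)}(y_{i1})
  \left[ \prod_{t=2}^{T_i} \boldsymbol{\Gamma}^{(k)} \mathbf{B}^{(k)}(y_{it}) \right]
  \mathbf{1}
```

Each sequence's likelihood is thus a weighted sum of per-component HMM forward probabilities, which is what makes both the component labels and the state paths latent.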
However, the application of MHMMs requires identifying both the number of components and the number of states. Information criteria (IC) are generally used for model selection in mixture hidden Markov models and, although their performance has been widely studied for mixture models and for hidden Markov models, they have received little attention in the MHMM context. The most widely used criterion is the BIC, even though its performance for these models depends on factors such as the number of components and the sequence length. Another class of model selection criteria is the classification criteria (CC). They were defined specifically for clustering purposes and rely on an entropy measure to account for the separability between groups. These criteria are a natural choice for our purpose, but their application as model selection tools for MHMMs requires the definition of a suitable entropy measure.
In light of these considerations, this work proposes a classification criterion based on an integrated classification likelihood approach for MHMMs that accounts for the two latent classes in the model: the subpopulations and the hidden states. This criterion is a modified ICL-BIC, a classification criterion originally defined in the mixture model context and later used for hidden Markov models. The ICL-BIC is a suitable score for identifying the number of classes (components or states); thus, to extend it to MHMMs, we defined a joint entropy accounting for both a component-related entropy and a state-related conditional entropy.
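One plausible form of such a criterion, written here with assumed notation rather than the thesis's own, penalises the maximised likelihood $\hat{L}$ (with $d$ free parameters and $n$ sequences) by twice a joint entropy that splits into a component term and a state term conditional on the component, where $\hat{\tau}_{ik}$ is the posterior probability that sequence $i$ belongs to component $k$ and $\hat{\gamma}^{(k)}_{itj}$ the posterior probability of hidden state $j$ at time $t$ within component $k$:

```latex
\mathrm{ICL\text{-}BIC}
  = -2\log\hat{L} + d\log n + 2\,\mathrm{EN}(\hat{\tau},\hat{\gamma}),
\qquad
\mathrm{EN}(\hat{\tau},\hat{\gamma})
  = \underbrace{-\sum_{i=1}^{n}\sum_{k=1}^{K}
      \hat{\tau}_{ik}\log\hat{\tau}_{ik}}_{\text{component entropy}}
  \;-\;
  \underbrace{\sum_{i=1}^{n}\sum_{k=1}^{K} \hat{\tau}_{ik}
      \sum_{t=1}^{T_i}\sum_{j=1}^{S}
      \hat{\gamma}^{(k)}_{itj}\log\hat{\gamma}^{(k)}_{itj}}_{\text{state entropy given component}}
```

Minimising the criterion then trades likelihood fit against both how cleanly sequences separate into components and how cleanly observations separate into hidden states.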
The thesis presents a Monte Carlo simulation study comparing the performance of the selection criteria, the results of which point out the limitations of the most commonly used information criteria and demonstrate that the proposed criterion outperforms them in identifying components and states, especially for short sequences, which are quite common in website accesses. The proposed selection criterion was applied to real clickstream data collected from the website of a Sicilian company operating in the hospitality sector. The data were modelled by an MHMM, identifying clusters related to the browsing behaviour of web users, which provided essential indications for developing new business strategies.
This thesis is structured as follows: after an introduction to the main topics in Chapter 1, we present the clickstream data and their cleaning and pre-processing steps in Chapter 2; Chapter 3 illustrates the structure and estimation algorithms of mixture hidden Markov models; Chapter 4 presents a review of model selection criteria and the definition of the proposed ICL-BIC for MHMMs; the real clickstream data analysis follows in Chapter 5.
A hybrid model for business process event and outcome prediction
Large service companies run complex customer service processes to provide communication services to their customers. The flawless execution of these processes is essential, because customer service is an important differentiator. Companies must also be able to predict whether processes will complete successfully or run into exceptions, in order to intervene at the right time, preempt problems and maintain customer service. Business process data are sequential in nature and can be very diverse, so there is a need for an efficient sequential forecasting methodology that can cope with this diversity. This paper proposes two approaches, a sequential k-nearest-neighbour method and an extension of Markov models, both with an added component based on sequence alignment. The proposed approaches exploit temporal categorical features of the data to predict the next steps of a process using higher-order Markov models, and its outcome using a sequence alignment technique. The diversity of the data is handled by considering subsets of similar process sequences based on k nearest neighbours. We show, via a set of experiments, that our sequential k-nearest-neighbour approach offers better results than the original one, and that our extended Markov model outperforms random guessing, Markov models and hidden Markov models.
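The higher-order Markov idea used for next-step prediction can be sketched as a frequency table over fixed-length contexts. This is a minimal illustration only, not the paper's method (which adds sequence alignment and kNN on top); the event codes and function names are hypothetical:

```python
from collections import Counter, defaultdict

def fit_higher_order(sequences, order=2):
    """Count next-event frequencies conditioned on the last `order` events."""
    model = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            context = tuple(seq[i:i + order])
            model[context][seq[i + order]] += 1
    return model

def predict_next(model, history, order=2):
    """Return the most frequent next event for the last `order` events,
    or None if this context was never observed in training."""
    context = tuple(history[-order:])
    if context not in model:
        return None
    return model[context].most_common(1)[0][0]

# toy process logs (event codes are made up)
logs = [["open", "check", "repair", "close"],
        ["open", "check", "escalate", "close"],
        ["open", "check", "repair", "close"]]
model = fit_higher_order(logs, order=2)
print(predict_next(model, ["open", "check"]))  # "repair" (seen 2 of 3 times)
```

The `None` fallback for unseen contexts is precisely the sparsity problem that motivates combining such models with similarity-based techniques like kNN.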
Evaluation, Analysis and adaptation of web prefetching techniques in current web
Abstract
This dissertation is focused on the study of the prefetching technique applied to the World Wide Web. This technique consists in processing (e.g., downloading) a web request before the user actually makes it. By doing so, the waiting time perceived by the user can be reduced, which is the main goal of web prefetching techniques.
The study of the state of the art in web prefetching showed the heterogeneity that exists in its performance evaluation.
This heterogeneity mainly concerns four issues:
i) there was no open framework to simulate and evaluate the already proposed prefetching techniques;
ii) no uniform selection of the performance indexes to be maximized, or even their definition;
iii) no comparative studies of prediction algorithms taking into account the costs and benefits of web prefetching at the same time;
and iv) techniques were evaluated under very different workloads, or under too few significant ones.
During the research work, we have contributed to homogenizing the evaluation of prefetching performance by developing an open simulation framework that reproduces in detail all the aspects that impact on prefetching performance. In addition, prefetching performance metrics have been analyzed in order to clarify their definition and detect the most meaningful from the user's point of view.
We also proposed an evaluation methodology to consider the cost and the benefit of prefetching at the same time.
Finally, the importance of using current workloads to evaluate prefetching techniques has been highlighted; otherwise, wrong conclusions could be drawn.
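The cost/benefit trade-off discussed above is usually captured by a pair of indexes. The following sketch uses common textbook definitions (precision as the fraction of prefetched objects that were actually requested, recall as the fraction of requests served from prefetched objects); these definitions and names are assumptions for illustration, not necessarily the ones the dissertation settles on:

```python
def prefetch_metrics(prefetched, requested):
    """Compute precision (benefit per unit cost: useful prefetches /
    total prefetches) and recall (coverage: requests satisfied by a
    prefetch / total requests) from two sets of URLs."""
    useful = prefetched & requested
    precision = len(useful) / len(prefetched) if prefetched else 0.0
    recall = len(useful) / len(requested) if requested else 0.0
    return precision, recall

# toy run: 3 objects prefetched, user then requested 3 objects
p, r = prefetch_metrics({"/a", "/b", "/c"}, {"/b", "/c", "/d"})
```

Low precision means wasted bandwidth (the cost side), while low recall means little perceived-latency reduction (the benefit side), which is why the two must be evaluated together.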
The potential benefits of each web prefetching architecture were analyzed, finding that collaborative predictors could reduce almost all the latency perceived by users.
The first step to develop a collaborative predictor is to make predictions at the server, so this thesis is focused on an architecture with a server-located predictor.
The environment conditions that can be found in the web are also analysed.
Doménech I De Soria, J. (2007). Evaluation, Analysis and adaptation of web prefetching techniques in current web [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1841
On the classification and evaluation of prefetching schemes
Predicting Sequences of Traversed Nodes in Graphs using Network Models with Multiple Higher Orders
We propose a novel sequence prediction method for sequential data capturing node traversals in graphs. Our method builds on a statistical modelling framework that combines multiple higher-order network models into a single multi-order model. We develop a technique to fit such multi-order models to empirical sequential data and to select the optimal maximum order. Our framework facilitates both next-element and full-sequence prediction given a sequence prefix of any length. We evaluate our model on six empirical data sets containing sequences from website navigation as well as public transport systems. The results show that our method outperforms state-of-the-art algorithms for next-element prediction. We further demonstrate the accuracy of our method during out-of-sample sequence prediction and validate that our method can scale to data sets with millions of sequences.
Comment: 18 pages, 5 figures, 2 tables
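A multi-order model's next-element prediction can be approximated by keeping one frequency table per order and backing off from the longest observed context to shorter ones. This is a simplified sketch of that idea, not the authors' statistical framework (which fits and weights the orders via likelihood-based model selection); all names and the toy walks are hypothetical:

```python
from collections import Counter, defaultdict

def fit_orders(sequences, max_order=3):
    """Build one frequency table per order k: a context of the last k
    nodes maps to counts of the node that followed it."""
    models = {k: defaultdict(Counter) for k in range(1, max_order + 1)}
    for seq in sequences:
        for k in range(1, max_order + 1):
            for i in range(len(seq) - k):
                models[k][tuple(seq[i:i + k])][seq[i + k]] += 1
    return models

def predict(models, history, max_order=3):
    """Back off from the longest matching context to shorter ones and
    return the most frequent successor, or None if nothing matches."""
    for k in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in models[k]:
            return models[k][context].most_common(1)[0][0]
    return None

# toy node traversals on a small graph
walks = [["A", "B", "C", "D"],
         ["A", "B", "C", "E"],
         ["X", "B", "C", "D"]]
models = fit_orders(walks)
print(predict(models, ["X", "B", "C"]))  # "D": the order-3 context matches
```

The back-off loop is what lets longer contexts disambiguate paths (here, knowing the walk started at "X" resolves the "C"-successor ambiguity) while shorter contexts keep the model usable on unseen prefixes.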