Generating dynamic higher-order Markov models in web usage mining
Markov models have been widely used for modelling users' web navigation behaviour. In previous work we presented a dynamic clustering-based Markov model that accurately represents second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes higher-order conditional probabilities into account. The method uses the state-cloning concept together with a clustering technique to separate navigation paths that reveal differences in the conditional probabilities. We report on experiments conducted with three real-world data sets. The results show that some pages require a long history to understand the users' choice of link, while others require only a short one. We also show that the number of additional states induced by the method can be controlled through a probability threshold parameter.
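The core estimation step the abstract builds on can be sketched as follows: counting k-th-order transitions over a collection of navigation sessions to obtain the conditional probabilities P(next page | last k pages). This is a minimal illustration of higher-order transition estimation, not the paper's cloning-and-clustering algorithm itself; the function and variable names are ours.

```python
from collections import defaultdict

def kth_order_transitions(sessions, k):
    """Estimate P(next_page | last k pages) from navigation sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for i in range(len(session) - k):
            history = tuple(session[i:i + k])  # the k-page context
            counts[history][session[i + k]] += 1
    probs = {}
    for history, nexts in counts.items():
        total = sum(nexts.values())
        probs[history] = {page: n / total for page, n in nexts.items()}
    return probs

sessions = [["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"]]
p2 = kth_order_transitions(sessions, 2)
# With k=2, the context ("A", "B") splits evenly between C and D,
# while ("E", "B") always leads to C - the kind of history-dependent
# difference that motivates cloning the shared state B.
```

The paper's method decides, per page, whether such history-dependent differences are large enough to warrant cloning the state.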
Data Mining for Browsing Patterns in Weblog Data by ART Neural Networks
Categorising visitors based on their interaction with a website is a key problem in Web content usage. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customised content. This paper proposes an approach to clustering weblog data based on ART2 neural networks. Due to the characteristics of the ART2 neural network model, the proposed approach can be used for unsupervised and self-learning data mining, which makes it adaptable to dynamically changing websites.
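The self-learning property the abstract refers to comes from adaptive resonance: an input either resonates with an existing cluster prototype (above a vigilance threshold) or spawns a new cluster, so the network grows as the website changes. Below is a deliberately simplified ART-style sketch under that idea; full ART2 adds normalisation layers and noise suppression that are omitted here, and all names and parameter values are illustrative.

```python
import numpy as np

def art_like_cluster(vectors, vigilance=0.9, lr=0.5):
    """Simplified ART-style clustering: assign each normalised vector to
    the best-matching prototype if the cosine match exceeds the vigilance
    threshold, otherwise create a new cluster."""
    prototypes, labels = [], []
    for v in vectors:
        v = v / np.linalg.norm(v)
        best, best_sim = None, -1.0
        for j, p in enumerate(prototypes):
            sim = float(v @ p)          # cosine similarity (unit vectors)
            if sim > best_sim:
                best, best_sim = j, sim
        if best is not None and best_sim >= vigilance:
            # resonance: nudge the winning prototype toward the input
            p = prototypes[best] + lr * (v - prototypes[best])
            prototypes[best] = p / np.linalg.norm(p)
            labels.append(best)
        else:
            # no resonance: a new cluster is created on the fly
            prototypes.append(v)
            labels.append(len(prototypes) - 1)
    return labels, prototypes
```

Because clusters are created on demand, the number of browsing-pattern groups need not be fixed in advance, which is what makes this family of models suited to dynamically changing websites.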
Implicit Measures of Lostness and Success in Web Navigation
In two studies, we investigated the ability of a variety of structural and temporal measures computed from a web navigation path to predict lostness and task success. The user's task was to find requested target information on specified websites. The web navigation measures were based on counts of visits to web pages and other statistical properties of the web usage graph (such as compactness, stratum, and similarity to the optimal path). Subjective lostness was best predicted by similarity to the optimal path and time on task. The best overall predictor of success on individual tasks was similarity to the optimal path, but other predictors were sometimes superior depending on the particular web navigation task. These measures can be used to diagnose user navigational problems and to help identify problems in website design.
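The "similarity to the optimal path" measure can be instantiated in several ways; one plausible sketch (not necessarily the exact formulation used in the studies) normalises the edit distance between the user's page sequence and the optimal sequence into a [0, 1] similarity score:

```python
def path_similarity(user_path, optimal_path):
    """Similarity between a navigation path and the optimal path,
    via Levenshtein edit distance normalised to [0, 1].
    1.0 means the user followed the optimal path exactly."""
    m, n = len(user_path), len(optimal_path)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if user_path[i - 1] == optimal_path[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 1 - d[m][n] / max(m, n)
```

A detour such as visiting one extra page on a four-step path yields a similarity of 0.75, which a diagnostic tool could flag against a lostness threshold.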
Log Pre-Processing and Grammatical Inference for Web Usage Mining
In this paper, we propose a Web Usage Mining pre-processing method to retrieve missing data from web server log files. Moreover, we propose two levels of evaluation: directly on the reconstructed data, and also after a machine learning step, by evaluating the inferred grammatical models. Our experiments show that the algorithm improves the quality of user data.
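A typical source of the missing data being reconstructed is browser caching: pages revisited via the "Back" button never reach the server log. A common pre-processing heuristic, sketched below, re-inserts those pages using the HTTP referrer field. This illustrates the general idea only; the paper's reconstruction algorithm and its grammatical-inference evaluation are more involved.

```python
def complete_path(pages, referrers):
    """Heuristic path completion: if a request's referrer appears earlier
    in the session but is not the previous page, the user likely backed
    through cached pages; re-insert the backtracked pages."""
    completed = []
    for page, ref in zip(pages, referrers):
        if completed and ref is not None and ref != completed[-1] and ref in completed:
            # walk back to the most recent occurrence of the referrer
            i = len(completed) - 1
            while completed[i] != ref:
                i -= 1
            # re-add the pages traversed backwards (served from cache)
            completed.extend(reversed(completed[i:-1]))
        completed.append(page)
    return completed

# A request for D with referrer A, after the logged path A -> B -> C,
# implies the unlogged back-steps C -> B -> A before the click on D.
print(complete_path(["A", "B", "C", "D"], [None, "A", "B", "A"]))
```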
Imputing or smoothing? Modelling the missing online customer journey transitions for purchase prediction
Online customer journeys are at the core of e-commerce systems and it is therefore important to model and understand this online customer behaviour. Clickstream data from online journeys can be modelled using Markov Chains. This study investigates two different approaches to handling missing transition probabilities when constructing Markov Chain models for purchase prediction. Imputing the transition probabilities using the Chapman-Kolmogorov (CK) equation addresses this issue and achieves high prediction accuracy by approximating them with the one-step-ahead probability. However, it comes with a high computational burden, and some probabilities remain zero after imputation. An alternative approach is to smooth the transition probabilities using Bayesian techniques. This ensures non-zero probabilities, but the approach has been criticised for not being as accurate as the CK method, though this has not been fully evaluated in the literature using realistic, commercial data. We compare the purchase-prediction accuracy of the CK and Bayesian methods, and evaluate them based on commercial web server data from a major European airline.
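The two approaches being compared can be sketched on a transition matrix: CK-style imputation fills a zero one-step probability with the corresponding two-step probability (a row of P squared), while Bayesian smoothing in its simplest Laplace/Dirichlet form adds a pseudo-count so nothing is exactly zero. These are minimal illustrations of the two ideas, not the study's exact estimators.

```python
import numpy as np

def ck_impute(P):
    """Impute zero one-step transition probabilities with the two-step
    (Chapman-Kolmogorov) probability P @ P, then renormalise each row.
    Entries that are zero in both P and P @ P remain zero - the residual
    problem the abstract mentions."""
    P2 = P @ P
    filled = np.where(P == 0, P2, P)
    return filled / filled.sum(axis=1, keepdims=True)

def laplace_smooth(counts, alpha=1.0):
    """Bayesian (Laplace/Dirichlet) smoothing: add pseudo-count alpha to
    every observed transition count, guaranteeing non-zero probabilities."""
    c = counts + alpha
    return c / c.sum(axis=1, keepdims=True)
```

Note the trade-off visible even at this scale: `ck_impute` requires a matrix multiplication over the full state space (costly for large websites), whereas `laplace_smooth` is a cheap element-wise operation but flattens the distribution.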
An average linear time algorithm for web data mining
In this paper, we study the complexity of a data mining algorithm, proposed in previous work [3], for extracting patterns from user web navigation data. The user web navigation sessions are inferred from log data and modelled as a Markov chain. The chain's higher-probability trails correspond to the preferred trails on the web site. The algorithm implements a depth-first search that scans the Markov chain for the high-probability trails. We show that the algorithm's average running time is linear in the number of web pages accessed.
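The depth-first scan can be sketched as follows: extend a trail while its cumulative probability stays above a cut-point, and emit it once no extension qualifies. This is an illustration of the idea rather than the analysed algorithm itself (which includes further details), and it assumes the chain has no probability-1 cycles so the search terminates.

```python
def high_probability_trails(transitions, initial, threshold):
    """Depth-first search for trails whose cumulative probability
    stays at or above `threshold`.
    transitions: {state: {next_state: probability}}
    initial:     {state: initial probability}"""
    trails = []

    def dfs(state, trail, prob):
        extended = False
        for nxt, p in transitions.get(state, {}).items():
            if prob * p >= threshold:   # prune low-probability branches
                dfs(nxt, trail + [nxt], prob * p)
                extended = True
        if not extended:                # maximal trail: record it
            trails.append((trail, prob))

    for start, p0 in initial.items():
        if p0 >= threshold:
            dfs(start, [start], p0)
    return trails

transitions = {"A": {"B": 0.8, "C": 0.2}, "B": {"C": 1.0}}
print(high_probability_trails(transitions, {"A": 1.0}, 0.5))
```

Because cumulative probabilities shrink with every transition, the pruning bounds trail length, which is what makes the average-case linear behaviour plausible.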
Cost and Response Time Simulation for Web-based Applications on Mobile Channels
When considering the addition of a mobile presentation channel to an existing web-based application, a key question that has to be answered even before development begins is how the mobile channel's characteristics will impact the user experience and the cost of using the application. If either of these factors is outside acceptable limits, economic considerations may forbid adding the channel, even if it would be feasible from a purely technical perspective. Both of these factors depend considerably on two metrics: the time required to transmit data over the mobile network, and the volume transmitted.
The PETTICOAT method presented in this paper uses the dialog flow model and web server log files of an existing application to identify typical interaction sequences and to compile volume statistics, which are then run through a tool that simulates the volume and time that would be incurred by executing the interaction sequences on a mobile channel. From the simulated volume and time data, we can then calculate the cost of accessing the application on a mobile channel.
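The final simulation step reduces to a simple model: each request in an interaction sequence pays a round-trip latency plus a transfer time proportional to its volume, and cost is proportional to the total volume. The sketch below is a back-of-the-envelope illustration of that calculation; all parameter values are placeholders, not PETTICOAT's.

```python
def simulate_mobile_session(page_volumes_kb, latency_s=0.5,
                            bandwidth_kbps=64, price_per_kb=0.01):
    """Estimate response time and cost of one interaction sequence on a
    mobile channel. Each request pays one round-trip latency; transfer
    time is volume / bandwidth; cost is volume-based (illustrative
    tariff, as was common for GPRS-era mobile data)."""
    volume_kb = sum(page_volumes_kb)
    time_s = (len(page_volumes_kb) * latency_s          # per-request latency
              + volume_kb * 8 / bandwidth_kbps)         # transfer time
    cost = volume_kb * price_per_kb
    return time_s, cost

# Three pages of 10, 20 and 30 KB over a 64 kbit/s channel:
print(simulate_mobile_session([10, 20, 30]))
```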
Automatic tracking and control for web recommendation: new approaches for web recommendation
Recommender systems provide users with pertinent resources according to their context and their profiles, by applying statistical and knowledge discovery techniques. This paper describes a new approach to generating suitable recommendations based on the active user's navigation stream, taking into account long-distance resources in the history. Our main idea for solving this problem is the following: users browsing web pages or web contents can be seen as objects moving along trajectories in the web space. Under this assumption, we derive an appropriate description of the so-called recommender space and propose a mathematical model describing the behaviour of the users/targets along the trajectories inside the recommender space. The second main assumption can then be expressed as follows: if we are able to track the users/targets along their trajectories, we are able to predict their future positions in the sub-spaces of the recommender space, i.e., we can derive a new method for web recommendation and behaviour monitoring. To achieve these objectives, we use the theory of dynamic state estimation and, more specifically, the theory of Kalman filtering. We establish the appropriate model of the target tracker and derive the iterative formulation of the filter. Then, we propose a new recommender system formulated as a control loop. We validate our approach on data extracted from online video consumption and derive a user-monitoring approach. Conclusions and perspectives are drawn from the analysis of the obtained results and focus on the formulation of a topology of the recommender space.
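The estimation machinery the approach builds on is the standard linear Kalman predict/update cycle, sketched below. The recommender-space model (state definition, trajectories, control loop) is the paper's contribution and is not reproduced here; the matrices in the usage example encode a generic constant-velocity target, purely for illustration.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.
    x, P: prior state estimate and covariance
    z:    new observation
    F, H: state-transition and observation matrices
    Q, R: process and observation noise covariances"""
    # predict: propagate the state along its trajectory model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update: correct the prediction with the new observation
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Constant-velocity target: state = [position, velocity], observe position.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
x, P = np.array([0.0, 1.0]), np.eye(2)
x, P = kalman_step(x, P, np.array([1.2]), F, H, 0.01 * np.eye(2), np.array([[0.01]]))
```

In the paper's setting, the predicted position in the recommender space is what drives the recommendation, closing the control loop the abstract describes.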