
    Generating dynamic higher-order Markov models in web usage mining

    Markov models have been widely used for modelling users' web navigation behaviour. In previous work we presented a dynamic clustering-based Markov model that accurately represents second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes into account higher-order conditional probabilities. The method makes use of the state-cloning concept together with a clustering technique to separate the navigation paths that reveal differences in the conditional probabilities. We report on experiments conducted with three real-world data sets. The results show that some pages require a long history to understand the users' choice of link, while others require only a short history. We also show that the number of additional states induced by the method can be controlled through a probability threshold parameter.
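
    A minimal sketch of the idea behind the method (not the authors' implementation): estimate first- and second-order conditional probabilities from navigation sessions and flag pages whose out-link distribution changes once an extra page of history is considered, using a probability threshold analogous to the parameter mentioned above. The function names, toy sessions and threshold value are illustrative assumptions.

```python
from collections import defaultdict

def transition_counts(sessions, order):
    """Count next-page frequencies conditioned on the last `order` pages."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for i in range(order, len(session)):
            history = tuple(session[i - order:i])
            counts[history][session[i]] += 1
    return counts

def to_probs(counts):
    return {h: {p: c / sum(nxt.values()) for p, c in nxt.items()}
            for h, nxt in counts.items()}

def pages_needing_cloning(sessions, threshold=0.1):
    """Pages whose next-link distribution shifts by more than `threshold`
    when one extra page of history is taken into account (illustrative test)."""
    p1 = to_probs(transition_counts(sessions, 1))
    p2 = to_probs(transition_counts(sessions, 2))
    flagged = set()
    for (prev, page), dist2 in p2.items():
        dist1 = p1.get((page,), {})
        for nxt, prob in dist2.items():
            if abs(prob - dist1.get(nxt, 0.0)) > threshold:
                flagged.add(page)
    return flagged

sessions = [["A", "B", "C"], ["D", "B", "E"], ["A", "B", "C"], ["D", "B", "E"]]
print(pages_needing_cloning(sessions))  # {'B'}: the link chosen after B depends on where the user came from
```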

    Data Mining for Browsing Patterns in Weblog Data by ART Neural Networks

    Categorising visitors based on their interaction with a website is a key problem in web content usage. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customised content. This paper proposes an approach to clustering weblog data based on ART2 neural networks. Due to the characteristics of the ART2 neural network model, the proposed approach can be used for unsupervised and self-learning data mining, which makes it adaptable to dynamically changing websites.
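
    The following is a simplified ART-style clustering sketch, not a faithful ART2 implementation: each session vector either resonates with an existing prototype (and adapts it) or commits a new category, which captures the unsupervised, self-learning behaviour the abstract relies on. The vigilance value, learning rate and toy feature vectors are assumptions.

```python
import numpy as np

def art_like_clustering(vectors, vigilance=0.7, lr=0.5):
    """Simplified ART-style clustering (not full ART2): assign each normalised
    session vector to the closest prototype if the match exceeds the vigilance
    threshold, otherwise create a new cluster; prototypes adapt incrementally."""
    prototypes, labels = [], []
    for v in vectors:
        v = v / (np.linalg.norm(v) or 1.0)
        if prototypes:
            sims = [float(v @ p) for p in prototypes]
            best = int(np.argmax(sims))
            if sims[best] >= vigilance:        # resonance: accept and adapt
                prototypes[best] = (1 - lr) * prototypes[best] + lr * v
                prototypes[best] /= np.linalg.norm(prototypes[best])
                labels.append(best)
                continue
        prototypes.append(v)                    # mismatch: commit a new category
        labels.append(len(prototypes) - 1)
    return labels, prototypes

# Toy clickstream feature vectors (e.g., per-section visit counts per session).
sessions = np.array([[5, 0, 1], [4, 1, 0], [0, 6, 2], [1, 5, 1]], dtype=float)
labels, _ = art_like_clustering(sessions)
print(labels)  # [0, 0, 1, 1]: two browsing styles emerge
```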

    Implicit Measures of Lostness and Success in Web Navigation

    In two studies, we investigated the ability of a variety of structural and temporal measures computed from a web navigation path to predict lostness and task success. The user's task was to find requested target information on specified websites. The web navigation measures were based on counts of visits to web pages and other statistical properties of the web usage graph (such as compactness, stratum, and similarity to the optimal path). Subjective lostness was best predicted by similarity to the optimal path and by time on task. The best overall predictor of success on individual tasks was similarity to the optimal path, but other predictors were sometimes superior, depending on the particular web navigation task. These measures can be used to diagnose user navigational problems and to help identify problems in website design.
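
    A hedged illustration of two such path-based measures (the exact formulas used in the studies are not reproduced here): a simple similarity-to-optimal-path ratio and a revisit rate computed from a navigation path.

```python
def path_length_similarity(user_path, optimal_path):
    """Illustrative 'similarity to the optimal path': ratio of the optimal
    number of page visits to the number actually made (1.0 = perfectly direct)."""
    return len(optimal_path) / max(len(user_path), 1)

def revisit_rate(user_path):
    """Share of page visits that are revisits; high values often accompany lostness."""
    return 1 - len(set(user_path)) / max(len(user_path), 1)

path = ["home", "products", "home", "search", "products", "item42"]
optimal = ["home", "search", "item42"]
print(path_length_similarity(path, optimal))  # 0.5
print(revisit_rate(path))                     # ~0.33
```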

    Web-log mining for predictive web caching


    Log Pre-Processing and Grammatical Inference for Web Usage Mining

    In this paper, we propose a Web Usage Mining pre-processing method to retrieve missing data from web server log files. Moreover, we propose two levels of evaluation: directly on the reconstructed data, but also after a machine learning step, by evaluating the inferred grammatical models. We conducted experiments showing that our algorithm improves the quality of the user data.
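
    One way such missing requests are commonly reconstructed is referrer-based path completion; the sketch below illustrates that general idea and is not the paper's algorithm. The (url, referrer) tuple format is an assumption.

```python
def complete_session(requests):
    """Illustrative path completion: when a request's referrer is not the page
    just visited but appears earlier in the session, the user presumably
    backtracked through cached pages that never reached the server log, so those
    pages are re-inserted. Each request is assumed to be a (url, referrer) pair."""
    path = []
    for url, referrer in requests:
        if path and referrer and referrer != path[-1] and referrer in path:
            last = len(path) - 1 - path[::-1].index(referrer)
            path.extend(reversed(path[last + 1:-1]))  # pages backtracked through
            path.append(referrer)
        path.append(url)
    return path

session = [("/home", None), ("/a", "/home"), ("/b", "/a"), ("/c", "/home")]
print(complete_session(session))  # ['/home', '/a', '/b', '/a', '/home', '/c']
```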

    Imputing or Smoothing? Modelling the Missing Online Customer Journey Transitions for Purchase Prediction

    Online customer journeys are at the core of e-commerce systems, and it is therefore important to model and understand this online customer behaviour. Clickstream data from online journeys can be modelled using Markov chains. This study investigates two different approaches to handling missing transition probabilities when constructing Markov chain models for purchase prediction. Imputing the transition probabilities using the Chapman-Kolmogorov (CK) equation addresses the issue and achieves high prediction accuracy by approximating them with one-step-ahead probabilities. However, it comes with a high computational burden, and some probabilities remain zero after imputation. An alternative approach is to smooth the transition probabilities using Bayesian techniques. This ensures non-zero probabilities, but the approach has been criticised for not being as accurate as the CK method, although this has not been fully evaluated in the literature using realistic, commercial data. We compare the purchase prediction accuracy of the CK and Bayesian methods, and evaluate them on commercial web server data from a major European airline.
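
    A compact sketch of the two approaches being compared, under the assumption that transitions are held in a row-stochastic matrix estimated from counts; the parameter value and toy counts are illustrative.

```python
import numpy as np

def ck_impute(P):
    """Fill zero one-step transition probabilities with the corresponding
    two-step (Chapman-Kolmogorov) probabilities from P @ P, then renormalise rows."""
    filled = np.where(P == 0, P @ P, P)
    return filled / filled.sum(axis=1, keepdims=True)

def laplace_smooth(counts, alpha=1.0):
    """Bayesian (Dirichlet/Laplace) smoothing of raw transition counts:
    every transition receives a small pseudo-count, so no probability is zero."""
    counts = counts + alpha
    return counts / counts.sum(axis=1, keepdims=True)

counts = np.array([[0., 3., 1.],
                   [2., 0., 0.],
                   [0., 0., 4.]])
P = counts / counts.sum(axis=1, keepdims=True)
print(ck_impute(P))
print(laplace_smooth(counts))
```

    In the toy matrix the last row keeps its zeros even after CK imputation, because the corresponding two-step probabilities are also zero; this is the limitation noted above, whereas the smoothed matrix contains no zeros by construction.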

    An average linear time algorithm for web data mining

    In this paper, we study the complexity of a data mining algorithm, proposed in previous work, for extracting patterns from user web navigation data. The user web navigation sessions are inferred from log data and modelled as a Markov chain. The chain's higher-probability trails correspond to the preferred trails on the web site. The algorithm implements a depth-first search that scans the Markov chain for the high-probability trails. We show that the average-case behaviour of the algorithm is linear in the number of web pages accessed.
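
    A small sketch of such a depth-first exploration (not the paper's exact algorithm or its complexity analysis): trails are extended only while their cumulative probability stays above a threshold, so low-probability branches are pruned early. The toy chain and threshold are assumptions.

```python
def high_probability_trails(chain, start, threshold):
    """Depth-first search returning every trail from `start` whose product of
    transition probabilities stays at or above `threshold`.
    `chain` maps page -> {next_page: probability}."""
    trails = []

    def dfs(page, trail, prob):
        extended = False
        for nxt, p in chain.get(page, {}).items():
            if prob * p >= threshold and nxt not in trail:  # prune and avoid cycles
                dfs(nxt, trail + [nxt], prob * p)
                extended = True
        if not extended and len(trail) > 1:
            trails.append((trail, prob))

    dfs(start, [start], 1.0)
    return trails

chain = {"home": {"news": 0.6, "shop": 0.4},
         "news": {"article": 0.9},
         "shop": {"cart": 0.5, "help": 0.5}}
print(high_probability_trails(chain, "home", 0.3))
# two trails survive the 0.3 cut: home -> news -> article (~0.54) and home -> shop (0.4)
```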

    Cost and Response Time Simulation for Web-based Applications on Mobile Channels

    When considering the addition of a mobile presentation channel to an existing web-based application, a key question that has to be answered even before development begins is how the mobile channel's characteristics will affect the user experience and the cost of using the application. If either of these factors is outside acceptable limits, economic considerations may rule out adding the channel, even if it would be feasible from a purely technical perspective. Both factors depend considerably on two metrics: the time required to transmit data over the mobile network, and the volume transmitted. The PETTICOAT method presented in this paper uses the dialog flow model and web server log files of an existing application to identify typical interaction sequences and to compile volume statistics, which are then run through a tool that simulates the volume and time that would be incurred by executing the interaction sequences on a mobile channel. From the simulated volume and time data we can then calculate the cost of accessing the application on a mobile channel.
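
    A back-of-the-envelope sketch of the kind of simulation described, not the PETTICOAT tool itself: given per-page payload sizes and an interaction sequence, estimate transferred volume, response time and cost for an assumed bandwidth, latency and tariff.

```python
def simulate_mobile_session(page_sizes_kb, sequence,
                            bandwidth_kbps=128, latency_s=0.5, cost_per_mb=0.05):
    """Estimate transferred volume (KB), total response time (s) and data cost
    for one interaction sequence on a mobile channel. The bandwidth, latency and
    tariff defaults are illustrative assumptions."""
    volume_kb = sum(page_sizes_kb[p] for p in sequence)
    time_s = sum(latency_s + page_sizes_kb[p] * 8 / bandwidth_kbps for p in sequence)
    cost = volume_kb / 1024 * cost_per_mb
    return volume_kb, time_s, cost

page_sizes_kb = {"login": 12, "search": 30, "results": 55, "detail": 40}
sequence = ["login", "search", "results", "detail"]
print(simulate_mobile_session(page_sizes_kb, sequence))
# roughly (137 KB, 10.6 s, 0.007 cost units)
```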

    Automatic tracking and control for web recommendation: new approaches for web recommendation

    Recommender systems provide users with pertinent resources according to their context and their profiles, by applying statistical and knowledge discovery techniques. This paper describes a new approach to generating suitable recommendations based on the active user's navigation stream, taking into account resources that lie far back in the history. Our main idea is the following: users browsing web pages or web content can be seen as objects moving along trajectories in the web space. Under this assumption, we derive an appropriate description of the so-called recommender space and propose a mathematical model describing the behaviour of the users/targets along their trajectories inside it. The second main assumption can then be expressed as follows: if we are able to track the users/targets along their trajectories, we are able to predict their future positions in the sub-spaces of the recommender space, i.e., we can derive a new method for web recommendation and behaviour monitoring. To achieve these objectives, we use the theory of dynamic state estimation and, more specifically, Kalman filtering. We establish the appropriate model of the target tracker and derive the iterative formulation of the filter. We then propose a new recommender system formulated as a control loop. We validate our approach on data extracted from online video consumption and derive a user monitoring approach. Conclusions and perspectives are drawn from the analysis of the obtained results and focus on the formulation of a topology of the recommender space.
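
    A minimal sketch of the tracking idea using a standard constant-velocity Kalman filter over an assumed 2-D embedding of resources; the coordinates, noise levels and motion model are illustrative assumptions rather than the paper's target-tracker formulation.

```python
import numpy as np

def kalman_track(observations, dt=1.0, q=0.01, r=0.1):
    """Constant-velocity Kalman filter over a 2-D 'recommender space': each
    observation is the coordinate of the resource the user just consumed; the
    predicted next position could be matched against nearby resources to
    produce recommendations."""
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])  # state transition
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])                                # observe position only
    Q, R = q * np.eye(4), r * np.eye(2)
    x, P = np.zeros(4), np.eye(4)
    predictions = []
    for z in observations:
        # predict the next state before seeing the observation
        x, P = F @ x, F @ P @ F.T + Q
        predictions.append(x[:2].copy())
        # update with the observed position
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.asarray(z) - H @ x)
        P = (np.eye(4) - K @ H) @ P
    return predictions

# Embedded coordinates of consecutively consumed resources (illustrative).
trajectory = [(0.0, 0.0), (0.2, 0.1), (0.4, 0.2), (0.6, 0.3)]
print(kalman_track(trajectory)[-1])  # position predicted before the final observation
```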