    Preprocessing and Content/Navigational Pages Identification as Premises for an Extended Web Usage Mining Model Development

    Since its emergence, the internet has grown spectacularly, not only in the number of websites and the volume of information, but also in the number of visitors. This growth created the need for an overall analysis of both websites and the content they provide. A new branch of research, web mining, was developed to discover useful information and knowledge based not only on the analysis of websites and content, but also on the way users interact with them. The aim of the present paper is to design a database that captures only the relevant data from logs, so that large sets of temporal data can be stored and managed with common tools in real time. In our work, we rely on different websites or website sections with known architecture, and we test several hypotheses from the literature in order to extend the framework to sites with unknown or chaotic structure, where the type of visited pages is not transparent. To do this, we start from non-proprietary, preexisting raw server logs.
    Keywords: Knowledge Management, Web Mining, Data Preprocessing, Decision Trees, Databases

    Generating dynamic higher-order Markov models in web usage mining

    Markov models have been widely used for modelling users’ web navigation behaviour. In previous work we presented a dynamic clustering-based Markov model that accurately represents the second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes into account higher-order conditional probabilities. The method uses the state cloning concept together with a clustering technique to separate the navigation paths that reveal differences in the conditional probabilities. We report on experiments conducted with three real-world data sets. The results show that some pages require a long history to understand the users’ choice of link, while others require only a short history. We also show that the number of additional states induced by the method can be controlled through a probability threshold parameter.
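
To make the notion of higher-order conditional probabilities concrete, the following is a minimal sketch that estimates P(next page | previous `order` pages) by plain counting over navigation sessions. It is not the state-cloning and clustering method of the paper; the function name `transition_probabilities` and the toy sessions are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions, order=1):
    """Estimate P(next page | previous `order` pages) by counting."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(order, len(session)):
            history = tuple(session[i - order:i])
            counts[history][session[i]] += 1
    # Normalise each history's counts into a probability distribution.
    return {
        history: {page: n / sum(nexts.values()) for page, n in nexts.items()}
        for history, nexts in counts.items()
    }

# Hypothetical sessions: the link chosen on page C depends on the page before it.
sessions = [
    ["A", "C", "D"],
    ["A", "C", "D"],
    ["B", "C", "E"],
    ["B", "C", "E"],
]
first = transition_probabilities(sessions, order=1)
second = transition_probabilities(sessions, order=2)
print(first[("C",)])       # {'D': 0.5, 'E': 0.5}
print(second[("A", "C")])  # {'D': 1.0}
```

In this toy data the first-order model cannot tell the two groups of users apart at page C, while the second-order model can, which is the kind of page that "requires a longer history" in the sense of the abstract.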

    A new intelligent algorithm to create a profile for user based on web interactions

    This paper presents a method for classifying web users’ navigation patterns automatically. The proposed model classifies a user’s navigation patterns and predicts his or her upcoming requirements. To create the user’s profile, a new method is introduced that records the user’s active settings and measures the user’s similarity to neighboring users. The proposed model can create the profile implicitly and updates it as changes occur. We aim to improve the performance of the recommender engine using the user’s navigation patterns and clustering. The method is based on the user’s navigation patterns and presents the recommender engine’s results according to the user’s requirements and interests. In addition, the method can help customize websites more efficiently.

    Towards trajectory anonymization: a generalization-based approach

    Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adapt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for the anonymization of trajectories. We further show that releasing anonymized trajectories may still leak private information. Therefore, we propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data and also show how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
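
As a rough illustration of k-anonymity applied to trajectories, the sketch below coarsens each point to a grid cell and then checks that every generalized trajectory is shared by at least k originals. The grid-based generalization, the function names, and the sample coordinates are assumptions for illustration only; the paper's own generalization and reconstruction algorithms are not reproduced here.

```python
from collections import Counter

def generalize(trajectory, cell=1.0):
    """Coarsen each (x, y) point to the index of the grid cell containing it."""
    return tuple((int(x // cell), int(y // cell)) for x, y in trajectory)

def is_k_anonymous(trajectories, k, cell=1.0):
    """True if every generalized trajectory is shared by at least k originals."""
    groups = Counter(generalize(t, cell) for t in trajectories)
    return all(count >= k for count in groups.values())

# Hypothetical trajectories: the first two are close, the third is far away.
trajectories = [
    [(0.1, 0.2), (1.4, 0.9)],
    [(0.8, 0.7), (1.1, 0.3)],
    [(3.2, 3.9), (4.5, 4.1)],
]
fine = is_k_anonymous(trajectories, k=2, cell=1.0)     # third trajectory stands alone
coarse = is_k_anonymous(trajectories, k=2, cell=10.0)  # all three share one cell sequence
print(fine, coarse)
```

The example also shows the usual trade-off: a coarser grid makes the data k-anonymous but discards more spatial detail.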

    Web User Session Reconstruction Using Integer Programming

    An important input for web usage mining is web user sessions that must be reconstructed from web logs (sessionization) when such sessions are not otherwise identified. We present a novel approach for sessionization based on an integer program. We compare results of our approach with the timeout heuristic on web logs from an academic web site. We find our integer program provides sessions that better match an expected empirical distribution with about half of the standard error of the heuristic.
    This work has been partially supported by the National Doctoral Grant from Conicyt Chile and by the Chilean Millennium Scientific Institute of Complex Engineering Systems.
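
For context, the timeout heuristic used as the baseline can be sketched as follows. This is a minimal version that assumes requests arrive as `(visitor, timestamp_in_seconds, page)` tuples and that a session ends after a gap of more than 30 minutes between consecutive requests; the record layout, the 1800-second default, and the choice of the inactivity-gap variant are assumptions, not details taken from the paper. The integer-programming approach itself is not shown.

```python
from collections import defaultdict

def sessionize(requests, timeout=1800):
    """Split each visitor's requests into sessions at gaps longer than `timeout` seconds."""
    by_visitor = defaultdict(list)
    for visitor, timestamp, page in requests:
        by_visitor[visitor].append((timestamp, page))
    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort()  # order each visitor's requests by time
        current = [hits[0][1]]
        for (prev_t, _), (t, page) in zip(hits, hits[1:]):
            if t - prev_t > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(page)
        sessions.append((visitor, current))
    return sessions

# Hypothetical log records: (visitor, seconds since start, requested page).
requests = [
    ("10.0.0.1", 0, "/index"),
    ("10.0.0.1", 60, "/papers"),
    ("10.0.0.2", 100, "/index"),
    ("10.0.0.1", 4000, "/index"),
]
result = sessionize(requests)
print(result)
```

Here the first visitor's last request falls 3940 seconds after the previous one, so it starts a second session.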