
    Web Usage Mining: Web User Session Construction Using Map-Reduce

    Web Usage Mining deals with understanding user behavior while users interact with a website, using various log files. The whole process of Web Usage Mining is completed in three phases, namely Data Preprocessing, Pattern Discovery, and Pattern Analysis. Data Preprocessing is important because it takes 80% of the time of the whole Web Usage Mining process. Data Preprocessing involves Data Cleaning, User Identification, and Session Identification. In Session Identification, also called Sessionization, we find the set of pages visited by a user during one particular visit to a website. In paper [1] we proposed a new method for session construction. Because log files are very large, an approach to Session Identification is needed that greatly reduces the processing time of our proposed method. In this paper we use the Map-Reduce method to calculate sessions, combining both the time-based and the user-navigation methods. This approach is faster than the existing approach because the whole process is performed in a distributed environment.
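
    The abstract does not spell out its exact rules, so the following is only a minimal sketch of how time-based and navigation-based sessionization can be phrased as a Map-Reduce job, simulated in a single Python process. Everything here is an assumption for illustration: the record layout (user, timestamp in seconds, url, referrer), the 30-minute timeout, the rule that a referrer outside the current session starts a new session, and the helper names map_phase, shuffle, reduce_phase and run_job.

```python
from collections import defaultdict
from itertools import chain

# Assumed 30-minute inactivity threshold (illustrative, not taken from the paper).
TIMEOUT_SECONDS = 30 * 60


def map_phase(record):
    """Map: key each (user, timestamp, url, referrer) record by user."""
    user, timestamp, url, referrer = record
    yield user, (timestamp, url, referrer)


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(user, hits):
    """Reduce: sort one user's hits by time and cut them into sessions.

    A session ends when the gap since the previous hit exceeds TIMEOUT_SECONDS
    (time heuristic) or when the referrer is not a page already in the current
    session (navigation heuristic). Both rules are hypothetical stand-ins.
    """
    sessions, current, seen = [], [], set()
    last_time = None
    for timestamp, url, referrer in sorted(hits):
        timed_out = last_time is not None and timestamp - last_time > TIMEOUT_SECONDS
        off_path = bool(current) and referrer not in seen
        if timed_out or off_path:
            sessions.append(current)
            current, seen = [], set()
        current.append(url)
        seen.add(url)
        last_time = timestamp
    if current:
        sessions.append(current)
    return user, sessions


def run_job(records):
    """Run map, shuffle and reduce in one process to simulate the job."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(user, hits) for user, hits in shuffle(mapped).items())


log = [
    ("u1", 0, "/home", "-"),
    ("u1", 60, "/products", "/home"),
    ("u1", 120, "/cart", "/products"),
    ("u1", 4000, "/home", "-"),         # gap over 30 minutes: new session
    ("u2", 10, "/blog", "-"),
    ("u2", 50, "/about", "/external"),  # referrer not in session: new session
]
print(run_job(log))
# {'u1': [['/home', '/products', '/cart'], ['/home']], 'u2': [['/blog'], ['/about']]}
```

    Keying every record by user is what makes the job parallel: all hits of one user meet at a single reducer, while different users can be sessionized on different nodes. In an actual Hadoop deployment, map_phase and reduce_phase would become the Mapper and Reducer, and the framework would perform the shuffle.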

    Web Usage Mining: A Novel Approach for Web User Session Construction

    The growth of the World Wide Web in recent years has been enormous. Web usage mining plays an important role in the personalization of Web services, the adaptation of Web sites, and the improvement of Web server performance. It applies data mining techniques to discover Web access patterns from Web log data. To discover access patterns, Web log data must first be reconstructed into sessions. This paper provides a novel approach for session identification.
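
    The abstract does not describe its novel method, so the following sketch shows only the common time-gap baseline for reconstructing sessions from raw log lines. This is not the paper's approach. It assumes Apache Common Log Format input, uses the client host as the user identifier, keeps only successful GET requests for non-image pages as a minimal cleaning step, and splits a host's requests wherever two consecutive hits are more than 30 minutes apart. The names CLF, GAP, parse and sessionize are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')
GAP = timedelta(minutes=30)  # assumed inactivity threshold
STATIC = (".png", ".jpg", ".gif", ".css", ".js")


def parse(line):
    """Return (host, time, method, path, status), or None for malformed lines."""
    m = CLF.match(line)
    if m is None:
        return None
    host, ts, method, path, status = m.groups()
    when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
    return host, when, method, path, int(status)


def sessionize(lines):
    """Group cleaned page views by host, then split each host's views on long gaps."""
    by_host = defaultdict(list)
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        host, when, method, path, status = rec
        # Minimal data cleaning: successful GETs for pages, not embedded resources.
        if method != "GET" or status != 200 or path.endswith(STATIC):
            continue
        by_host[host].append((when, path))
    sessions = []
    for host, hits in by_host.items():
        hits.sort()
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > GAP:
                sessions.append((host, [p for _, p in current]))
                current = []
            current.append(hit)
        sessions.append((host, [p for _, p in current]))
    return sessions


log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:56:10 -0700] "GET /logo.png HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:57:02 -0700] "GET /products.html HTTP/1.0" 200 1400',
    '10.0.0.1 - - [10/Oct/2000:15:01:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:58:00 -0700] "GET /about.html HTTP/1.0" 200 900',
]
for host, pages in sessionize(log):
    print(host, pages)
```

    Here the image request is dropped during cleaning, and the hour-long pause splits host 10.0.0.1 into two sessions. Real session-identification methods, presumably including the one proposed in this paper, refine this baseline with additional evidence such as referrers or page-stay times.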

    Linear Time Algorithms for Finding Maximal Forward References

    A maximal forward reference of a Web user is a longest sequence of Web pages visited by the user without revisiting any previously visited page in the sequence. Efficient identification of such references from very large Web logs is a fundamental and necessary data preparation task for the success of path pattern mining, an active research area in Web mining. The best known algorithm for this task is a sorting-based algorithm with superlinear time complexity [4]. In this paper we present two algorithms for finding those references: one is designed for interval sessions of user accesses and the other for gap sessions. We show that both algorithms have linear (hence optimal) time complexity. We conduct an empirical performance analysis and show that both algorithms need just several seconds more than the baseline time, i.e., the time needed to read the Web log once sequentially from disk to RAM, test whether each user access record is valid, and write each valid user access record back to disk. The empirical performance analysis also shows that both algorithms are substantially faster than the sorting-based algorithm. Finally, both algorithms are extended to the case of distributed Web logs.
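
    As an illustration of why a single pass can suffice, the following is a minimal sketch (not the paper's algorithms) that extracts maximal forward references from one already-constructed session. Whether that session came from interval or gap sessionization is left to an earlier step. The sketch keeps the current forward path in a list and a dictionary from each page on it to its position. A revisit of a page on the path is a backward reference. If the user was moving forward, the path so far is emitted as a maximal forward reference, and the path is then truncated back to the revisited page. Treating an immediate repeat of the current page as a reload is an extra assumption, and the function name is hypothetical.

```python
def maximal_forward_references(session):
    """Split one session (a list of page ids) into maximal forward references.

    Each access is pushed onto `path` at most once and popped at most once,
    so apart from copying the emitted references the work is linear in the
    number of accesses.
    """
    refs = []
    path = []        # current forward path
    position = {}    # page -> index of that page in `path`
    moving_forward = False
    for page in session:
        if path and page == path[-1]:
            continue  # assumption: an immediate repeat is a reload, not a move
        if page in position:
            # Backward reference: the forward path that just ended is maximal.
            if moving_forward:
                refs.append(list(path))
            cut = position[page] + 1
            for dropped in path[cut:]:
                del position[dropped]
            del path[cut:]
            moving_forward = False
        else:
            position[page] = len(path)
            path.append(page)
            moving_forward = True
    if moving_forward:
        refs.append(list(path))
    return refs


session = list("ABCDCBEGHGWAOUOV")  # an illustrative traversal
for ref in maximal_forward_references(session):
    print("".join(ref))
# ABCD
# ABEGH
# ABEGW
# AOU
# AOV
```

    The dictionary lookup replaces a scan of the path on every access, which is what keeps each step constant time. A sorting-based method instead pays at least the cost of ordering the log records.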