thesis

Integrating Network Analysis and Data Mining Techniques into Effective Framework for Web Mining and Recommendation. A Framework for Web Mining and Recommendation

Abstract

The main motivation for the study described in this dissertation is to benefit from the development in technology and the huge amount of available data which can be easily captured, stored and maintained electronically. We concentrate on Web usage (i.e., log) mining and Web structure mining. Analysing Web log data will reveal valuable feedback reflecting how effective the current structure of a web site is and to help the owner of a web site in understanding the behaviour of the web site visitors. We developed a framework that integrates statistical analysis, frequent pattern mining, clustering, classification and network construction and analysis. We concentrated on the statistical data related to the visitors and how they surf and pass through the various pages of a given web site to land at some target pages. Further, the frequent pattern mining technique was used to study the relationship between the various pages constituting a given web site. Clustering is used to study the similarity of users and pages. Classification suggests a target class for a given new entity by comparing the characteristics of the new entity to those of the known classes. Network construction and analysis is also employed to identify and investigate the links between the various pages constituting a Web site by constructing a network based on the frequency of access to the Web pages such that pages get linked in the network if they are identified in the result of the frequent pattern mining process as frequently accessed together. The knowledge discovered by analysing a web site and its related data should be considered valuable for online shoppers and commercial web site owners. Benefitting from the outcome of the study, a recommendation system was developed to suggest pages to visitors based on their profiles as compared to similar profiles of other visitors. The conducted experiments using popular datasets demonstrate the applicability and effectiveness of the proposed framework for Web mining and recommendation. As a by product of the proposed method, we demonstrate how it is effective in another domain for feature reduction by concentrating on gene expression data analysis as an application with some interesting results reported in Chapter 5

    Similar works