    Business Intelligence from Web Usage Mining

    The rapid growth of e-commerce has confronted both the business community and customers with a new situation. Faced with intense competition on the one hand and customers' ability to choose among several alternatives on the other, the business community has realized the necessity of intelligent marketing strategies and relationship management. Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of users with the Web. It has become critical for effective Web site management, creating adaptive Web sites, business and support services, personalization, network traffic flow analysis and so on. In this paper, we present the important concepts of Web usage mining and its various practical applications. We further present a novel approach, 'intelligent-miner' (i-Miner), to optimize the concurrent architecture of a fuzzy clustering algorithm (to discover Web data clusters) and a fuzzy inference system (to analyze Web site visitor trends). A hybrid evolutionary fuzzy clustering algorithm is proposed to optimally segregate similar user interests. The clustered data is then used to analyze the trends using a Takagi-Sugeno fuzzy inference system learned using a combination of an evolutionary algorithm and neural network learning. The proposed approach is compared with self-organizing maps (to discover patterns) and several function approximation techniques such as neural networks, linear genetic programming and a Takagi-Sugeno fuzzy inference system (to analyze the clusters). The results are graphically illustrated and the practical significance is discussed in detail. Empirical results clearly show that the proposed Web usage mining framework is efficient.
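
As an illustration of the fuzzy clustering step mentioned in this abstract, the following is a minimal sketch of plain fuzzy c-means in Python. It is not the hybrid evolutionary variant that i-Miner proposes: the evolutionary optimization and the Takagi-Sugeno inference stage are omitted. The function name `fuzzy_c_means`, its parameters, and the idea of feeding it numeric session feature vectors are assumptions made for this example only.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (assumed stand-in, not the i-Miner algorithm).

    points: list of equal-length numeric vectors (hypothetical session features).
    m: fuzzifier (> 1); larger values give softer memberships.
    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Each center is the mean of all points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dists = [
                max(sum((points[i][d] - c[d]) ** 2 for d in range(dim)) ** 0.5,
                    1e-12)  # guard against division by zero
                for c in centers
            ]
            new_row = [
                1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u
```

Unlike hard clustering, each visitor session keeps a graded membership in every cluster, which is what lets a downstream fuzzy inference system work with overlapping interest groups.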

    Intelligent Web Recommender System Based on Semantic Enhanced Approach

    With the growth of the Web, directing users to the Web pages in their areas of interest has become a major challenge. This paper presents a new method for better Web page recommendation through semantic enhancement, integrating the domain knowledge and the Web usage knowledge of a website. Three models are used: the first is an ontology-based model, the second is a semantic network model, and the third is a conceptual prediction model, which is used to automatically generate a semantic network of the semantic Web usage knowledge.

    Web Page Annotation Using Web Usage Mining and Domain Knowledge Ontology

    The WWW has grown tremendously, and users now rely heavily on the Web for information. Search engines return result pages to the user, but not all of them are relevant, so the challenging task is to extract the relevant pages from the Web and provide them to the user. Web usage mining (WUM) is an approach for extracting knowledge and using it for different purposes. In this paper, a new semantic approach based on WUM and a domain knowledge ontology is proposed. Preparing the ontology database is also a challenging task in this project.

    Alternative approach to tree-structured web log representation and mining

    More recent approaches to web log data representation aim to capture user navigational patterns with respect to the overall structure of the web site. One such representation is tree-structured log files, which are the focus of this work. Most existing methods for analyzing such data use frequent subtree mining techniques to extract frequent user activity and navigational paths. In this paper we evaluate the use of other standard data mining techniques, enabled by a recently proposed structure-preserving flat data representation for tree-structured data. The initially proposed framework was adjusted to better suit the web log mining task. Experimental evaluation is performed on two real-world web log datasets, and comparisons are made with an existing state-of-the-art classifier for tree-structured data. The results show the great potential of the method in enabling the application of a wider range of data mining and analysis techniques to tree-structured web log data.

    Optimizing E-Management Using Web Data Mining

    Today, one of the biggest challenges that e-management systems face is the explosive growth of operating data and the use of this data to enhance services. Web usage mining has emerged as an important technique for deriving useful management information from users' Web data. One area where such information is needed is Web-based academic digital libraries. A digital library (D-library) is an information resource system that stores resources in digital format and provides access to users through the network. Academic libraries offer a huge amount of information resources; these resources overwhelm students and make it difficult for them to access relevant information. Proposed solutions to alleviate this issue emphasize the need to build Web recommender systems that offer each student a list of resources they would be interested in. Collaborative filtering is the most successful technique used to offer recommendations to users. It provides recommendations according to relevance feedback from users, which tells the system their preferences. Most recent work on D-library recommender systems uses explicit feedback. Explicit feedback requires students to rate resources, which makes the recommendation process unrealistic because few students are willing to state their interests explicitly. Thus, collaborative filtering suffers from the “data sparsity” problem. In response, the study proposed a Web usage mining framework to alleviate the sparsity problem. The framework incorporates a clustering technique and usage data in the recommendation process. Students perform different actions on a D-library; in this study five actions are identified: printing, downloading, bookmarking, reading, and viewing the abstract. These actions provide the system with large quantities of implicit feedback data.
The proposed framework also uses clustering to reduce the sparsity problem. Furthermore, generating recommendations based on clusters produces better results because students belonging to the same cluster usually have similar interests. The proposed framework is divided into two main components: off-line and online. The off-line component comprises two stages: data pre-processing and the derivation of student clusters. The online component comprises two stages: building the student's profile and generating recommendations. The second stage consists of three steps. In the first step, the target student's profile is assigned to the closest cluster profile using the cosine similarity measure. In the second step, the Pearson correlation coefficient is used to select, from the chosen cluster, the students most similar to the target student to serve as a source of prediction. Finally, a top-list of resources is presented. The effectiveness of the proposed framework was evaluated on the Book-Crossing dataset in terms of sparsity level and, for accuracy, Mean Absolute Error (MAE). The study reports a sparsity reduction of between 0.07% and 26.71% in the sub-matrices: the sparsity level is between 99.79% and 78.81% using the proposed framework, compared with 99.86% for the original matrix before applying it. The experimental results indicated that the proposed framework performs as much as 13.12% better than clustering with explicit feedback data only, and 21.14% better than the standard K Nearest Neighbours method. Overall, the results show that the proposed framework can alleviate the sparsity problem, thereby improving the accuracy of the recommendations.
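
The online recommendation stage described in this abstract (cosine similarity to pick the closest cluster, Pearson correlation to pick neighbours inside it, then a top-list) can be sketched roughly as follows, together with the two evaluation measures it names. This is an illustration under assumptions, not the study's implementation: the data layout (`{student: {resource: score}}`), all function names, the neighbourhood size `k`, and the mean-centred weighted prediction formula are hypothetical choices. The abstract does not say how the five implicit actions are converted into scores or how predictions are computed.

```python
from math import sqrt


def sparsity(ratings, n_items):
    """Fraction of empty cells in the student-by-resource matrix."""
    filled = sum(len(items) for items in ratings.values())
    return 1.0 - filled / (len(ratings) * n_items)


def cosine(a, b):
    """Cosine similarity between two sparse profiles (dicts)."""
    dot = sum(a[i] * b[i] for i in a if i in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pearson(a, b):
    """Pearson correlation over the resources both students have scores for."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = sqrt(sum((a[i] - ma) ** 2 for i in common)
               * sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def recommend(target, ratings, clusters, centroids, k=2, top_n=3):
    """Top-N resources for `target` (prediction formula is an assumption)."""
    profile = ratings[target]
    # Step 1: assign the target profile to the closest cluster profile.
    cid = max(centroids, key=lambda c: cosine(profile, centroids[c]))
    # Step 2: keep the k most Pearson-similar students from that cluster.
    peers = sorted(((pearson(profile, ratings[u]), u)
                    for u in clusters[cid] if u != target), reverse=True)[:k]
    peers = [(s, u) for s, u in peers if s > 0]
    # Step 3: score unseen resources with a mean-centred weighted average.
    mean_t = sum(profile.values()) / len(profile)
    candidates = {i for _, u in peers for i in ratings[u] if i not in profile}
    scores = {}
    for item in candidates:
        raters = [(s, u) for s, u in peers if item in ratings[u]]
        num = sum(s * (ratings[u][item]
                       - sum(ratings[u].values()) / len(ratings[u]))
                  for s, u in raters)
        scores[item] = mean_t + num / sum(s for s, _ in raters)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def mae(predicted, actual):
    """Mean Absolute Error between predicted and held-out scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

Restricting the neighbour search to one cluster means Pearson correlation is computed against a small, denser sub-matrix rather than the full matrix, which is the mechanism the abstract credits for reducing sparsity.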

    Selecção de planos de mineração de dados de utilização da web (Selection of Web Usage Data Mining Plans)

    Doctoral thesis in Informatics, specialty in Artificial Intelligence. Discovering knowledge from clickstream data, generated by the interaction of individuals with Web sites, is playing an increasingly important role, reaching a growing number of decision makers across the organization. The underlying intention is to help organizations achieve the goals set for the sites they promote and to maximize the emerging opportunities of the Web by exploring data that are collected inherently and implicitly and that, although huge and complex, are a very rich and comprehensive source of insight into visitors' behavior. However, developing and applying such mining processes are very complex tasks, especially for users without deep knowledge and experience in this domain. One way to tackle this challenge is to build tools capable of assisting users in carrying out such processes, in order to simplify these initiatives and to increase their efficacy and productivity. The strategy advocated for this purpose relies on managing and reusing, at the organizational level, the knowledge acquired from practical experience in solving concrete problems that yielded successful clickstream data mining processes in the past. This organization-wide perspective mainly aims to foster a synergetic use of the organization's resources, bringing together the contributions of different collaborators and making the potential of this kind of mining available to all members, including inexperienced users.
The work presented in this dissertation describes a system founded on the case-based reasoning paradigm. The system was devised to assist users in two main ways: (i) capturing, organizing and storing, in a shared case repository, knowledge about useful and successful examples of clickstream data mining processes; (ii) selecting the most suitable alternative mining plans to solve a specific clickstream data analysis problem, given a high-level description of that problem. The proposed system was implemented as a prototype Web-based application, to be used at the organizational level, consolidating the knowledge about examples of Web usage mining exercises in a centralized case base. The system integrates and takes advantage of related organizational resources, supporting a semi-automated approach to knowledge acquisition from the following types of sources: the organization's data sources; standard documents in PMML format, produced by knowledge extraction tools and representing the mining activities carried out; and complementary information obtained through user interaction. When supporting problem solving, the system starts from a set of analysis requirements and characteristics of the available clickstream data and, based on the acquired knowledge about applying mining methods and other operations, suggests alternative mining plans appropriate to the data at hand and to the purpose of the analysis. The plans are presented to the user as overviews, complemented by additional information and by links to explanatory details of their practical implementation.

    on the web, or web mining, has become an important research area. Web usage mining, which is the main topic of this paper, focuses on knowledge discovery from the clicks in the web log for a given site (the so-called click-stream), especially on analysis of sequences of clicks. Existing techniques for analyzing click sequences have different drawbacks, i.e., either huge storage requirements, excessive I/O cost, or scalability problems when additional information is introduced into the analysis