475 research outputs found

    Investigation of Heterogeneous Approach to Fact Invention of Web Users’ Web Access Behaviour

    The World Wide Web holds a huge volume of data of many different types. Web mining is the branch of data mining concerned with web services and their large population of users, and web user mining is in turn a branch of web mining. Information about users' web access is collected in several ways. The most common is the web log file; other techniques include browser agents, user authentication, web reviews, web ratings, web rankings and tracking cookies. Users find it difficult to retrieve the information they need in time, because the huge volume of structured and unstructured information increases the complexity of the web. Web usage mining is important for purposes such as website organization, business and maintenance services, website personalization and reducing network bandwidth consumption. This paper provides an analysis of web usage mining techniques.
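As an illustration of the web log file mentioned above, the following minimal sketch parses one entry in the widely used Common Log Format. The abstract does not say which log format the paper works with, so the format, the regular expression, the helper name `parse_log_line` and the sample line are all assumptions made for this example.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
# (assumed format; the surveyed paper does not name one)
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the textual fields that are useful as typed values.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample entry, using a documentation-only IP address.
sample = '192.0.2.10 - alice [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_log_line(sample)
print(record["host"], record["path"], record["status"])
```

Each parsed record carries the host, user, timestamp and requested path, which are the fields usage mining techniques typically start from.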

    Knowledge Discovery from Web Logs - A Survey

    Web usage mining extracts interesting, useful knowledge and implicit information from activity on the WWW. Web servers record information about user interactions every time a user requests a resource. Analyzing these Web access logs helps to predict user behavior and to shape the structure of a website. From an application point of view, the information extracted from Web usage patterns can be applied directly to manage activities related to e-business, e-services, e-education, on-line communities and so on. However, because the size and density of the data grow rapidly, existing Web log file analysis tools may provide insufficient information, and more intelligent mining techniques are needed. Several approaches to web usage mining are available in the literature, each with its own merits and demerits. This paper studies and analyzes various existing web usage mining techniques.
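A common first step when evaluating access logs is to group each user's requests into sessions. The sketch below does this with a fixed inactivity timeout. The abstract does not name this step, so the 30-minute threshold, the `(user, timestamp, path)` tuple layout and the function name `sessionize` are assumptions for illustration only.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that ends a session (assumed value)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp_seconds, path) requests into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, path in requests:
        by_user[user].append((ts, path))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, path in hits[1:]:
            if ts - last_ts > timeout:
                # Gap is too long: close the current session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(path)
            last_ts = ts
        sessions.append((user, current))
    return sessions


# Hypothetical requests; timestamps are seconds since an arbitrary start.
requests = [
    ("u1", 0, "/home"),
    ("u1", 60, "/products"),
    ("u1", 4000, "/home"),
    ("u2", 10, "/home"),
]
for user, pages in sessionize(requests):
    print(user, pages)
```

Here the third request from `u1` arrives more than 30 minutes after the second, so it opens a new session.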

    Development of Context-Aware Recommenders of Sequences of Touristic Activities

    In recent years, recommender systems have become ubiquitous on the web. Many web services, including movie streaming, web search and e-commerce, use recommender systems to aid human decision-making. Tourism is one industry that is highly represented on the web. Several web services (e.g. TripAdvisor, Yelp) benefit from integrating recommender systems to help tourists explore destinations. This has increased research focused on improving tourism recommender systems and solving the main issues they face. This thesis proposes new algorithms for tourism recommender systems that learn tourist preferences from their social media data in order to suggest a sequence of touristic activities that fits various contexts and includes related activities. To accomplish this, we propose methods for identifying tourists from their Twitter posts, identifying the activities experienced in these posts, and profiling similar tourists based on their interests, contextual information and activity periods. User profiles are then combined with an association rule mining algorithm to capture implicit relationships between the points of interest that appear in each profile. Finally, a rule ranking and activity selection process produces a set of recommendable activities. The recommendations were evaluated for accuracy and for the effect of user profiling. We further order the set of activities using a multi-objective algorithm to enrich the tourist experience. We also carry out a second-stage analysis of tourist flows at destinations, which is beneficial to destination management organisations seeking to understand tourist mobility. Overall, the methods and algorithms proposed in this thesis are shown to be useful in various aspects of tourism recommender systems.
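To illustrate the association rule mining step, the sketch below derives pairwise rules between points of interest from sets of visited places, scoring each rule by support and confidence. It is a simplified stand-in and not the thesis's algorithm: the thresholds, the restriction to item pairs, the function name `pair_rules` and the sample visits are all assumptions.

```python
from collections import Counter
from itertools import permutations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) over item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = set(items)
        item_counts.update(unique)
        # Count ordered pairs so that A -> B and B -> A are scored separately.
        pair_counts.update(permutations(sorted(unique), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                 # share of profiles containing both a and b
        confidence = count / item_counts[a]  # share of profiles with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Rank by confidence, then support, then name for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical points of interest visited by four tourists in one profile group.
visits = [
    {"museum", "cathedral", "beach"},
    {"museum", "cathedral"},
    {"beach", "harbour"},
    {"museum", "cathedral", "harbour"},
]
for rule in pair_rules(visits):
    print(rule)
```

A ranked rule list of this kind is what a subsequent selection step could draw recommendable activities from.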

    Reverse Engineering Static Content and Dynamic Behaviour of E-Commerce Websites for Fun and Profit

    Nowadays electronic commerce websites are one of the main tools for transactions between on-line merchants and consumers or businesses. These e-commerce websites rely heavily on summarizing and analyzing customer behavior in an effort to influence user actions and optimize success metrics such as CTR (Click through Rate), CPC (Cost per Conversion), Basket and Lifetime Value and User Engagement. Knowledge extraction from existing e-commerce website datasets, using data mining and machine learning techniques, has had a growing influence on Internet marketing activities. When faced with a new e-commerce website, the machine learning practitioner starts a web mining process by collecting historical and real-time data from the website and analyzing and transforming this data so that information can be extracted about the website's structure and content and about its users' behavior. Only after this process are data scientists able to build relevant models and algorithms to enhance marketing activities. This process is expensive in resources and time, and its cost always depends on the condition in which the data reaches the data scientist: higher-quality data (e.g. complete data) makes the data scientist's work easier and faster. Moreover, in most cases data scientists resort to tracking domain-specific events throughout a user's visit in order to discover user behavior. This requires code modifications to the pages themselves, which increases the risk of missing relevant information because tracking mechanisms were not enabled on certain pages. For example, we may not know a priori that a visit to a Delivery Conditions page is relevant to predicting a user's willingness to buy, and therefore would not enable tracking on those pages. Within this problem context, the proposed solution is a methodology capable of extracting and combining information about an e-commerce website through a web mining process. It captures both the structure and the content of the website's pages, relying mostly on identifying dynamic content and semantic information in predefined locations, and complements this with predictive models of users' future behavior built from the user access logs. This allows for the creation of a data model representing an e-commerce website and its archetypical users that can be useful, for example, in simulation systems.
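The abstract does not say which predictive model is built from the access logs. As one simple possibility, the sketch below fits a first-order Markov model of page transitions and predicts the most likely next page. The model choice, the function names `fit_transitions` and `predict_next`, and the sample sessions are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count how often each page is followed by each other page."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, page):
    """Return the most frequently observed next page, or None if the page is unseen."""
    followers = counts.get(page)
    if not followers:
        return None
    # Sorting first makes ties resolve alphabetically, so the result is deterministic.
    return max(sorted(followers), key=followers.get)


# Hypothetical sessions reconstructed from access logs.
sessions = [
    ["/home", "/product", "/delivery", "/checkout"],
    ["/home", "/product", "/cart"],
    ["/home", "/delivery", "/checkout"],
]
counts = fit_transitions(sessions)
print(predict_next(counts, "/delivery"))
```

In this toy data every visit to the delivery page is followed by checkout, which mirrors the abstract's point that such a page can be informative about willingness to buy.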

    Web usage mining for click fraud detection

    Internship carried out at AuditMark and supervised by Eng. Pedro Fortuna. Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 201

    Swarm intelligence for clustering dynamic data sets for web usage mining and personalization.

    Swarm Intelligence (SI) techniques were inspired by bee swarms, ant colonies, and most recently, bird flocks. Flock-based Swarm Intelligence (FSI) has several unique features, namely decentralized control, collaborative learning, high exploration ability, and inspiration from dynamic social behavior. Thus FSI offers a natural choice for modeling dynamic social data and solving problems in such domains. One particular case of dynamic social data is online/web usage data, which is rich in information about user activities, interests and choices. This natural analogy between SI and social behavior is the main motivation for the topic of investigation in this dissertation, with a focus on Flock-based systems, which have not been well investigated for this purpose. More specifically, we investigate the use of flock-based SI to solve two related and challenging problems by developing algorithms that form critical building blocks of intelligent personalized websites, namely, (i) providing a better understanding of the online users and their activities or interests, for example using clustering techniques that can discover the groups that are hidden within the data; and (ii) reducing information overload by providing guidance to the users on websites and services, typically by using web personalization techniques, such as recommender systems. Recommender systems aim to recommend items that will be potentially liked by a user. To support a better understanding of the online user activities, we developed clustering algorithms that address two challenges of mining online usage data: the need for scalability to large data and the need to adapt clustering to dynamic data sets. To address the scalability challenge, we developed new clustering algorithms using a hybridization of traditional Flock-based clustering with faster K-Means based partitional clustering algorithms. 
We tested our algorithms on synthetic data, real UCI Machine Learning Repository benchmark data, and a data set consisting of real Web user sessions. Having linear complexity with respect to the number of data records, the resulting algorithms are considerably faster than traditional Flock-based clustering (which has quadratic complexity). Moreover, our experiments demonstrate that scalability was gained without sacrificing quality. To address the challenge of adapting to dynamic data, we developed a dynamic clustering algorithm that can handle the following dynamic properties of online usage data: (1) New data records can be added at any time (for example, a new user is added on the site); (2) Existing data records can be removed at any time (for example, an existing user of the site no longer subscribes to a service, or is terminated for violating policies); (3) New parts of existing records can arrive at any time, or old parts of an existing data record can change. The user's record can change as a result of additional activity such as purchasing new products, returning a product, rating new products, or modifying the existing rating of a product. We tested our dynamic clustering algorithm on synthetic dynamic data, and on a data set consisting of real online user ratings for movies. Our algorithm was shown to handle the dynamic nature of data without sacrificing quality compared to a traditional Flock-based clustering algorithm that is re-run from scratch with each change in the data. To support reducing online information overload, we developed a Flock-based recommender system to predict the interests of users, in particular focusing on collaborative filtering or social recommender systems. Our Flock-based recommender algorithm (FlockRecom) iteratively adjusts the position and speed of dynamic flocks of agents, such that each agent represents a user, on a visualization panel. 
Then it generates the top-n recommendations for a user based on the ratings of the users that are represented by its neighboring agents. Our recommendation system was tested on a real data set consisting of online user ratings for a set of jokes, and compared to traditional user-based Collaborative Filtering (CF). Our results demonstrated that our recommender system starts performing at the same level of quality as traditional CF, and then, with more iterations for exploration, surpasses CF's recommendation quality, in terms of precision and recall. Another unique advantage of our recommendation system compared to traditional CF is its ability to generate more variety or diversity in the set of recommended items. Our contributions advance the state of the art in Flock-based SI for clustering and making predictions in dynamic Web usage data, and therefore have an impact on improving the quality of online services.
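The final step described above, generating top-n recommendations from the ratings of neighboring users, can be sketched as follows. This is not FlockRecom itself: the neighbors are passed in as a plain list (in the dissertation they come from agent positions on the visualization panel), and the mean-rating score, the function name `top_n` and the sample ratings are assumptions for illustration.

```python
def top_n(user, neighbours, ratings, n=2):
    """Rank items the user has not rated by their mean rating among the neighbours."""
    seen = ratings.get(user, {})
    totals, counts = {}, {}
    for other in neighbours:
        for item, score in ratings.get(other, {}).items():
            if item in seen:
                continue  # never recommend something the user already rated
            totals[item] = totals.get(item, 0.0) + score
            counts[item] = counts.get(item, 0) + 1
    means = {item: totals[item] / counts[item] for item in totals}
    # Highest mean first; item name breaks ties so the order is stable.
    ranked = sorted(means, key=lambda item: (-means[item], item))
    return ranked[:n]


# Hypothetical joke ratings on an arbitrary scale.
ratings = {
    "ann": {"joke1": 4.0},
    "bob": {"joke1": 5.0, "joke2": 3.0, "joke3": 4.5},
    "cat": {"joke2": 4.0, "joke4": 2.0},
}
print(top_n("ann", ["bob", "cat"], ratings))
```

Because the neighbor list is an input, the same scoring step could sit behind any neighborhood definition, whether it comes from flock positions or from a similarity measure as in user-based CF.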

    Data Mining Applications On Web Usage Analysis & User Profiling

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2003. This thesis summarizes data mining technology, its functions and its applications. OLAP technology and data warehouses are also introduced as key concepts in data mining. The use of data mining on the internet and decisions based on internet usage data are introduced. In the application section, a web retailer's transactional data is used to analyze customer and shopping patterns. Hidden patterns in the data are extracted to support business decisions such as user profiling and customer segmentation.