
    Investigation of Heterogeneous Approach to Fact Invention of Web Users’ Web Access Behaviour

    The World Wide Web consists of a huge volume of data of different types. Web mining is the field of data mining concerned with the web, its many services, and its large number of users; web user mining is in turn one of the fields of web mining. Information about users' web access is collected in several ways, the most common being the web log file; other techniques include browser agents, user authentication, web reviews, web ratings, web rankings, and tracking cookies. Users find it difficult to retrieve the information they need in time because the huge volume of structured and unstructured information increases the complexity of the web. Web usage mining is therefore important for purposes such as organizing a website, business and maintenance services, website personalization, and reducing network bandwidth. This paper provides an analysis of web usage mining techniques.
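
    As a concrete illustration of the most common collection technique named above, here is a minimal sketch (ours, not the paper's) of parsing one entry of a web server log in the Common Log Format; the sample line and field names are illustrative.

```python
import re

# Common Log Format: host ident authuser [date] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if malformed."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None

sample = '203.0.113.7 - alice [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(parse_log_line(sample))
```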

    Improved Pre-Processing Stages in Web Usage Mining Using Web Log

    Enormous growth in the web persists, both in the number of web sites and in the number of users. This growth generates large volumes of data during users' interactions with web sites, which are recorded in web logs. Web site owners need to understand their users, and can do so by analyzing these web logs. Web mining helps in comprehending a range of concepts from diverse fields. Web Usage Mining (WUM) is a recent research field that corresponds to the process of Knowledge Discovery in Databases (KDD). It comprises three main categories: pre-processing, pattern discovery, and pattern analysis. WUM extracts behavioral data from web users' data and, where possible, from web site information (structure and content). In this paper, we propose a customized, application-specific methodology for pre-processing web logs and combining WUM with Association Rule Mining.
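
    To make the pre-processing category concrete, the following sketch (our illustration, not the paper's methodology) shows two typical steps applied to parsed log entries: cleaning out non-page requests and grouping one user's requests into sessions by an inactivity timeout. The entry fields ('status', 'path', 'time') and the 30-minute threshold are assumptions.

```python
from datetime import timedelta

SESSION_TIMEOUT = timedelta(minutes=30)                   # assumed inactivity threshold
NOISE_SUFFIXES = ('.css', '.js', '.png', '.gif', '.ico')  # embedded resources

def clean(entries):
    """Keep only successful page requests, dropping embedded resources.

    Each entry is assumed to be a dict such as
    {'status': 200, 'path': '/index.html', 'time': datetime(...)}.
    """
    return [e for e in entries
            if e['status'] == 200 and not e['path'].endswith(NOISE_SUFFIXES)]

def sessionize(entries):
    """Split one user's time-ordered requests into sessions at inactivity gaps."""
    sessions, current = [], []
    for e in sorted(entries, key=lambda e: e['time']):
        if current and e['time'] - current[-1]['time'] > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(e)
    if current:
        sessions.append(current)
    return sessions
```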

    DCU-TCD@LogCLEF 2010: re-ranking document collections and query performance estimation

    This paper describes the collaborative participation of Dublin City University and Trinity College Dublin in LogCLEF 2010. Two sets of experiments were conducted. First, different aspects of the TEL query logs were analysed after extracting user sessions of consecutive queries on a topic. Within a session, the relation between a query's length (number of terms) and position (first query or a later reformulation) and query performance estimators such as query scope, IDF-based measures, the simplified query clarity score, and the average inverse document collection frequency was examined. Results of this analysis suggest that only some estimator values correlate with query length or position in the TEL logs (e.g. the similarity score between collection and query). Second, the relation between three attributes was investigated: the user's country (detected from the IP address), the query language, and the interface language. The investigation explored the influence of these three attributes on the user's collection selection. It also involved assigning different weights to the three attributes in a scoring function used to re-rank the collections displayed to the user according to language and country. The results of the collection re-ranking show a significant improvement in Mean Average Precision (MAP) over the original collection ranking of TEL. The results also indicate that the query language and the interface language have more influence than the user's country on the collections selected by users.
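
    As an illustration of one of the pre-retrieval estimators named above, the sketch below computes the average inverse document frequency (IDF) of a query's terms; the collection statistics are invented toy numbers, not TEL data.

```python
import math

def avg_idf(query_terms, doc_freq, num_docs):
    """Average inverse document frequency of a query's terms; higher values
    suggest a more discriminative (and often easier) query."""
    idfs = [math.log(num_docs / doc_freq.get(t, 1)) for t in query_terms]
    return sum(idfs) / len(idfs)

# toy collection statistics (illustrative only)
df = {'medieval': 40, 'manuscripts': 25, 'the': 90000}
print(avg_idf(['medieval', 'manuscripts'], df, 100000))  # specific query -> high
print(avg_idf(['the'], df, 100000))                      # vague query -> low
```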

    Learning user behaviours from website visit profiling

    This project consists of the design and implementation of a program that analyses the traffic and users of web servers through their records, or logs. In particular, the project emphasises the automatic generation of models for analysing user behaviour.

    You, the Web and Your Device: Longitudinal Characterization of Browsing Habits

    Understanding how people interact with the web is key for a variety of applications, e.g., from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has traditionally been represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages and whose edges are the paths followed by users. Obtaining large and representative data from which to extract clickstreams is however challenging. The evolution of the web raises the question of whether browsing behavior is changing and, in consequence, whether the properties of clickstreams are changing. This paper presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP to which thousands of households are connected. We first propose a methodology to identify the actual URLs requested by users from the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both the temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typically short paths followed by people while navigating the web, the fast-increasing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content. Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies (anonymized clickstreams are available to the public at http://bigdata.polito.it/clickstream).
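
    The paper's URL-identification methodology is not reproduced here, but the following simplified sketch conveys the idea of separating deliberate user clicks from the requests browsers fire automatically while rendering a page; the two-second rendering window and the record fields are our assumptions.

```python
from datetime import timedelta

RENDER_WINDOW = timedelta(seconds=2)  # assumed: requests fired this soon after a
                                      # click are treated as page-rendering traffic

def extract_clicks(requests):
    """Keep the HTML requests that look like deliberate user clicks.

    Each request is assumed to be a dict with 'time' (datetime),
    'content_type' (str) and 'url' (str), in chronological order.
    """
    clicks, last_click = [], None
    for r in requests:
        if not r['content_type'].startswith('text/html'):
            continue  # images, scripts and styles are never clicks
        if last_click is not None and r['time'] - last_click < RENDER_WINDOW:
            continue  # probably auto-fired while rendering the previous page
        clicks.append(r['url'])
        last_click = r['time']
    return clicks
```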

    Analysis of Clickstream Data

    This thesis is concerned with providing further statistical development in the area of web usage analysis to explore web browsing behaviour patterns. We received two data sources: web log files and operational data files for the websites, which contained information on online purchases. There are many research questions regarding web browsing behaviour. Specifically, we focused on the depth-of-visit metric and implemented an exploratory analysis of this feature using clickstream data. Due to the large volume of data available in this context, we chose to present effect size measures along with all statistical analyses of the data. We introduce two new robust measures of effect size for two-sample comparison studies in non-normal situations, specifically where the difference between two populations is due to the shape parameter. The proposed effect sizes perform adequately for non-normal data, as well as when two distributions differ in their shape parameters. We then focus on conversion analysis, investigating the causal relationship between general clickstream information and online purchasing using a logistic regression approach. The aim is to find a classifier by assigning the probability of the event of online shopping on an e-commerce website. We also develop an application of a mixture of hidden Markov models (MixHMM) to model web browsing behaviour using sequences of web pages viewed by users of an e-commerce website. The mixture of hidden Markov models is fitted in a Bayesian framework using Gibbs sampling. We address the slow mixing of Gibbs sampling in high-dimensional models, using over-relaxed Gibbs sampling as well as a forward-backward EM algorithm to obtain an adequate sample from the posterior distributions of the parameters. The MixHMM offers the advantage of clustering users based on their browsing behaviour, and also gives an automatic classification of web pages based on the probability of a page being viewed by visitors to the website.
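
    As a sketch of the conversion-analysis step, the snippet below fits a logistic regression mapping simple session-level clickstream features to a purchase probability; the features and data are invented for illustration, and scikit-learn stands in for whatever tooling the thesis used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# invented session-level features: [pages viewed, minutes on site, product pages]
X = np.array([[3, 2.0, 1], [12, 15.5, 6], [1, 0.5, 0], [8, 9.0, 4]])
y = np.array([0, 1, 0, 1])  # 1 = session ended in an online purchase

model = LogisticRegression().fit(X, y)
new_session = np.array([[10, 12.0, 5]])
print(model.predict_proba(new_session)[0, 1])  # estimated purchase probability
```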

    Assessing Post Usage for Measuring the Quality of Forum Posts

    It has become difficult to discover quality content within forum websites due to the increasing amount of User-Generated Content (UGC) on the Web. Many existing websites have relied on their users to explicitly rate content quality. The main problem with this approach is that the majority of content often receives insufficient ratings. Current automated content-rating solutions have evaluated linguistic features of UGC but are less effective across different types of online communities. We propose a novel approach that assesses post usage to measure the quality of forum posts. Post usage can be viewed as implicit user ratings derived from usage behaviour. The proposed model is validated against an operational forum, using the Matthews Correlation Coefficient to measure performance. Our model serves as a basis for exploring content usage to measure content quality in forums and other Web 2.0 platforms.
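
    The Matthews Correlation Coefficient used for validation can be computed directly from a binary confusion matrix, as in this small sketch (the counts are illustrative):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient: +1 perfect, 0 chance-level, -1 inverse."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# illustrative confusion-matrix counts for a post-quality classifier
print(mcc(tp=40, tn=35, fp=10, fn=15))  # ~0.50
```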