Search CORE

4,917 research outputs found

Preprocessing and Content/Navigational Pages Identification as Premises for an Extended Web Usage Mining Model Development

Author: Dan Andrei SITAR TAUT
Daniel MICAN

Since its appearance, the internet has seen spectacular growth, not only in the number of websites and the volume of information, but also in the number of visitors. This growth created the need for an overall analysis of both websites and the content they provide. Thus, a new branch of research emerged, namely web mining, which aims to discover useful information and knowledge based not only on the analysis of websites and content, but also on the way in which users interact with them. The aim of the present paper is to design a database that captures only the relevant data from logs, in a way that allows large sets of temporal data to be stored and managed with common tools in real time. In our work, we rely on different websites or website sections with known architecture, and we test several hypotheses from the literature in order to extend the framework to sites with unknown or chaotic structure, where the type of visited pages is not transparent. In doing this, we start from non-proprietary, preexisting raw server logs.

Keywords: Knowledge Management, Web Mining, Data Preprocessing, Decision Trees, Databases

Research Papers in Economics

Generating dynamic higher-order Markov models in web usage mining

Author: E. Charniak
J. Borges
M. Deshpande
M. Levene
M. Perkowitz
M. Spiliopoulou
R.R. Sarukkai
S. Schechter
X. Chen
X. Dongshan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Markov models have been widely used for modelling users' web navigation behaviour. In previous work we presented a dynamic clustering-based Markov model that accurately represents second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes into account higher-order conditional probabilities. The method makes use of the state cloning concept together with a clustering technique to separate the navigation paths that reveal differences in the conditional probabilities. We report on experiments conducted with three real-world data sets. The results show that some pages require a long history to understand the users' choice of link, while others require only a short history. We also show that the number of additional states induced by the method can be controlled through a probability threshold parameter.
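To illustrate why some pages need a longer history than others, the following minimal sketch estimates first- and second-order transition probabilities by plain counting. It is not the state-cloning and clustering method described in the abstract; the function name `transition_probs` and the toy sessions are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def transition_probs(sessions, order=1):
    """Estimate P(next page | previous `order` pages) by counting."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(order, len(session)):
            history = tuple(session[i - order:i])
            counts[history][session[i]] += 1
    # Normalise the counts for each history into probabilities.
    return {
        history: {page: n / sum(nexts.values()) for page, n in nexts.items()}
        for history, nexts in counts.items()
    }

# Hypothetical navigation sessions (page identifiers are made up).
sessions = [
    ["A", "C", "D"],
    ["A", "C", "D"],
    ["B", "C", "E"],
    ["B", "C", "E"],
]

first = transition_probs(sessions, order=1)
second = transition_probs(sessions, order=2)

print(first[("C",)])       # choice after C looks like a coin flip
print(second[("A", "C")])  # knowing the page before C removes the ambiguity
```

In this toy data the first-order model cannot tell the two groups of visitors apart at page C, while the second-order model can, which is the kind of difference in conditional probabilities the abstract refers to.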

CiteSeerX

Crossref

Birkbeck Institutional Research Online

A new intelligent algorithm to create a profile for user based on web interactions

Author: Ali Harounabadi
Javad Mirabedini
Zeinab Khademali
Publication venue: Growing Science
Publication date: 01/04/2013

This paper presents a method for automatically classifying web users' navigation patterns. The proposed model classifies a user's navigation patterns and predicts his or her upcoming requirements. To create a user's profile, a new method is introduced that records the user's active settings and measures the user's similarity to neighboring users. The model is capable of creating the profile implicitly and updates it as changes occur. In effect, we try to improve the performance of the recommender engine using users' navigation patterns and clustering. The method can present the recommender engine's results according to the user's requirements and interests. In addition, it can help customize websites more efficiently.

Directory of Open Access Journals

Towards trajectory anonymization: a generalization-based approach

Author: Atzori Maurizio
Güç Barış
Nergiz Mehmet Ercan
Saygın Yücel
Publication venue: IIIA-CSIC
Publication date: 01/01/2009

Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adapt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for the anonymization of trajectories. We further show that releasing anonymized trajectories may still leak private information. We therefore propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data, and also show how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
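As a rough illustration of what k-anonymity means for trajectories, the sketch below generalizes each point to a grid cell and checks that every generalized trajectory is shared by at least k individuals. This is a deliberately simplified stand-in, not the generalization or reconstruction algorithm of the paper; the grid-based `generalize` function, the `cell` parameter, and the sample coordinates are all assumptions.

```python
from collections import Counter

def generalize(trajectory, cell=1.0):
    """Replace each (x, y) point by the index of the grid cell containing it."""
    return tuple((int(x // cell), int(y // cell)) for x, y in trajectory)

def is_k_anonymous(trajectories, k, cell=1.0):
    """True if every generalized trajectory occurs at least k times."""
    counts = Counter(generalize(t, cell) for t in trajectories)
    return all(n >= k for n in counts.values())

# Hypothetical trajectories, each a list of (x, y) positions.
trajectories = [
    [(0.1, 0.2), (1.4, 0.3)],
    [(0.6, 0.9), (1.1, 0.8)],
    [(2.2, 2.5), (3.7, 2.1)],
]

fine = is_k_anonymous(trajectories, k=2, cell=1.0)    # third trajectory is unique
coarse = is_k_anonymous(trajectories, k=2, cell=4.0)  # all three become identical
```

Coarser cells make more trajectories indistinguishable at the cost of spatial precision, which is the basic trade-off any generalization-based scheme has to manage.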

Archivio istituzionale della ricerca - Università di Cagliari

Sabanci University Research Database

Web User Session Reconstruction Using Integer Programming

Author: Dell Robert F.
Román Pablo E.
Velásquez Juan D.
Publication date: 01/01/2008

An important input for web usage mining is web user sessions that must be reconstructed from web logs (sessionization) when such sessions are not otherwise identified. We present a novel approach for sessionization based on an integer program. We compare results of our approach with the timeout heuristic on web logs from an academic web site. We find our integer program provides sessions that better match an expected empirical distribution with about half of the standard error of the heuristic.

This work has been partially supported by the National Doctoral Grant from Conicyt Chile and by the Chilean Millennium Scientific Institute of Complex Engineering Systems.
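For context, the timeout heuristic used as the baseline can be sketched as follows. This shows one common variant only (a new session starts when the gap between consecutive requests exceeds a threshold), not the integer program the paper proposes. The record layout `(visitor, timestamp, page)`, the 1800-second default, and the sample log entries are assumptions for illustration.

```python
from collections import defaultdict

def sessionize(records, timeout=1800):
    """Split each visitor's requests into sessions at gaps longer than `timeout` seconds."""
    by_visitor = defaultdict(list)
    for visitor, timestamp, page in records:
        by_visitor[visitor].append((timestamp, page))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort()  # order each visitor's requests by time
        current = [hits[0][1]]
        for (prev_t, _), (t, page) in zip(hits, hits[1:]):
            if t - prev_t > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(page)
        sessions.append((visitor, current))
    return sessions

# Hypothetical log records: (visitor id, seconds since start, requested page).
records = [
    ("10.0.0.1", 0, "/index"),
    ("10.0.0.1", 120, "/courses"),
    ("10.0.0.1", 4000, "/index"),
    ("10.0.0.2", 60, "/staff"),
]

result = sessionize(records)
```

Here the first visitor's third request arrives more than 1800 seconds after the second, so it opens a new session. The heuristic never looks at the site's link structure, which is one reason an optimization-based approach may reconstruct sessions more faithfully.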

Calhoun, Institutional Archive of the Naval Postgraduate School