Data Cleaning Methods for Client and Proxy Logs
In this paper we present our experiences with the cleaning of Web client and proxy usage logs, based on a long-term browsing study with 25 participants. A detailed clickstream log, recorded using a Web intermediary, was combined with a second log of user interface actions, captured by a modified Firefox browser for a subset of the participants. The consolidated data from both records revealed many page requests that were not directly related to user actions. For participants who had no ad-filtering system installed, these artifacts made up one third of all transferred Web pages. Three major causes were identified: HTML frames and iframes, advertisements, and automatic page reloads. The experience gained during the data cleaning process may help other researchers choose adequate filtering methods for their own data.
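The three causes named above suggest simple per-request filters. The sketch below is illustrative only: the field names, the heuristics, and the ad-host blocklist are assumptions, not the authors' actual cleaning pipeline.

```python
# Hypothetical sketch of the cleaning idea: drop proxy-log page requests
# that were not triggered by a deliberate user action. Field names and
# heuristics are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Request:
    url: str
    is_frame: bool      # sub-request from an HTML frame or iframe
    host: str
    auto_refresh: bool  # automatic reload (e.g. meta refresh, script)

# Illustrative blocklist; a real study would use an ad-filter list.
AD_HOSTS = {"ads.example.com", "doubleclick.net"}

def is_user_page(req: Request) -> bool:
    """Keep only requests plausibly caused by a user action."""
    if req.is_frame:          # frames/iframes load without a click
        return False
    if req.host in AD_HOSTS:  # advertisements
        return False
    if req.auto_refresh:      # automatic page reloads
        return False
    return True

def clean(log):
    return [r for r in log if is_user_page(r)]
```

Each filter corresponds to one of the three artifact causes the paper reports; in practice the frame and reload signals would have to be reconstructed from referrer and timing information in the logs.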
Web users' information retrieval methods and skills
When trying to locate information on the Web, people are faced with a variety of options. This research reviewed how a group of health-related professionals approached the task of finding a named document. Most were eventually successful, but the majority encountered problems in their search techniques. Even experienced Web users had problems when working with an unfamiliar interface and without access to their favourites. No relationship was found between the number of years' experience Web users had and the efficiency of their searching strategy. The research concludes that if people are to use the Web quickly and efficiently as an effective information retrieval tool, rather than merely a recreational tool for surfing the Internet, they need both an understanding of the medium and its tools and the skills to use them effectively; the majority of participants in this study lacked both.
Characterizations of User Web Revisit Behavior
In this article we update and extend earlier long-term studies of users' page revisit behavior. Revisits ar
VEWS: A Wikipedia Vandal Early Warning System
We study the problem of detecting vandals on Wikipedia before any human or
known vandalism detection system flags them, so that potential vandals can be
presented early to Wikipedia administrators. We leverage
multiple classical ML approaches, but develop 3 novel sets of features. Our
Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing
patterns as features to classify some users as vandals. Our Wikipedia
Transition Probability Matrix (WTPM) approach uses a set of features derived
from a transition probability matrix and then reduces it via a neural net
auto-encoder to classify some users as vandals. The VEWS approach merges the
previous two approaches. Without using any information (e.g. reverts) provided
by other users, these algorithms each have over 85% classification accuracy.
Moreover, when temporal recency is considered, accuracy goes to almost 90%. We
carry out detailed experiments on a new data set we have created consisting of
about 33K Wikipedia users (including both a black list and a white list of
editors) and containing 770K edits. We describe specific behaviors that
distinguish between vandals and non-vandals. We show that VEWS beats ClueBot NG
and STiki, the best known algorithms today for vandalism detection. Moreover,
VEWS detects far more vandals than ClueBot NG and on average, detects them 2.39
edits before ClueBot NG when both detect the vandal. However, we show that the
combination of VEWS and ClueBot NG can give a fully automated vandal early
warning system with even higher accuracy. (To appear in Proceedings of the
21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015.)
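The WTPM features described above are built from how often a user moves from one kind of edit action to the next. As a minimal sketch of that idea, the function below row-normalizes transition counts over a sequence of actions; the action alphabet is an assumption, and the paper's neural-net autoencoder compression step is omitted.

```python
# Sketch of a transition probability matrix over user actions, in the
# spirit of the WTPM features. The action labels are illustrative; the
# paper's exact feature definitions and autoencoder step are not shown.
def transition_matrix(actions):
    """Row-normalized counts of action -> next-action transitions."""
    states = sorted(set(actions))
    idx = {s: i for i, s in enumerate(states)}
    counts = [[0.0] * len(states) for _ in states]
    for a, b in zip(actions, actions[1:]):
        counts[idx[a]][idx[b]] += 1.0
    for row in counts:            # normalize each row to probabilities
        total = sum(row)
        if total:
            for j in range(len(row)):
                row[j] /= total
    return states, counts
```

Flattening such a matrix (one per user) yields a fixed-length feature vector that a classifier, or an autoencoder followed by a classifier, can consume.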
Agents, Bookmarks and Clicks: A topical model of Web traffic
Analysis of aggregate and individual Web traffic has shown that PageRank is a
poor model of how people navigate the Web. Using the empirical traffic patterns
generated by a thousand users, we characterize several properties of Web
traffic that cannot be reproduced by Markovian models. We examine both
aggregate statistics capturing collective behavior, such as page and link
traffic, and individual statistics, such as entropy and session size. No model
currently explains all of these empirical observations simultaneously. We show
that all of these traffic patterns can be explained by an agent-based model
that takes into account several realistic browsing behaviors. First, agents
maintain individual lists of bookmarks (a non-Markovian memory mechanism) that
are used as teleportation targets. Second, agents can retreat along visited
links, a branching mechanism that also allows us to reproduce behaviors such as
the use of a back button and tabbed browsing. Finally, agents are sustained by
visiting novel pages of topical interest, with adjacent pages being more
topically related to each other than distant ones. This modulates the
probability that an agent continues to browse or starts a new session, allowing
us to recreate heterogeneous session lengths. The resulting model is capable of
reproducing the collective and individual behaviors we observe in the empirical
data, reconciling the narrowly focused browsing patterns of individual users
with the extreme heterogeneity of aggregate traffic measurements. This result
allows us to identify a few salient features that are necessary and sufficient
to interpret the browsing patterns observed in our data. In addition to the
descriptive and explanatory power of such a model, our results may lead the way
to more sophisticated, realistic, and effective ranking and crawling
algorithms. (10 pages, 16 figures, 1 table. Long version of a paper to appear
in Proceedings of the 21st ACM Conference on Hypertext and Hypermedia.)
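The three mechanisms the abstract lists (bookmark teleportation, backtracking along visited links, and topicality-driven session termination) can be sketched as a simple agent loop. All parameter values and the termination rule below are assumptions for illustration, not the paper's calibrated model.

```python
# Illustrative agent-based surfer: a personal bookmark list used as
# teleportation targets, a back stack (back button / tabs), and a fixed
# stopping probability standing in for the topicality mechanism.
import random

def browse_session(graph, start, p_back=0.2, p_teleport=0.1,
                   p_stop=0.05, bookmarks=None, rng=None):
    rng = rng or random.Random(0)
    bookmarks = bookmarks if bookmarks is not None else [start]
    page, stack, visited = start, [], [start]
    while rng.random() >= p_stop:        # session ends with prob. p_stop
        r = rng.random()
        if r < p_back and stack:         # retreat along a visited link
            page = stack.pop()
        elif r < p_back + p_teleport:    # teleport to a bookmark
            page = rng.choice(bookmarks)
            stack.clear()
        else:                            # follow an outgoing link
            links = graph.get(page, [])
            if links:
                stack.append(page)
                page = rng.choice(links)
            else:                        # dead end: fall back to a bookmark
                page = rng.choice(bookmarks)
        visited.append(page)
    return visited
```

In the paper's model the stopping decision depends on how topically novel adjacent pages are; a constant `p_stop` is the simplest stand-in and would not reproduce the heterogeneous session lengths the authors report.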
I-pot: a new approach utilising visual and contextual cues to support users in graphical web browser revisitation.
With a quarter of the world's population now having access to the Internet, web efficiency and optimal use are of growing importance to all users. Revisitation, where a user wants to return to a website visited in the recent past, is becoming correspondingly more important. The static, textual approaches in the latest versions of mainstream web browsers leave much to be desired. This paper proposes a new approach that uses organic visual and contextual cues to support users in this task area.
Automatic classification of web pages into bookmark categories
We describe a technique to automatically classify a web page into an existing bookmark category whenever a user decides to bookmark a page. HyperBK compares a bag-of-words representation of the page to descriptions of categories in the user's bookmark file. Unlike default web browser dialogs, in which the user may be presented with the category into which he or she saved the last bookmarked page, HyperBK also offers the category most similar to the page being bookmarked. The user can opt to save the page to the last category used, create a new category, or save the page elsewhere. In an evaluation, the user's preferred category was offered on average 67% of the time.
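The bag-of-words matching idea above can be sketched as cosine similarity between word-count vectors of the new page and of each category's description. HyperBK's actual representation, weighting, and scoring may differ; the function and category names here are illustrative.

```python
# Sketch of bag-of-words category suggestion: score each bookmark
# category by cosine similarity between word-count vectors. This is an
# illustration of the general technique, not HyperBK's implementation.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest_category(page_text: str, categories: dict) -> str:
    """Return the category whose description best matches the page."""
    page = Counter(page_text.lower().split())
    return max(categories,
               key=lambda c: cosine(page,
                                    Counter(categories[c].lower().split())))
```

A real system would also apply stop-word removal and term weighting (e.g. TF-IDF) before comparing vectors.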
Individual and collective information practices
Talk given at the Thémat'IC 2007 study day "Information literacy for adults: issues and methods" ("La maîtrise de l'information par les adultes : enjeux et méthodes"), Strasbourg, March 2007.
How people recognize previously seen Web pages from titles, URLs and thumbnails
The selectable lists of pages offered by web browsers' history and bookmark facilities ostensibly make it easier for people to return to previously visited pages. These lists show the pages as abstractions, typically as truncated titles and URLs, and more rarely as small thumbnail images. Yet we have little knowledge of how recognizable these representations really are. Consequently, we carried out a study that compared the recognizability of thumbnails between various image sizes, and of titles and URLs between various string sizes. Our results quantify the tradeoff between the size of these representations and their recognizability. These findings directly contribute to how history and bookmark lists should be designed.