2,437 research outputs found
Agents, Bookmarks and Clicks: A topical model of Web traffic
Analysis of aggregate and individual Web traffic has shown that PageRank is a
poor model of how people navigate the Web. Using the empirical traffic patterns
generated by a thousand users, we characterize several properties of Web
traffic that cannot be reproduced by Markovian models. We examine both
aggregate statistics capturing collective behavior, such as page and link
traffic, and individual statistics, such as entropy and session size. No model
currently explains all of these empirical observations simultaneously. We show
that all of these traffic patterns can be explained by an agent-based model
that takes into account several realistic browsing behaviors. First, agents
maintain individual lists of bookmarks (a non-Markovian memory mechanism) that
are used as teleportation targets. Second, agents can retreat along visited
links, a branching mechanism that also allows us to reproduce behaviors such as
the use of a back button and tabbed browsing. Finally, agents are sustained by
visiting novel pages of topical interest, with adjacent pages being more
topically related to each other than distant ones. This modulates the
probability that an agent continues to browse or starts a new session, allowing
us to recreate heterogeneous session lengths. The resulting model is capable of
reproducing the collective and individual behaviors we observe in the empirical
data, reconciling the narrowly focused browsing patterns of individual users
with the extreme heterogeneity of aggregate traffic measurements. This result
allows us to identify a few salient features that are necessary and sufficient
to interpret the browsing patterns observed in our data. In addition to the
descriptive and explanatory power of such a model, our results may lead the way
to more sophisticated, realistic, and effective ranking and crawling
algorithms.
Comment: 10 pages, 16 figures, 1 table. Long version of paper to appear in Proceedings of the 21st ACM Conference on Hypertext and Hypermedia
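To make the three browsing mechanisms named in the abstract concrete (bookmark teleportation, retreat along visited links, topical interest decaying with distance), the toy simulation below sketches one such agent in Python. The parameter values, the relevance decay, and the neighbors() interface are assumptions made for this illustration, not the authors' actual model or data.

```python
import random

# Illustrative sketch only: parameter values, the relevance decay, and the
# neighbors() interface are assumptions, not the authors' model.
P_TELEPORT = 0.15    # probability of jumping to a bookmarked page (non-Markovian memory)
P_BACK = 0.20        # probability of retreating along a visited link (back button / tabs)
BOOKMARK_SIZE = 10   # length of the agent's individual bookmark list


def topical_relevance(distance):
    """Assumed decay: adjacent pages are more topically related than distant ones."""
    return 1.0 / (1.0 + distance)


def browse_session(start_page, neighbors, max_steps=1000):
    """Simulate one session; neighbors(page) must return a non-empty list of linked pages."""
    bookmarks = [start_page]
    history = [start_page]
    visited = {start_page}
    page, distance = start_page, 0

    for _ in range(max_steps):
        r = random.random()
        if r < P_TELEPORT:                                   # teleport to a bookmark
            page, distance = random.choice(bookmarks), 0
            history.append(page)
        elif r < P_TELEPORT + P_BACK and len(history) > 1:   # retreat along visited links
            history.pop()
            page, distance = history[-1], max(0, distance - 1)
        else:                                                # follow an outgoing link
            page, distance = random.choice(neighbors(page)), distance + 1
            history.append(page)

        if page in visited:
            p_continue = 0.5                                 # revisits are not novel (assumed baseline)
        else:
            visited.add(page)
            p_continue = topical_relevance(distance)         # novel, topically relevant pages sustain browsing
            if len(bookmarks) < BOOKMARK_SIZE:
                bookmarks.append(page)
        if random.random() > p_continue:
            break                                            # interest exhausted: the session ends
    return history
```

Run over a toy link graph, repeated calls to browse_session already yield sessions of very different lengths while each individual agent stays topically focused, which is the kind of reconciliation between individual and aggregate behavior the abstract describes.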
Dublin City University video track experiments for TREC 2003
In this paper, we describe our experiments for both the News Story Segmentation task and Interactive Search task for
TRECVID 2003. Our News Story Segmentation task involved the use of a Support Vector Machine (SVM) to combine evidence from audio-visual analysis tools in order to generate a listing of news stories from a given news programme. Our
Search task experiment compared a video retrieval system based on text, image and relevance feedback with a text-only
video retrieval system in order to identify which was more effective. To do so we developed two variations of our Físchlár video retrieval system and conducted user testing in a controlled lab environment. In this paper we outline our work on both of these tasks.
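As a rough illustration of how an SVM can combine evidence from separate audio-visual analysis tools into story-boundary decisions, the hypothetical sketch below uses scikit-learn; the feature names, scores, and labels are invented placeholders, not the TRECVID 2003 features or data.

```python
import numpy as np
from sklearn.svm import SVC

# Each row scores one candidate boundary point with evidence from separate
# (hypothetical) analysis tools: [shot-cut score, silence score, anchor-person score].
X_train = np.array([
    [0.9, 0.8, 0.7],   # strong evidence from all tools -> story boundary
    [0.8, 0.2, 0.9],
    [0.2, 0.1, 0.1],   # weak evidence -> not a boundary
    [0.3, 0.4, 0.0],
])
y_train = np.array([1, 1, 0, 0])   # 1 = news story boundary, 0 = no boundary

clf = SVC(kernel="rbf")            # the SVM combines the separate evidence streams
clf.fit(X_train, y_train)

X_new = np.array([[0.85, 0.7, 0.8], [0.1, 0.2, 0.3]])
print(clf.predict(X_new))             # predicted boundaries define the story listing
print(clf.decision_function(X_new))   # signed margin, usable as a confidence score
```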
Log Pre-Processing and Grammatical Inference for Web Usage Mining
In this paper, we propose a Web Usage Mining pre-processing method to retrieve missing data from server log files. Moreover, we propose two levels of evaluation: directly on the reconstructed data, and also after a machine learning step by evaluating the inferred grammatical models. We conducted experiments and showed that our algorithm improves the quality of user data.
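The abstract does not detail the reconstruction algorithm; the sketch below shows the standard "path completion" idea used in Web Usage Mining pre-processing (re-inserting cached page views that never reached the server log, using the Referer field). It is one plausible reading of retrieving missing data, not necessarily the authors' method.

```python
# Hedged sketch of path completion: Back-button navigation served from the
# browser cache is invisible to the server, so those page views are re-inserted
# by comparing each request's referrer to the page last seen in the session.

def complete_path(requests):
    """requests: time-ordered list of (url, referrer) pairs for one user session."""
    path = []
    for url, referrer in requests:
        if path and referrer in path and referrer != path[-1]:
            # The referrer is an earlier page: the user backtracked through cached
            # pages, so re-insert the traversed sub-path into the reconstructed data.
            i = len(path) - 1 - path[::-1].index(referrer)   # last occurrence of the referrer
            path.extend(reversed(path[i:-1]))
        path.append(url)
    return path


# Logged requests: the user went A -> B -> C, hit Back twice (cached), then clicked D from A.
log = [("A", None), ("B", "A"), ("C", "B"), ("D", "A")]
print(complete_path(log))   # ['A', 'B', 'C', 'B', 'A', 'D']
```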
Dublin City University video track experiments for TREC 2002
Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video
Track. In the Feature Extraction task, we submitted 3 features: Face, Speech, and Music. In the Search task, we
developed an interactive video retrieval system, which incorporated the 40 hours of the video search test collection and supported user searching using our own feature extraction data along with the donated feature data and ASR transcript from other Video Track groups. This video retrieval system allows a user to specify a query based on the 10 features and ASR transcript, and the query result is a ranked list of videos that can be further browsed at the shot level. To evaluate the usefulness of the feature-based query, we have developed a second system interface that
provides only ASR transcript-based querying, and we conducted an experiment with 12 test users to compare these 2 systems. Results were submitted to NIST and we are currently conducting further analysis of user performance with these 2 systems.
ScreenTrack: Using a Visual History of a Computer Screen to Retrieve Documents and Web Pages
Computers are used for various purposes, so frequent context switching is
inevitable. In this setting, retrieving the documents, files, and web pages
that have been used for a task can be a challenge. While modern applications
provide a history of recent documents for users to resume work, this is not
sufficient to retrieve all the digital resources relevant to a given primary
document. The histories currently available do not take into account the
complex dependencies among resources across applications. To address this
problem, we tested the idea of using a visual history of a computer screen to
retrieve digital resources within a few days of their use through the
development of ScreenTrack. ScreenTrack is software that captures screenshots
of a computer at regular intervals. It then generates a time-lapse video from
the captured screenshots and lets users retrieve a recently opened document or
web page from a screenshot after recognizing the resource by its appearance. A
controlled user study found that participants were able to retrieve requested
information more quickly with ScreenTrack than under the baseline condition
with existing tools. A follow-up study showed that the participants used
ScreenTrack to retrieve previously used resources and to recover the context
for task resumption.
Comment: CHI 2020, 10 pages, 7 figures
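A minimal sketch of the capture idea follows, assuming a fixed interval and a caller-supplied hook for the frontmost resource; this is not the actual ScreenTrack implementation, whose platform integration and time-lapse viewer go well beyond a few lines.

```python
import json
import time
from datetime import datetime
from pathlib import Path

from PIL import ImageGrab   # Pillow; screen-capture support varies by platform

# Illustrative capture loop only. The interval, storage layout, and metadata
# fields are assumptions; the real tool also records which document/URL is
# frontmost so that a recognized frame can be mapped back to the resource it shows.

CAPTURE_DIR = Path("screentrack_frames")
INTERVAL_SECONDS = 10          # capture period (assumed)


def capture_loop(get_active_resource, n_frames=6):
    """Save a screenshot every INTERVAL_SECONDS together with the resource it shows.

    get_active_resource() is a caller-supplied, platform-specific hook returning
    e.g. the frontmost document path or browser URL (left abstract here).
    """
    CAPTURE_DIR.mkdir(exist_ok=True)
    index = []
    for _ in range(n_frames):
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        frame_path = CAPTURE_DIR / f"{stamp}.png"
        ImageGrab.grab().save(frame_path)                   # full-screen screenshot
        index.append({"frame": frame_path.name,
                      "time": stamp,
                      "resource": get_active_resource()})   # what to reopen later
        time.sleep(INTERVAL_SECONDS)
    # The index lets a time-lapse viewer jump from a recognized frame back to its resource.
    (CAPTURE_DIR / "index.json").write_text(json.dumps(index, indent=2))
```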
Off the Beaten Tracks: Exploring Three Aspects of Web Navigation
This paper presents results of a long-term client-side Web usage study, updating previous studies that range in age from five to ten years. We focus on three aspects of Web navigation: changes in the distribution of navigation actions, speed of navigation and within-page navigation.
“Navigation actions” corresponding to users’ individual page requests are discussed by type. We reconfirm links to be the most important navigation element, while backtracking has lost more than half of its previously reported share and form submission has become far more common. Changes to the Web and to browser interfaces are likely causes of these shifts.
Analyzing the time users stayed on pages, we confirm Web navigation to be a rapidly interactive activity. A breakdown of page characteristics shows that users often do not take the time to read the available text or consider all links. The performance of the Web is analyzed and reassessed against the resulting requirements.
Finally, habits of within-page navigation are presented. Although most selected hyperlinks are located in the top left corner of the screen, in nearly a quarter of all cases people choose links that require scrolling. We analyzed the available browser real estate to gain insights for the design of non-scrolling Web pages.
A Survey on Framework for Improved Web Data Clustering Using Language Processing Technique
Nowadays, the World Wide Web has become a very popular and interactive medium for transferring information. It is a massive repository of web pages and links, providing information on a vast range of topics to Internet users. The Web is huge, diverse, and dynamic, which raises issues of scalability, multimedia data, and temporal change. Its growth has resulted in a huge amount of information that is now freely available for user access. Because of this tremendous usage, log files are growing at a fast rate and their size is becoming huge. Preprocessing plays a vital role in an efficient mining process, as log data is normally noisy and ambiguous. Sessions and paths are reconstructed by appending missing pages during preprocessing. Additionally, the transactions that illustrate user behavior are constructed precisely in preprocessing by calculating the Reference Length of user accesses by means of the byte rate, giving the clustering task the ability to capture the uncertainty in web users' navigation behavior.
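For readers unfamiliar with the Reference Length heuristic mentioned at the end of the abstract, the sketch below shows the classic form of the idea (time on a page corrected for download time via an assumed byte rate, with a cutoff separating short navigation references from long content references when building transactions); the thresholds and log data are illustrative only.

```python
# Hedged sketch of the Reference Length approach to transaction construction
# in Web Usage Mining; BYTE_RATE and CONTENT_CUTOFF are assumed values.

BYTE_RATE = 50_000        # assumed transfer rate in bytes/second
CONTENT_CUTOFF = 10.0     # seconds of adjusted viewing time (assumed)


def classify_references(requests):
    """requests: time-ordered list of (url, timestamp_seconds, page_bytes)."""
    transactions = []
    for (url, t, size), (_, t_next, _) in zip(requests, requests[1:]):
        reference_length = (t_next - t) - size / BYTE_RATE   # subtract download time
        kind = "content" if reference_length > CONTENT_CUTOFF else "navigation"
        transactions.append((url, round(reference_length, 1), kind))
    return transactions


log = [("/home", 0, 20_000), ("/catalog", 4, 60_000), ("/article", 40, 80_000), ("/exit", 120, 0)]
print(classify_references(log))
# [('/home', 3.6, 'navigation'), ('/catalog', 34.8, 'content'), ('/article', 78.4, 'content')]
```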
- …