6,971 research outputs found

    Preprocessing and Content/Navigational Pages Identification as Premises for an Extended Web Usage Mining Model Development

    Get PDF
    From its appearance until nowadays, the internet saw a spectacular growth not only in terms of websites number and information volume, but also in terms of the number of visitors. Therefore, the need of an overall analysis regarding both the web sites and the content provided by them was required. Thus, a new branch of research was developed, namely web mining, that aims to discover useful information and knowledge, based not only on the analysis of websites and content, but also on the way in which the users interact with them. The aim of the present paper is to design a database that captures only the relevant data from logs in a way that will allow to store and manage large sets of temporal data with common tools in real time. In our work, we rely on different web sites or website sections with known architecture and we test several hypotheses from the literature in order to extend the framework to sites with unknown or chaotic structure, which are non-transparent in determining the type of visited pages. In doing this, we will start from non-proprietary, preexisting raw server logs.Knowledge Management, Web Mining, Data Preprocessing, Decision Trees, Databases

    CHORUS Deliverable 4.3: Report from CHORUS workshops on national initiatives and metadata

    Get PDF
    Minutes of the following Workshops: • National Initiatives on Multimedia Content Description and Retrieval, Geneva, October 10th, 2007. • Metadata in Audio-Visual/Multimedia production and archiving, Munich, IRT, 21st – 22nd November 2007 Workshop in Geneva 10/10/2007 This highly successful workshop was organised in cooperation with the European Commission. The event brought together the technical, administrative and financial representatives of the various national initiatives, which have been established recently in some European countries to support research and technical development in the area of audio-visual content processing, indexing and searching for the next generation Internet using semantic technologies, and which may lead to an internet-based knowledge infrastructure. The objective of this workshop was to provide a platform for mutual information and exchange between these initiatives, the European Commission and the participants. Top speakers were present from each of the national initiatives. There was time for discussions with the audience and amongst the European National Initiatives. The challenges, communalities, difficulties, targeted/expected impact, success criteria, etc. were tackled. This workshop addressed how these national initiatives could work together and benefit from each other. Workshop in Munich 11/21-22/2007 Numerous EU and national research projects are working on the automatic or semi-automatic generation of descriptive and functional metadata derived from analysing audio-visual content. The owners of AV archives and production facilities are eagerly awaiting such methods which would help them to better exploit their assets.Hand in hand with the digitization of analogue archives and the archiving of digital AV material, metadatashould be generated on an as high semantic level as possible, preferably fully automatically. All users of metadata rely on a certain metadata model. All AV/multimedia search engines, developed or under current development, would have to respect some compatibility or compliance with the metadata models in use. The purpose of this workshop is to draw attention to the specific problem of metadata models in the context of (semi)-automatic multimedia search

    HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web

    Full text link
    When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music play lists. Understanding the factors that drive the production of these trails can be useful for e.g., improving underlying network structures, predicting user clicks or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors on the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaption of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains including website navigation, business reviews and online music played. Our work expands the repertoire of methods available for studying human trails on the Web.Comment: Published in the proceedings of WWW'1

    Moving Usability Testing onto the Web

    Get PDF
    Abstract: In order to remotely obtain detailed usability data by tracking user behaviors within a given web site, a server-based usability testing environment has been created. Web pages are annotated in such a way that arbitrary user actions (such as "mouse over link" or "click back button") can be selected for logging. In addition, the system allows the experiment designer to interleave interactive questions into the usability evaluation, which for instance could be triggered by a particular sequence of actions. The system works in conjunction with clustering and visualization algorithms that can be applied to the resulting log file data. A first version of the system has been used successfully to carry out a web usability evaluation

    Adaptive Information Visualization for Personalized Access to Educational Digital Libraries

    Get PDF
    Personalization is one of the emerging ways to increase the power of modern Digital Libraries. The Knowledge Sea II system presented in this paper explores social navigation support, an approach for providing personalized guidance within the open corpus of educational resources. Following the concepts of social navigation we have attempted to organize a personalized navigation support that is based on past learners’ interaction with the system. The study indicates that Knowledge Sea II became the students' primary tool for accessing the open corpus documents used in a programming course. The social navigation support implemented in this system was considered useful by students participating in the study of Knowledge Sea II. At the same time, some user comments indicated the need to provide more powerful navigational support, such as the ability to rank the usefulness of a page

    Why People Search for Images using Web Search Engines

    Get PDF
    What are the intents or goals behind human interactions with image search engines? Knowing why people search for images is of major concern to Web image search engines because user satisfaction may vary as intent varies. Previous analyses of image search behavior have mostly been query-based, focusing on what images people search for, rather than intent-based, that is, why people search for images. To date, there is no thorough investigation of how different image search intents affect users' search behavior. In this paper, we address the following questions: (1)Why do people search for images in text-based Web image search systems? (2)How does image search behavior change with user intent? (3)Can we predict user intent effectively from interactions during the early stages of a search session? To this end, we conduct both a lab-based user study and a commercial search log analysis. We show that user intents in image search can be grouped into three classes: Explore/Learn, Entertain, and Locate/Acquire. Our lab-based user study reveals different user behavior patterns under these three intents, such as first click time, query reformulation, dwell time and mouse movement on the result page. Based on user interaction features during the early stages of an image search session, that is, before mouse scroll, we develop an intent classifier that is able to achieve promising results for classifying intents into our three intent classes. Given that all features can be obtained online and unobtrusively, the predicted intents can provide guidance for choosing ranking methods immediately after scrolling
    corecore