6,971 research outputs found
Preprocessing and Content/Navigational Pages Identification as Premises for an Extended Web Usage Mining Model Development
From its appearance until nowadays, the internet saw a spectacular growth not only in terms of websites number and information volume, but also in terms of the number of visitors. Therefore, the need of an overall analysis regarding both the web sites and the content provided by them was required. Thus, a new branch of research was developed, namely web mining, that aims to discover useful information and knowledge, based not only on the analysis of websites and content, but also on the way in which the users interact with them. The aim of the present paper is to design a database that captures only the relevant data from logs in a way that will allow to store and manage large sets of temporal data with common tools in real time. In our work, we rely on different web sites or website sections with known architecture and we test several hypotheses from the literature in order to extend the framework to sites with unknown or chaotic structure, which are non-transparent in determining the type of visited pages. In doing this, we will start from non-proprietary, preexisting raw server logs.Knowledge Management, Web Mining, Data Preprocessing, Decision Trees, Databases
Recommended from our members
Investigation of the use of navigation tools in web-based learning: A data mining approach
Web-based learning is widespread in educational settings. The popularity of Web-based learning is in great measure because of its flexibility. Multiple navigation tools provided some of this flexibility. Different navigation tools offer different functions. Therefore, it is important to understand how the navigation tools are used by learners with different backgrounds, knowledge, and skills. This article presents two empirical studies in which data-mining approaches were used to analyze learners' navigation behavior. The results indicate that prior knowledge and subject content are two potential factors influencing the use of navigation tools. In addition, the lack of appropriate use of navigation tools may adversely influence learning performance. The results have been integrated into a model that can help designers develop Web-based learning programs and other Web-based applications that can be tailored to learners' needs
CHORUS Deliverable 4.3: Report from CHORUS workshops on national initiatives and metadata
Minutes of the following Workshops:
• National Initiatives on Multimedia Content Description and Retrieval, Geneva, October 10th, 2007.
• Metadata in Audio-Visual/Multimedia production and archiving, Munich, IRT, 21st – 22nd November 2007
Workshop in Geneva 10/10/2007
This highly successful workshop was organised in cooperation with the European Commission. The event brought together
the technical, administrative and financial representatives of the various national initiatives, which have been established
recently in some European countries to support research and technical development in the area of audio-visual content
processing, indexing and searching for the next generation Internet using semantic technologies, and which may lead to an
internet-based knowledge infrastructure. The objective of this workshop was to provide a platform for mutual information
and exchange between these initiatives, the European Commission and the participants. Top speakers were present from
each of the national initiatives. There was time for discussions with the audience and amongst the European National
Initiatives. The challenges, communalities, difficulties, targeted/expected impact, success criteria, etc. were tackled. This
workshop addressed how these national initiatives could work together and benefit from each other.
Workshop in Munich 11/21-22/2007
Numerous EU and national research projects are working on the automatic or semi-automatic generation of descriptive and
functional metadata derived from analysing audio-visual content. The owners of AV archives and production facilities are
eagerly awaiting such methods which would help them to better exploit their assets.Hand in hand with the digitization of
analogue archives and the archiving of digital AV material, metadatashould be generated on an as high semantic level as
possible, preferably fully automatically. All users of metadata rely on a certain metadata model. All AV/multimedia search
engines, developed or under current development, would have to respect some compatibility or compliance with the
metadata models in use. The purpose of this workshop is to draw attention to the specific problem of metadata models in the
context of (semi)-automatic multimedia search
Recommended from our members
Mining learning preferences in web-based instruction: Holists vs. Serialists
Web-based instruction programs are used by learners with diverse knowledge, skills and needs. These differences determine their preferences for the design of Web-based instruction programs and ultimately influence learners' success in using them. Cognitive style has been found to significantly affect learners' preferences of web-based instruction programs. However, the majority of previous studies focus on Field Dependence/Independence. Pask's Holist/Serialist dimension has conceptual links with Field Dependence/Independence but it is left mostly unstudied. Therefore, this study focuses on identifying how this dimension of cognitive style affects learner preferences of Web-based instruction programs. A data mining approach is used to illustrate the difference in preferences between Holists and Serialists. The findings show that there are clear differences in regard to content presentation and navigation support. A set of design features were then produced to help designers incorporate cognitive styles into the development of Web-based instruction programs to ensure that they can accommodate learners' different preferences.This work is partially funded by National Science Council, Taiwan, ROC (NSC 98-2511-S-008-012- MY3; NSC 99-
2511-S-008 -003 -MY2; NSC 99-2631-S-008-001)
HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web
When users interact with the Web today, they leave sequential digital trails
on a massive scale. Examples of such human trails include Web navigation,
sequences of online restaurant reviews, or online music play lists.
Understanding the factors that drive the production of these trails can be
useful for e.g., improving underlying network structures, predicting user
clicks or enhancing recommendations. In this work, we present a general
approach called HypTrails for comparing a set of hypotheses about human trails
on the Web, where hypotheses represent beliefs about transitions between
states. Our approach utilizes Markov chain models with Bayesian inference. The
main idea is to incorporate hypotheses as informative Dirichlet priors and to
leverage the sensitivity of Bayes factors on the prior for comparing hypotheses
with each other. For eliciting Dirichlet priors from hypotheses, we present an
adaption of the so-called (trial) roulette method. We demonstrate the general
mechanics and applicability of HypTrails by performing experiments with (i)
synthetic trails for which we control the mechanisms that have produced them
and (ii) empirical trails stemming from different domains including website
navigation, business reviews and online music played. Our work expands the
repertoire of methods available for studying human trails on the Web.Comment: Published in the proceedings of WWW'1
Moving Usability Testing onto the Web
Abstract: In order to remotely obtain detailed usability data by tracking user behaviors
within a given web site, a server-based usability testing environment has been
created. Web pages are annotated in such a way that arbitrary user actions (such as
"mouse over link" or "click back button") can be selected for logging. In addition,
the system allows the experiment designer to interleave interactive questions into
the usability evaluation, which for instance could be triggered by a particular sequence
of actions. The system works in conjunction with clustering and visualization
algorithms that can be applied to the resulting log file data. A first version of
the system has been used successfully to carry out a web usability evaluation
Adaptive Information Visualization for Personalized Access to Educational Digital Libraries
Personalization is one of the emerging ways to increase the power of modern Digital Libraries. The Knowledge Sea II system presented in this paper explores social navigation support, an approach for providing personalized guidance within the open corpus of educational resources. Following the concepts of social navigation we have attempted to organize a personalized navigation support that is based on past learners’ interaction with the system. The study indicates that Knowledge Sea II became the students' primary tool for accessing the open corpus documents used in a programming course. The social navigation support implemented in this system was considered useful by students participating in the study of Knowledge Sea II. At the same time, some user comments indicated the need to provide more powerful navigational support, such as the ability to rank the usefulness of a page
Why People Search for Images using Web Search Engines
What are the intents or goals behind human interactions with image search
engines? Knowing why people search for images is of major concern to Web image
search engines because user satisfaction may vary as intent varies. Previous
analyses of image search behavior have mostly been query-based, focusing on
what images people search for, rather than intent-based, that is, why people
search for images. To date, there is no thorough investigation of how different
image search intents affect users' search behavior.
In this paper, we address the following questions: (1)Why do people search
for images in text-based Web image search systems? (2)How does image search
behavior change with user intent? (3)Can we predict user intent effectively
from interactions during the early stages of a search session? To this end, we
conduct both a lab-based user study and a commercial search log analysis.
We show that user intents in image search can be grouped into three classes:
Explore/Learn, Entertain, and Locate/Acquire. Our lab-based user study reveals
different user behavior patterns under these three intents, such as first click
time, query reformulation, dwell time and mouse movement on the result page.
Based on user interaction features during the early stages of an image search
session, that is, before mouse scroll, we develop an intent classifier that is
able to achieve promising results for classifying intents into our three intent
classes. Given that all features can be obtained online and unobtrusively, the
predicted intents can provide guidance for choosing ranking methods immediately
after scrolling
- …