Automating Content Extraction of HTML Documents
Web pages often contain clutter (such as unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction of 'useful and relevant' content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, and text summarization. Most approaches to making content more readable involve changing font size or removing HTML and data components such as images, which takes away from a webpage's inherent look and feel. Unlike 'Content Reformatting', which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses 'Content Extraction'. We have developed a framework that employs an easily extensible set of techniques and incorporates advantages of previous work on content extraction. Our key insight is to work with DOM trees, a W3C-specified interface that allows programs to dynamically access document structure, rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy to extract content from HTML web pages. This proxy can be used both centrally, administered for groups of users, and by individuals for personal browsers. After receiving feedback from users about the proxy, we have also created a revised version with improved performance and accessibility in mind.
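The idea of filtering a parsed document tree rather than raw markup can be illustrated with a minimal Python sketch. It is not the authors' framework or proxy: the `Node` class, the tag sets, and the link-text-ratio threshold in `prune` are all hypothetical choices made for illustration, and the tree is a simplified stand-in for a real W3C DOM.

```python
from html.parser import HTMLParser

# Hypothetical tag sets chosen for this sketch, not taken from the paper.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}
DROP_TAGS = {"script", "style", "img", "iframe"}
BLOCK_TAGS = {"div", "ul", "ol", "table", "nav", "td"}


class Node:
    """A minimal element node; children are Node objects or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified document tree from HTML markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_of(node):
    """Concatenate all text beneath a node."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def link_text_length(node):
    """Count characters of text that sit inside <a> elements."""
    total = 0
    for child in node.children:
        if isinstance(child, str):
            continue
        if child.tag == "a":
            total += len(text_of(child))
        else:
            total += link_text_length(child)
    return total


def prune(node, max_link_ratio=0.5):
    """Drop clutter tags and block elements that are mostly link text."""
    kept = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)
            continue
        if child.tag in DROP_TAGS:
            continue
        prune(child, max_link_ratio)
        total = len(text_of(child))
        if (child.tag in BLOCK_TAGS and total > 0
                and link_text_length(child) / total > max_link_ratio):
            continue
        kept.append(child)
    node.children = kept


def extract_content(html):
    """Parse markup into a tree, prune it, and return the remaining text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root)
    return text_of(builder.root)
```

Because each filter operates on tree nodes, a further filter (for example, one that removes empty table cells) could be added as another pass over the same tree without touching the parsing step.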
Implicit Measures of Lostness and Success in Web Navigation
In two studies, we investigated the ability of a variety of structural and temporal measures computed from a web navigation path to predict lostness and task success. The user’s task was to find requested target information on specified websites. The web navigation measures were based on counts of visits to web pages and other statistical properties of the web usage graph (such as compactness, stratum, and similarity to the optimal path). Subjective lostness was best predicted by similarity to the optimal path and time on task. The best overall predictor of success on individual tasks was similarity to the optimal path, but other predictors were sometimes superior depending on the particular web navigation task. These measures can be used to diagnose user navigational problems and to help identify problems in website design.
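To make the notion of path-based measures concrete, the following Python sketch computes per-page visit counts and one possible similarity score between an observed navigation path and an optimal path. The abstract does not define how similarity was computed, so the longest-common-subsequence ratio used here is an assumption made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def visit_counts(path):
    """Number of visits to each page in a navigation path."""
    return Counter(path)


def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def path_similarity(observed, optimal):
    """Assumed similarity: LCS length divided by the longer path's length.

    Returns a value in [0, 1]; 1.0 means the observed path equals the
    optimal path. This is an illustrative definition, not the paper's.
    """
    if not observed or not optimal:
        return 0.0
    return lcs_length(observed, optimal) / max(len(observed), len(optimal))
```

Under this assumed definition, detours and revisits lengthen the observed path without lengthening the common subsequence, so the score falls as the user strays from the optimal route.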
Challenges in Bridging Social Semantics and Formal Semantics on the Web
This paper describes several results of Wimmics, a research lab whose name
stands for web-instrumented man-machine interactions, communities, and
semantics. The approaches introduced here rely on graph-oriented knowledge
representation, reasoning and operationalization to model and support actors,
actions and interactions in web-based epistemic communities. The research
results are applied to support and foster interactions in online communities
and manage their resources.
Recommendation, collaboration and social search
This chapter considers the social component of interactive information retrieval: what is the role of other people in searching and browsing? For simplicity we begin by considering situations without computers. After all, you can interactively retrieve information without a computer; you just have to interact with someone or something else. Such an analysis can then help us think about the new forms of collaborative interactions that extend our conceptions of information search, made possible by the growth of networked ubiquitous computing technology.
Information searching and browsing have often been conceptualized as solitary activities; however, they always have a social component. We may talk about 'the' searcher or 'the' user of a database or information resource. Our focus may be on individual uses and our research may look at individual users. Our experiments may be designed to observe the behaviors of individual subjects. Our models and theories derived from our empirical analyses may focus substantially or exclusively on an individual's evolving goals, thoughts, beliefs, emotions and actions. Nevertheless there are always social aspects of information seeking and use present, both implicitly and explicitly.
We start by summarizing some of the history of information access with an emphasis on social and collaborative interactions. Then we look at the nature of recommendations, social search and interfaces to support collaboration between information seekers. Following this we consider how the design of interactive information systems is influenced by their social elements.
A Semantic-Based Information Management System to Support Innovative Product Design
International competition and the rapidly globalizing economy, unified by improved communication and transportation, offer consumers an enormous choice of goods and services. As a result, companies now need quality, value, time to market and innovation to succeed against increasing competition. In the engineering sector this translates into a need to optimize the design process and to maximize the re-use of data and knowledge already existing in the company. The “SIMI-Pro” (Semantic Information Management system for Innovative Product design) system addresses specific deficiencies in the conceptual phase of product design, when knowledge management, if applied, is often sectorial. Its main contribution is in allowing easy, fast and centralized collection of data from multiple sources and in supporting the retrieval and re-use of a wide range of data that will help stylists and engineers shorten the production cycle. SIMI-Pro will be one of the first prototypes to base its information management and its knowledge sharing system on process ontology, and it will demonstrate how the use of centralized network systems, coupled with Semantic Web technologies, can improve inter-working activities and interdisciplinary knowledge sharing.