346 research outputs found
Universal Mobile Information Retrieval
International audienceThe shift in human computer interaction from desktop computing to mobile interaction highly influences the needs for new designed interfaces. In this paper, we address the issue of searching for information on mobile devices, an area also known as Mobile Information Retrieval. In particular, we propose to summarize as much as possible the information retrieved by any search engine to allow universal access to information
An empirical study of web interface design on small display devices
This paper reports an empirical study that explores the problem of finding a highly-efficient, user-friendly interface design method on small display devices. We compared three models using our PDA interface simulator: presentation optimization method, semantic conversion method, and zooming method. A controlled experiment has been carried out to identify the pros and cons of each method. The results show that of the three interface methods, the zooming method is slightly better than the semantic conversion method, while they both outperform the optimizing presentation method. © 2004 IEEE
DOM-based Content Extraction of HTML Documents
Web pages often contain clutter around the body of the article as well as distracting features that take away from the true information that the user is pursuing. This can range from pop-up ads to flashy banners to unnecessary images and links scattered around the screen. Extraction of 'useful and relevant' content from web pages, has many applications ranging from lightweight environments, like cell phone and PDA browsing, to speech rendering for the visually impaired, to text summarization Most approaches to removing the clutter or making the content more readable involves either changing the size of the font or simply removing certain HTML-denoted components like images, thus taking away from the webpage's inherent look and feel. Unlike Content Reformatting, which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses Content Extraction. We have developed a framework that employs an easily extensible set of techniques that incorporate advantages of previous work on content extraction while limiting the disadvantages. Our key insight is to work with the Document Object Model tree (after parsing and correcting the HTML), rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy that anyone can use to extract content from HTML web pages for their own purposes
Web Mediators for Accessible Browsing
We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number of all text characters on a web page. K-means clustering is used to create unique thresholds to differentiate index pages and article pages on individual web sites. Index pages contain mostly links to articles and other indices, while article pages contain mostly text. We also present a novel link grouping algorithm using agglomerative hierarchical clustering that groups links in the same spatial neighborhood together while preserving link structure. Grouping allows users with severe disabilities to use a scan-based mechanism to tab through a web page and select items. In experiments, we saw up to a 40-fold reduction in the number of commands needed to click on a link with a scan-based interface, which shows that we can vastly improve the rate of communication for users with disabilities. We used web page classification and link grouping to alter web page display on an accessible web browser that we developed to make a usable browsing interface for users with disabilities. Our classification method consistently outperformed a baseline classifier even when using minimal data to generate article and index clusters, and achieved classification accuracy of 94.0% on web sites with well-formed or slightly malformed HTML, compared with 80.1% accuracy for the baseline classifier.National Science Foundation (IIS-0308213, IIS-039009, IIS-0093367, P200A01031, EIA-0202067
Recommended from our members
Automating Content Extraction of HTML Documents
Web pages often contain clutter (such as unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction of 'useful and relevant' content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, and text summarization. Most approaches to making content more readable involve changing font size or removing HTML and data components such as images, which takes away from a webpage's inherent look and feel. Unlike 'Content Reformatting', which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses 'Content Extraction'. We have developed a framework that employs an easily extensible set of techniques. It incorporates advantages of previous work on content extraction. Our key insight is to work with DOM trees, a W3C specified interface that allows programs to dynamically access document structure, rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy to extract content from HTML web pages. This proxy can be used both centrally, administered for groups of users, as well as by individuals for personal browsers. We have also, after receiving feedback from users about the proxy, created a revised version with improved performance and accessibility in mind
Making results fit into 40 characters: a study in document rewriting
With the increasing popularity of mobile and hand-held devices, automatic approaches for adapting results to the limited screen size of mobile devices are becoming more important. Traditional approaches for reducing the length of textual results include summarisation and snippet extraction. In this study, we investigate document rewriting techniques which retain the meaning and readability of the original text. Evaluations on different document sets show that i) rewriting documents considerably reduces document length and thus, scrolling effort on devices with limited screen size, and ii) the rewritten documents have a higher readability
- …