2,055 research outputs found

    Perspectives for Electronic Books in the World Wide Web Age

    While the World Wide Web (WWW or Web) is steadily expanding, electronic books (e-books) remain a niche market. In this article, it is first postulated that specialized contents and device independence can make Web-based e-books compete with paper prints, and that adaptive features implementable by client-side computing are relevant for e-books, while more complex forms of adaptation requiring server-side computation are not. Then, enhancements of the WWW standards (specifically of XML, XHTML, of the style-sheet languages CSS and XSL, and of the linking language XLink) are proposed for better support of client-side adaptation and device-independent content modeling. Finally, advanced browsing functionalities desirable for e-books, as well as their implementation in the WWW context, are described.

    Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia

    Hyperlinks are an essential feature of the World Wide Web. They are especially important for online encyclopedias such as Wikipedia: an article can often only be understood in the context of related articles, and hyperlinks make it easy to explore this context. But important links are often missing, and several methods have been proposed to alleviate this problem by learning a linking model based on the structure of the existing links. Here we propose a novel approach to identifying missing links in Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia's navigability. We leverage data sets of navigation paths collected through a Wikipedia-based human-computation game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness these human navigation traces to identify a set of candidates for missing links and then rank these candidates. Experiments show that our procedure identifies missing links of high quality.
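The candidate-mining step described above can be given a minimal sketch: count how often navigators pass from one article to another indirectly and treat frequent unlinked transitions as candidates. The function name, the data layout (paths as lists of titles, `existing_links` as a set of pairs), and the frequency ranking are illustrative assumptions, not the paper's actual pipeline.

```python
from collections import Counter

def candidate_missing_links(paths, existing_links):
    """Count indirect transitions in navigation paths; frequent
    source->destination shortcuts with no direct link are candidate
    missing links, ranked by how often they would have helped."""
    counts = Counter()
    for path in paths:
        for i, src in enumerate(path):
            for dst in path[i + 2:]:  # skip the immediate next click
                if src != dst and (src, dst) not in existing_links:
                    counts[(src, dst)] += 1
    return counts.most_common()

# toy navigation traces over a handful of articles
paths = [
    ["Dog", "Mammal", "Wolf"],
    ["Dog", "Pet", "Mammal", "Wolf"],
]
existing = {("Dog", "Mammal"), ("Mammal", "Wolf"),
            ("Dog", "Pet"), ("Pet", "Mammal")}
print(candidate_missing_links(paths, existing))
```

Here the shortcut Dog→Wolf is taken indirectly in both traces, so it ranks first among candidates.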

    An effective, low-cost measure of semantic relatedness obtained from Wikipedia links

    This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.
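One common link-based formulation in the literature adapts the Normalized Google Distance to article in-link sets; whether this matches the paper's exact measure is an assumption, and the article names and in-link sets below are hypothetical.

```python
import math

def link_relatedness(links_a, links_b, total_articles):
    """Semantic relatedness from shared in-links, in the style of a
    Normalized Google Distance over the Wikipedia link graph."""
    common = links_a & links_b
    if not common:
        return 0.0
    distance = (
        (math.log(max(len(links_a), len(links_b))) - math.log(len(common)))
        / (math.log(total_articles) - math.log(min(len(links_a), len(links_b))))
    )
    return max(0.0, 1.0 - distance)

# hypothetical in-link sets for two related articles
car = {"Vehicle", "Engine", "Road", "Transport"}
truck = {"Vehicle", "Engine", "Cargo", "Transport"}
print(round(link_relatedness(car, truck, 1_000_000), 3))
```

Because the two articles share most of their in-links, the score comes out close to 1; articles with no shared in-links score 0.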

    University of Twente @ TREC 2009: Indexing half a billion web pages

    This report presents results for the TREC 2009 ad hoc task, the diversity task, and the relevance feedback task. We present ideas for unsupervised tuning of a search system, an approach for spam removal, and the use of categories and query-log information for diversifying search results.

    Web Page Retrieval by Combining Evidence

    The participation of the REINA Research Group in WebCLEF 2005 focused on the monolingual mixed task. Queries or topics are of two types: named pages and home pages. For both, we first perform a search by thematic contents: for the same query, we search several elements of information from every page (title, some meta tags, anchor text) and then combine the results. For queries about home pages, we try to detect them using a method based on certain keywords and their patterns of use. Afterwards, the results of the thematic-contents retrieval are re-ranked based on PageRank and centrality coefficients.
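The combination of evidence could be sketched as a weighted linear fusion of per-field retrieval scores with a link-based prior; the field names, weights, and fusion formula below are illustrative assumptions, not the group's tuned configuration.

```python
def combine_evidence(field_scores, pagerank, weights, pr_weight=0.3):
    """Linearly combine per-field retrieval scores (title, meta tags,
    anchor text), then mix in a link-based prior such as PageRank.
    All weights here are illustrative, not the paper's values."""
    content = sum(weights[f] * s for f, s in field_scores.items())
    return (1 - pr_weight) * content + pr_weight * pagerank

weights = {"title": 0.5, "meta": 0.2, "anchor": 0.3}
score = combine_evidence(
    {"title": 0.8, "meta": 0.4, "anchor": 0.6},
    pagerank=0.9,
    weights=weights,
)
print(round(score, 3))
```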

    Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

    Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention. (Comment: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty. An appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html contains test data.)
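One plausible first filter for locating page pairs in parallel translation is to match URLs that differ only in a language marker. The marker table and example URLs below are hypothetical, and the published method also compares document structure, which this sketch omits.

```python
import re

# Hypothetical language markers mapping source to target variants.
LANG_MARKERS = {"en": "fr", "english": "french"}

def candidate_pairs(urls):
    """Pair URLs that differ only by a language substring, a cheap
    first filter for pages likely to be parallel translations."""
    url_set = set(urls)
    pairs = []
    for url in urls:
        for src, dst in LANG_MARKERS.items():
            translated = re.sub(rf"\b{src}\b", dst, url)
            if translated != url and translated in url_set:
                pairs.append((url, translated))
    return pairs

urls = [
    "http://example.org/en/about.html",
    "http://example.org/fr/about.html",
    "http://example.org/en/contact.html",
]
print(candidate_pairs(urls))
```

Only the about page has a counterpart under the swapped language marker, so it is the single candidate pair returned.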

    Educational framework based on cumulative vocabularies, conceptual networks and Wikipedia linkage

    We propose a new educational framework based on guided exploration in small-world networks, relying on the hyperlink network of the Wikipedia online encyclopedia (http://www.wikipedia.org), in which hyperlinks between articles define conceptual relationships. Educational material is presented to students with cumulative conceptual networks based on the hyperlink network of Wikipedia, connecting concepts of the vocabulary about the current learning topic. Personalization of educational material is carried out by alternating the distribution of enabled hyperlinks connecting concepts belonging to the current vocabulary, according to the requirements of the learning objective, the learning context and the learner's knowledge. Besides developing a computational method to manage educational material with conceptual networks and to explore the shortest paths between concepts of a vocabulary (especially highest-ranking hyperlinked concepts and strongly rising hyperlinked concepts), we have also experimentally estimated properties of conceptual networks generated from the hyperlink network of Wikipedia, between concepts retrieved from the English Vocabulary Profile, for cumulatively growing vocabularies corresponding to six language ability levels. Peer reviewed
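The shortest-path exploration between vocabulary concepts can be sketched as a breadth-first search over the hyperlink network; the toy graph and function name below are illustrative, not the paper's implementation.

```python
from collections import deque

def shortest_concept_path(graph, start, goal):
    """Breadth-first search for the shortest hyperlink path between
    two concepts; graph maps an article to the articles it links to."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no hyperlink path exists

# toy hyperlink network between vocabulary concepts
graph = {
    "water": ["liquid", "ocean"],
    "liquid": ["matter"],
    "ocean": ["sea", "matter"],
    "matter": ["physics"],
}
print(shortest_concept_path(graph, "water", "physics"))
```

Note that because the graph is directed (as Wikipedia links are), a path may exist in one direction but not the other.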