2,055 research outputs found

Perspectives for Electronic Books in the World Wide Web Age

Author: Bry François
Kraus Michael
Publication venue: Emerald
Publication date: 01/01/2002

While the World Wide Web (WWW or Web) is steadily expanding, electronic books (e-books) remain a niche market. In this article, it is first postulated that specialized contents and device independence can make Web-based e-books compete with paper prints, and that adaptive features that can be implemented by client-side computing are relevant for e-books, while more complex forms of adaptation requiring server-side computations are not. Then, enhancements of the WWW standards (specifically of XML, XHTML, of the style-sheet languages CSS and XSL, and of the linking language XLink) are proposed to better support client-side adaptation and device-independent content modeling. Finally, advanced browsing functionalities desirable for e-books as well as their implementation in the WWW context are described.

Crossref

Open Access LMU

Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia

Author: Clemesha A.
Milgram S.
Milne D.
Popescul A.
Singer P.
Taskar B.
West R.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 13/03/2015

Hyperlinks are an essential feature of the World Wide Web. They are especially important for online encyclopedias such as Wikipedia: an article can often only be understood in the context of related articles, and hyperlinks make it easy to explore this context. But important links are often missing, and several methods have been proposed to alleviate this problem by learning a linking model based on the structure of the existing links. Here we propose a novel approach to identifying missing links in Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia's navigability. We leverage data sets of navigation paths collected through a Wikipedia-based human-computation game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness human navigational traces to identify a set of candidates for missing links and then rank these candidates. Experiments show that our procedure identifies missing links of high quality.

arXiv.org e-Print Archive

CiteSeerX

Crossref
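
The candidate-generation idea in the abstract above (harvest link candidates from navigation paths, then rank them) can be sketched as follows. This is a minimal illustration, not the authors' method: the function name `missing_link_candidates`, the toy paths, and ranking by raw path frequency are all assumptions made here.

```python
from collections import Counter


def missing_link_candidates(paths, links):
    """Rank (source, target) pairs that co-occur on navigation paths
    but have no direct link. Ranking by path count is an assumption."""
    counts = Counter()
    for path in paths:
        seen = set()  # count each pair at most once per path
        for i, source in enumerate(path):
            for target in path[i + 1:]:
                pair = (source, target)
                already_linked = target in links.get(source, set())
                if source != target and not already_linked and pair not in seen:
                    seen.add(pair)
                    counts[pair] += 1
    return counts.most_common()


# Hypothetical toy data: two navigation paths and the existing link graph.
paths = [["Cat", "Mammal", "Animal", "Zoo"], ["Cat", "Pet", "Zoo"]]
links = {
    "Cat": {"Mammal", "Pet"},
    "Mammal": {"Animal"},
    "Animal": {"Zoo"},
    "Pet": {"Zoo"},
}
ranked = missing_link_candidates(paths, links)
```

Here the pair ("Cat", "Zoo") ranks first because both paths travel from "Cat" to "Zoo" without a direct link being available.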

An effective, low-cost measure of semantic relatedness obtained from Wikipedia links

Author: Milne David N.
Witten Ian H.
Publication venue: AAAI Press
Publication date: 01/01/2008

This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.

CiteSeerX

Research Commons@Waikato
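
The abstract above does not state a formula, so the following is only a sketch of how a link-based relatedness score might be computed. It assumes a normalized-distance formulation over the sets of articles linking to each term (modeled on normalized Google distance); the function name `link_relatedness` and its parameters are hypothetical.

```python
import math


def link_relatedness(links_a, links_b, total_articles):
    """Relatedness in [0, 1] from overlap of incoming-link sets.
    Assumed formulation: 1 minus a normalized link distance."""
    a, b = set(links_a), set(links_b)
    common = a & b
    if not common:
        return 0.0  # no shared linking articles: treat as unrelated
    smaller, larger = min(len(a), len(b)), max(len(a), len(b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the smaller link set")
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    return max(0.0, 1.0 - distance)


# Hypothetical article IDs that link to each of two terms.
score = link_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, total_articles=100)
```

Identical link sets give a score of 1.0 and disjoint sets give 0.0; partial overlap falls in between.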

Recognizing anchoring text patterns on the web

Author: Bhandari Shruti K.
Publication venue: Lehigh Preserve

Lehigh University: Lehigh Preserve

University of Twente @ TREC 2009: Indexing half a billion web pages

Author: Hauff Claudia
Hiemstra Djoerd
Publication venue: National Institute of Standards and Technology (NIST)
Publication date: 01/01/2009

This report presents results for the TREC 2009 adhoc task, the diversity task, and the relevance feedback task. We present ideas for unsupervised tuning of a search system, an approach for spam removal, and the use of categories and query log information for diversifying search results.

CiteSeerX

Radboud Repository

University of Twente Research Information

Web Page Retrieval by Combining Evidence

Author: Alonso-Berrocal José-Luis
G.-Figuerola Carlos
Rodríguez-Vázquez-de-Aldana Emilio
Zazo Ángel F.
Publication date: 01/01/2006

The participation of the REINA Research Group in WebCLEF 2005 focused on the monolingual mixed task. Queries or topics are of two types: named pages and home pages. For both, we first perform a search by thematic content; for the same query, we also search several information elements of every page (title, some meta tags, anchor text) and then combine the results. For queries about home pages, we try to detect them using a method based on certain keywords and their patterns of use. Afterwards, the results of the thematic content retrieval are re-ranked based on PageRank and centrality coefficients.

E-LIS
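
The two-stage scheme in the abstract above (combine per-field retrieval scores, then re-rank with link-based evidence) might look like the sketch below. The weighted linear combination, the weights, and the name `combine_and_rerank` are assumptions for illustration; the abstract does not specify how the evidence is merged.

```python
def combine_and_rerank(field_scores, weights, link_scores, link_weight=0.2):
    """Merge per-field scores linearly, then blend in a link-based score
    (e.g. a PageRank-like value). All weights are assumed values."""
    combined = {}
    for field, scores in field_scores.items():
        w = weights.get(field, 0.0)
        for page, score in scores.items():
            combined[page] = combined.get(page, 0.0) + w * score
    final = {
        page: (1 - link_weight) * s + link_weight * link_scores.get(page, 0.0)
        for page, s in combined.items()
    }
    # Highest blended score first.
    return sorted(final, key=lambda p: final[p], reverse=True)


# Hypothetical scores for three pages across three evidence sources.
field_scores = {
    "content": {"a": 0.9, "b": 0.5},
    "title": {"b": 0.8},
    "anchor": {"b": 0.6, "c": 0.4},
}
weights = {"content": 0.5, "title": 0.3, "anchor": 0.2}
link_scores = {"a": 0.1, "b": 0.3, "c": 0.9}
ranking = combine_and_rerank(field_scores, weights, link_scores)
```

Page "b" comes out on top because it gathers evidence from all three fields, even though "a" has the highest content score alone.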

Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

Author: Resnik Philip
Publication date: 01/01/1998

Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention. Comment: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty. An Appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html contains test data.

arXiv.org e-Print Archive

CiteSeerX

Digital Repository at the University of Maryland

Educational framework based on cumulative vocabularies, conceptual networks and Wikipedia linkage

Author: Lahti Lauri
Publication venue: London International Conference on Education
Publication date: 01/01/2013

We propose a new educational framework based on guided exploration in small-world networks relying on the hyperlink network of the Wikipedia online encyclopedia (http://www.wikipedia.org) in which hyperlinks between articles define conceptual relationships. Educational material is presented to the student with cumulative conceptual networks based on the hyperlink network of Wikipedia, connecting concepts of the vocabulary about the current learning topic. Personalization of educational material is carried out by alternating the distribution of enabled hyperlinks connecting concepts belonging to the current vocabulary according to the requirements of the learning objective, learning context and learner’s knowledge. Besides developing a computational method to manage educational material with conceptual networks and to explore the shortest paths between concepts of the vocabulary (especially highest-ranking hyperlinked concepts and strongly rising hyperlinked concepts), we have also experimentally estimated properties of conceptual networks generated from the hyperlink network of Wikipedia between concepts retrieved from the English Vocabulary Profile for cumulatively growing vocabularies corresponding to six language ability levels. Peer reviewed

Aaltodoc Publication Archive
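
Exploring shortest paths between vocabulary concepts, as mentioned in the abstract above, can be illustrated with a breadth-first search over a hyperlink graph. This is a minimal sketch under assumptions: the graph, the concept names, and the optional `vocabulary` filter (standing in for "enabled" hyperlinks within the current vocabulary) are hypothetical and not taken from the paper.

```python
from collections import deque


def shortest_path(links, start, goal, vocabulary=None):
    """Breadth-first search for a shortest hyperlink path.
    If vocabulary is given, only concepts in it may be visited."""
    def allowed(concept):
        return vocabulary is None or concept in vocabulary

    if not (allowed(start) and allowed(goal)):
        return None
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in links.get(node, ()):
            if nxt not in parents and allowed(nxt):
                parents[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical concept graph.
links = {
    "water": ["river", "rain"],
    "river": ["sea"],
    "rain": ["cloud"],
    "cloud": ["sea"],
    "sea": [],
}
direct = shortest_path(links, "water", "sea")
restricted = shortest_path(
    links, "water", "sea", vocabulary={"water", "rain", "cloud", "sea"}
)
```

Restricting the vocabulary changes which path is found: without "river" enabled, the route from "water" to "sea" goes through "rain" and "cloud".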