
6 research outputs found

Using Explicit Semantic Analysis for Cross-Lingual Link Discovery

Authors: Knoth, Petr; Zdrahal, Zdenek; Zilka, Lukas
Publication date: 01/01/2011

This paper explores how to automatically generate cross-language links between resources in large document collections. The paper presents new methods for Cross-Lingual Link Discovery (CLLD) based on Explicit Semantic Analysis (ESA). The methods are applicable to any multilingual document collection. In this report, we present a comparative study of these methods on the Wikipedia corpus and provide new insights into the evaluation of link discovery systems. In particular, we measure the agreement of human annotators in linking articles in different language versions of Wikipedia, and compare it to the results achieved by the presented methods.

CiteSeerX

Open Research Online

Recommended from our members

KMI, The Open University at NTCIR-9 CrossLink: Cross-Lingual Link Discovery in Wikipedia using explicit semantic analysis

Authors: Knoth, Petr; Zdrahal, Zdenek; Zilka, Lukas
Publication date: 01/01/2011

This paper describes the methods used in the submission of the Knowledge Media Institute (KMI), The Open University, to the NTCIR-9 Cross-Lingual Link Discovery (CLLD) task entitled CrossLink. KMI submitted four runs for link discovery from English to Chinese; however, the developed methods, which utilise Explicit Semantic Analysis (ESA), are also applicable to other language combinations. Three of the runs are based on exploiting the existing cross-lingual mapping between different versions of Wikipedia articles. In the fourth run, we assume information about the mapping is not available. Our methods achieved encouraging results and we describe in detail how their performance can be further improved. Finally, we discuss two important issues in link discovery: the evaluation methodology and the applicability of the developed methods across different textual collections.

Open Research Online

Automatic generation of inter-passage links based on semantic similarity

Authors: Knoth, Petr; Novotny, Jakub; Zdrahal, Zdenek
Publication date: 23/08/2010

This paper investigates the use and the predictive potential of semantic similarity measures for the automatic generation of links across different documents and passages. First, the correlation between the way people link content and the results produced by standard semantic similarity measures is investigated. The relation between semantic similarity and the length of the documents is then analysed. Based on these findings, a new method for link generation is formulated and tested.

CiteSeerX

Open Research Online

Using Explicit Semantic Analysis to Link in Multi-Lingual Document Collections

Author: Žilka Lukáš
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2012

Keeping links in quickly growing document collections up to date is problematic, and the problem is exacerbated by the multilinguality of these collections. We utilize Explicit Semantic Analysis to help identify relevant documents and links across languages without machine translation. We designed and implemented several approaches as part of our link discovery system. Evaluation was conducted on the Chinese, Czech, English and Spanish Wikipedias. We also discuss the evaluation methodology for such systems and assess the agreement between links in different language versions of Wikipedia. In addition, we evaluate properties of Explicit Semantic Analysis that are important for its practical use.

Digital library of Brno University of Technology

National Repository of Grey Literature

Applying Wikipedia to Interactive Information Retrieval

Author: Milne David N.
Publication venue: University of Waikato
Publication date: 15/09/2010

There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases (e.g. controlled vocabularies, classification schemes, thesauri and ontologies) to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday web-scale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.

Research Commons@Waikato

Wikisearching and wikilinking

Authors: A. Trotman; D. Jenkinson; K.N. Fachry; K.Y. Itakura; N. Fuhr; S. Geva; W. Huang
Publication date: 01/01/2009

The University of Otago submitted three element runs and three passage runs to the Relevance-in-Context task of the ad hoc track. The best Otago run was a whole-document run placing 7th. The best Otago passage run placed 13th, while the best Otago element run placed 31st. A total of 40 runs were submitted to the task. The ad hoc result reinforced our prior belief that passages are better answers than elements and that the most important aspect of focused retrieval is the identification of relevant documents. Six runs were submitted to the Link-the-Wiki track. The best Otago run placed 1st (of 21) in file-to-file automatic assessment and 6th (of 28) with manual assessment. The Itakura & Clarke algorithm was used for outgoing links, with special attention paid to parsing and case sensitivity. For incoming links, representative terms were selected from the document and used to find similar documents.

CiteSeerX

Crossref