A Word Sense-Oriented User Interface for Interactive Multilingual Text Retrieval
In this paper we present an interface for supporting a user in an interactive cross-language search process using semantic classes. To give users access to multilingual information, several problems have to be solved: disambiguating and translating the query words, as well as categorizing and presenting the results appropriately. We therefore first give a brief introduction to word sense disambiguation, cross-language text retrieval and document categorization, and then describe recent achievements of our research towards an interactive multilingual retrieval system. We focus especially on the problem of browsing and navigating the different word senses in one source language and possibly several target languages. In the last part of the paper, we discuss the developed user interface and its functionalities in more detail.
Disambiguation strategies for cross-language information retrieval
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that were developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.
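The two strategies contrasted in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny Dutch-to-English dictionary, the two documents, and the overlap-count scoring are hypothetical and are not the tools or retrieval model of the Twenty-One project.

```python
# Hypothetical Dutch-to-English dictionary; the entries and their order are
# illustrative only. The first candidate for "bank" is deliberately the wrong sense.
DICTIONARY = {
    "bank": ["couch", "bank", "bench"],
    "lening": ["loan"],
}

# Hypothetical English documents, already tokenized.
DOCS = {
    "finance": ["the", "bank", "approved", "the", "loan"],
    "furniture": ["a", "leather", "couch", "for", "sale"],
}


def translate_one(query_terms):
    """Commit to a single translation per term before searching (here: the first listed)."""
    return [DICTIONARY.get(term, [term])[0] for term in query_terms]


def translate_all(query_terms):
    """Keep every candidate translation and leave the choice to the search step."""
    return [cand for term in query_terms for cand in DICTIONARY.get(term, [term])]


def score(english_terms, doc_tokens):
    """Toy scoring: count how many query terms occur in the document."""
    vocabulary = set(doc_tokens)
    return sum(1 for term in english_terms if term in vocabulary)


def search(english_terms):
    """Return a mapping from document name to its toy score."""
    return {name: score(english_terms, tokens) for name, tokens in DOCS.items()}


query = ["bank", "lening"]
single = search(translate_one(query))   # the wrong sense of "bank" was chosen up front
multiple = search(translate_all(query))  # all senses kept; co-occurrence decides
```

In this toy setting the single-translation query cannot separate the two documents, while the query with all translations favours the finance document because "bank" and "loan" occur together there. That is one way to picture implicit disambiguation during searching; a real system would use a proper ranking function rather than term overlap.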
Retrieving with good sense
Although always present in text, word sense ambiguity only recently became regarded as a problem to information
retrieval which was potentially solvable. The growth of interest in word senses resulted from new directions taken in
disambiguation research. This paper first outlines this research and surveys the resulting efforts in information
retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt
from the research, most notably a notion of the circumstances under which disambiguation may prove of use to retrieval.
Indexing with WordNet synsets can improve Text Retrieval
The classical, vector space model for text retrieval is shown to give better
results (up to 29% better in our experiments) if WordNet synsets are chosen as
the indexing space, instead of word forms. This result is obtained for a
manually disambiguated test collection (of queries and documents) derived from
the Semcor semantic concordance. The sensitivity of retrieval performance to
(automatic) disambiguation errors when indexing documents is also measured.
Finally, it is observed that if queries are not disambiguated, indexing by
synsets performs (at best) only as well as standard word indexing.
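As a rough illustration of why the choice of indexing space matters in a vector space model, the sketch below ranks two toy documents by cosine similarity, once over word forms and once over sense labels. The sense labels are made-up identifiers standing in for manually assigned synsets; they are not real WordNet synset ids, and the data is not taken from the Semcor-derived collection.

```python
import math
from collections import Counter

# Hypothetical sense-annotated tokens: (word form, sense label).
# The labels are invented for illustration, not actual WordNet synsets.
DOCS = {
    "d1": [("car", "vehicle"), ("bank", "riverbank")],
    "d2": [("automobile", "vehicle"), ("bank", "financial_institution")],
}
QUERY = [("car", "vehicle"), ("bank", "financial_institution")]


def vector(tokens, use_senses):
    """Build a term-frequency vector over sense labels or over word forms."""
    return Counter(sense if use_senses else word for word, sense in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs, use_senses):
    """Return (doc id, score) pairs sorted by decreasing similarity to the query."""
    q = vector(query, use_senses)
    scores = {doc_id: cosine(q, vector(toks, use_senses)) for doc_id, toks in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


by_words = rank(QUERY, DOCS, use_senses=False)
by_senses = rank(QUERY, DOCS, use_senses=True)
```

With word forms, d1 ranks first because it shares the strings "car" and "bank" with the query, even though its "bank" is the wrong sense. With sense labels, d2 ranks first because "automobile" and "car" map to the same label and the two senses of "bank" are kept apart. The sketch assumes perfect sense annotation on both sides, which corresponds to the manually disambiguated setting in the abstract.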
Extending, trimming and fusing WordNet for technical documents
This paper describes a tool for the automatic
extension and trimming of a multilingual
WordNet database for cross-lingual retrieval
and multilingual ontology building in
intranets and domain-specific document
collections. Hierarchies, built from
automatically extracted terms and combined
with the WordNet relations, are trimmed
with a disambiguation method based on the
document salience of the words in the
glosses. The disambiguation is tested in a
cross-lingual retrieval task, showing
considerable improvement (7%-11%). The
condensed hierarchies can be used as
browse-interfaces to the documents
complementary to retrieval.
Using Cross-Lingual Explicit Semantic Analysis for Improving Ontology Translation
The Semantic Web aims to allow machines to make inferences using the explicit conceptualisations contained in ontologies. By pointing to ontologies, Semantic Web-based applications are able to inter-operate and share common information easily. Nevertheless, multilingual semantic applications are still rare, because most online ontologies are monolingual and in English. To address this issue, techniques for ontology localisation and translation are needed. However, traditional machine translation is difficult to apply to ontologies, because ontology labels tend to be quite short and linguistically different from free text. In this paper, we propose an approach to enhance machine translation of ontologies based on exploiting the well-structured concept descriptions contained in the ontology. In particular, our approach leverages the semantics contained in the ontology by using Cross Lingual Explicit Semantic Analysis (CLESA) for context-based disambiguation in phrase-based Statistical Machine Translation (SMT). To the best of our knowledge, the presented work is novel in that CLESA has not previously been applied in SMT.
BIKE: Bilingual Keyphrase Experiments
This paper presents a novel strategy for translating lists
of keyphrases. Typical keyphrase lists appear in
scientific articles, information retrieval systems and
web page meta-data. Our system combines a statistical
translation model trained on a bilingual corpus of
scientific papers with sense-focused look-up in a large
bilingual terminological resource. For the latter,
we developed a novel technique that benefits from viewing
the keyphrase list as contextual help for sense
disambiguation. The optimal combination of modules was
discovered by a genetic algorithm. Our work applies to
the French / English language pair.
Formal models, usability and related work in IR (editorial for special edition)
The Glasgow IR group has carried out both theoretical and empirical work aimed at giving end users efficient and effective access to large collections of multimedia data.
A derivational rephrasing experiment for question answering
In Knowledge Management, variations in information expressions have proven a
real challenge. In particular, classical semantic relations (e.g. synonymy) do
not connect words with different parts-of-speech. The method proposed tries to
address this issue. It consists in building a derivational resource from a
morphological derivation tool together with derivational guidelines from a
dictionary in order to store only correct derivatives. This resource, combined
with a syntactic parser, a semantic disambiguator and some derivational
patterns, helps to reformulate an original sentence while keeping the initial
meaning in a convincing manner. This approach has been evaluated in three
different ways: the precision of the derivatives produced from a lemma; its
ability to provide well-formed reformulations from an original sentence,
preserving the initial meaning; and its impact on the results when coping with a real
issue, i.e. a question answering task. The evaluation of this approach through a
question answering system shows the pros and cons of this system, while
foreshadowing some interesting future developments.