Search CORE

4,294 research outputs found

Which User Interaction for Cross-Language Information Retrieval? Design Issues and Reflections

Author: Baker
Ballesteros
Ballesteros
Beaulieu
Belkin
Bian
Borlund
Capstick
Capstick
Chin
Demetriou
Dorr
Dumas
Dunlop
Eco
Ericsson
Gonzalo
Hackos
Harston
He
Hull
Koenemann
Levin
Levin
Lopez-Ostenero
McCarley
Mizzaro
Oard
Oard
Ogden
Ogden
Penas
Petrelli
Petrelli
Pirkola
Preece
Robertson
Salton
Saracevic
Snyder
Van Welie
Xu
Yunker
Publication venue: 'Wiley'
Publication date: 01/01/2006
Field of study

A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. This paper presents three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for rare languages, and shows how the user interaction design evolved depending on the results of usability tests. The first test was instrumental to identify weaknesses in both functionalities and interface; the second was run to determine if query translation should be shown or not; the final was a global assessment and focussed on user satisfaction criteria. Lessons were learned at every stage of the process leading to a much more informed view of what a cross-language retrieval system should offer to users

PRIME: A System for Multi-lingual Patent Retrieval

Author: Fujii Atsushi
Fukui Masatoshi
Higuchi Shigeto
Ishikawa Tetsuya
Publication venue
Publication date: 01/01/2001
Field of study

Given the growing number of patents filed in multiple countries, users are interested in retrieving patents across languages. We propose a multi-lingual patent retrieval system, which translates a user query into the target language, searches a multilingual database for patents relevant to the query, and improves the browsing efficiency by way of machine translation and clustering. Our system also extracts new translations from patent families consisting of comparable patents, to enhance the translation dictionary

arXiv.org e-Print Archive

CiteSeerX

Towards a Universal Wordnet by Learning from Combined Evidenc

Author: de Melo G.
Weikum G.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2009
Field of study

Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification

Latent sentiment model for weakly-supervised cross-lingual sentiment classification

Author: He Yulan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

In this paper, we present a novel weakly-supervised method for crosslingual sentiment analysis. In specific, we propose a latent sentiment model (LSM) based on latent Dirichlet allocation where sentiment labels are considered as topics. Prior information extracted from English sentiment lexicons through machine translation are incorporated into LSM model learning, where preferences on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. An efficient parameter estimation procedure using variational Bayes is presented. Experimental results on the Chinese product reviews show that the weakly-supervised LSM model performs comparably to supervised classifiers such as Support vector Machines with an average of 81% accuracy achieved over a total of 5484 review documents. Moreover, starting with a generic sentiment lexicon, the LSM model is able to extract highly domainspecific polarity words from text

CiteSeerX

CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

Author: Agnes Frederic
Besacier Laurent
Ferrero Jeremy
Schwab Didier
Publication venue
Publication date: 01/01/2017
Field of study

We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity by a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised way. Our best run ranked 1st on track 4a with a correlation of 83.02% with human annotations

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes