Sheffield University CLEF 2000 submission - bilingual track: German to English
We investigated dictionary based cross language information
retrieval using lexical triangulation. Lexical triangulation combines the results
of different transitive translations. Transitive translation uses a pivot language
to translate between two languages when no direct translation resource is
available. We took German queries and translated them via Spanish or Dutch
into English. We compared the results of retrieval experiments using these
queries, with other versions created by combining the transitive translations or
created by direct translation. Direct dictionary translation of a query introduces
considerable ambiguity that damages retrieval, with average precision 79% below
monolingual in this research. Transitive translation introduces more ambiguity,
giving results worse than 88% below direct translation. We have shown that
lexical triangulation between two transitive translations can eliminate much of
the additional ambiguity introduced by transitive translation.
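The triangulation idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy dictionaries (`de_es`, `es_en`, `de_nl`, `nl_en`), the function names, and the fallback to the union when the two routes share no translation are all assumptions made for the example.

```python
# Toy bilingual dictionaries (hypothetical entries, for illustration only).
de_es = {"Bank": ["banco"]}
es_en = {"banco": ["bank", "bench", "shoal"]}
de_nl = {"Bank": ["bank"]}
nl_en = {"bank": ["bank", "bench", "couch"]}


def transitive(term, src_to_pivot, pivot_to_tgt):
    """Translate a term into the target language through one pivot language."""
    targets = set()
    for pivot_word in src_to_pivot.get(term, ()):
        targets.update(pivot_to_tgt.get(pivot_word, ()))
    return targets


def triangulate(term, routes):
    """Keep only the translations that every non-empty pivot route agrees on."""
    candidate_sets = [transitive(term, a, b) for a, b in routes]
    candidate_sets = [s for s in candidate_sets if s]
    if not candidate_sets:
        return set()
    common = set.intersection(*candidate_sets)
    # Assumption: fall back to the union if the routes have nothing in common.
    return common or set.union(*candidate_sets)


routes = [(de_es, es_en), (de_nl, nl_en)]
print(sorted(triangulate("Bank", routes)))  # ['bank', 'bench']
```

In this toy case each route alone yields three candidates, while the intersection drops the translations ("shoal", "couch") that only one pivot language introduced.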
MIRACLE’s hybrid approach to bilingual and monolingual Information Retrieval
The main goal of the bilingual and monolingual participation of the MIRACLE team at CLEF 2004 was to test the effect of combination approaches to information retrieval. The starting point is a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting and relevance feedback. Some of these basic components are used in different combinations and orders of application for document indexing and for query processing. In addition, a second-order combination is performed, mainly by averaging or by selective combination of the documents retrieved by different approaches for a particular query.
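As a rough illustration of combining runs by averaging, the sketch below fuses the scores that several retrieval approaches assign to the same query. The abstract does not say how scores were made comparable, so the min-max normalisation, the treatment of missing documents as zero, and the names `normalize` and `average_runs` are assumptions of this example.

```python
from collections import defaultdict


def normalize(run):
    """Min-max normalise one run, given as a {doc_id: score} mapping."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def average_runs(runs):
    """Average normalised scores over all runs and return a ranked list.

    Assumption: a document missing from a run contributes 0 for that run.
    """
    totals = defaultdict(float)
    for run in runs:
        for doc, score in normalize(run).items():
            totals[doc] += score
    fused = {doc: total / len(runs) for doc, total in totals.items()}
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


runs = [{"d1": 2.0, "d2": 1.0}, {"d2": 5.0, "d3": 3.0}]
ranked = average_runs(runs)
print([doc for doc, _ in ranked])  # ['d1', 'd2', 'd3']
```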
How effective is stemming and decompounding for German text retrieval?
Acquired under the Swiss National Licences (http://www.nationallizenzen.ch)
An Investigation on Text-Based Cross-Language Picture Retrieval Effectiveness through the Analysis of User Queries
Purpose: This paper describes a study of the queries generated from a user experiment for cross-language information retrieval (CLIR) from a historic image archive. Italian-speaking users generated 618 queries for a set of known-item search tasks. The queries generated by users’ interaction with the system have been analysed and the results used to suggest recommendations for the future development of cross-language retrieval systems for digital image libraries.
Methodology: A controlled lab-based user study was carried out using a prototype Italian-English image retrieval system. Participants were asked to carry out searches for 16 images provided to them, a known-item search task. Users’ interactions with the system were recorded and queries were analysed manually, both quantitatively and qualitatively.
Findings: Results highlight the diversity in requests for similar visual content and the weaknesses of Machine Translation for query translation. Through the manual translation of queries we show the benefits of using high-quality translation resources. The results show the individual characteristics of users whilst performing known-item searches and the overlap obtained between query terms and structured image captions, highlighting the use of users’ search terms for objects within the foreground of an image.
Limitations and Implications: This research looks in-depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repositories.
Value: The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross-language information access services. However, developing effective systems requires studying users’ search behaviours, particularly in digital image libraries. The value of this paper is in the provision of empirical evidence to support recommendations for effective cross-language image retrieval system design.
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are
in different languages, has of late become one of the major topics within the
information retrieval community. This paper proposes a Japanese/English CLIR
system, in which we combine query translation and retrieval modules. We
currently target the retrieval of technical documents, and therefore the
performance of our system is highly dependent on the quality of the translation
of technical terms. However, the technical term translation is still
problematic in that technical terms are often compound words, and thus new
terms are progressively created by combining existing base words. In addition,
Japanese often represents loanwords based on its special phonogram.
Consequently, existing dictionaries find it difficult to achieve sufficient
coverage. To counter the first problem, we produce a Japanese/English
dictionary for base words, and translate compound words on a word-by-word
basis. We also use a probabilistic method to resolve translation ambiguity. For
the second problem, we use a transliteration method, which maps words
unlisted in the base word dictionary to their phonetic equivalents in the
target language. We evaluate our system using a test collection for CLIR, and
show that both the compound word translation and transliteration methods
improve the system performance.
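The word-by-word translation of compounds with probabilistic disambiguation might look roughly like the sketch below. It is an illustration under stated assumptions, not the system described in the abstract: the compound is assumed to be already segmented into base words, the dictionary entries and bigram probabilities are invented toy values, and scoring candidate sequences by a product of target-language bigram probabilities is one plausible choice of "probabilistic method".

```python
from itertools import product

# Toy base-word dictionary and target-language bigram probabilities
# (hypothetical values, for illustration only).
base_dictionary = {
    "情報": ["information", "intelligence"],
    "検索": ["retrieval", "search"],
}
bigram_prob = {
    ("information", "retrieval"): 0.6,
    ("information", "search"): 0.3,
    ("intelligence", "search"): 0.05,
}


def translate_compound(base_words, dictionary, bigrams, floor=1e-6):
    """Pick the translation sequence with the highest bigram-product score.

    A base word missing from the dictionary raises KeyError here; the
    abstract handles such words separately, by transliteration.
    """
    candidates = [dictionary[word] for word in base_words]

    def score(sequence):
        probability = 1.0
        for left, right in zip(sequence, sequence[1:]):
            # Unseen bigrams get a small floor value instead of zero.
            probability *= bigrams.get((left, right), floor)
        return probability

    return max(product(*candidates), key=score)


best = translate_compound(["情報", "検索"], base_dictionary, bigram_prob)
print(" ".join(best))  # information retrieval
```

Enumerating every combination is exponential in the number of base words, which is tolerable for short compounds; a longer input would call for dynamic programming instead.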
Integrating question answering and text-to-SQL in Portuguese
Deep learning transformers have drastically improved systems that
automatically answer questions in natural language. However, different
questions demand different answering techniques; here we propose, build and
validate an architecture that integrates different modules to answer two
distinct kinds of queries. Our architecture takes a free-form natural language
text and classifies it to send it either to a Neural Question Answering
Reasoner or a Natural Language parser to SQL. We implemented a complete system
for the Portuguese language, using some of the main tools available for the
language and translating training and testing datasets. Experiments show that
our system selects the appropriate answering method with high accuracy (over
99%), thus validating a modular question answering strategy. Comment: Published at the
International Conference on the Computational Processing of Portuguese (PROPOR 2022)
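The routing architecture in this abstract, in which a classifier decides which answering module receives the question, can be sketched as a simple dispatcher. Everything concrete here is a stand-in: `toy_classifier` is a keyword rule rather than the neural classifier the authors describe, and the two handler functions are placeholders for the Neural Question Answering Reasoner and the text-to-SQL parser.

```python
def make_router(classify, answer_with_qa, answer_with_sql):
    """Build a function that sends each question to one of two modules."""
    handlers = {"qa": answer_with_qa, "sql": answer_with_sql}

    def route(question):
        label = classify(question)
        return label, handlers[label](question)

    return route


def toy_classifier(question):
    # Hypothetical rule: counting questions are treated as database queries.
    return "sql" if "how many" in question.lower() else "qa"


def toy_qa(question):
    # Placeholder for a neural question answering module.
    return f"[QA module would answer: {question}]"


def toy_sql(question):
    # Placeholder for a natural-language-to-SQL parser.
    return f"[SQL parser would translate: {question}]"


route = make_router(toy_classifier, toy_qa, toy_sql)
label, answer = route("How many employees joined in 2020?")
print(label)  # sql
```

Passing the classifier and both modules in as callables keeps the dispatcher independent of any particular model, which matches the modular strategy the abstract argues for.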