10,390 research outputs found
MIRACLE evaluation of results for ImageCLEF 2003
ImageCLEF is a new pilot experiment introduced in CLEF 2003. It is devoted to the cross language retrieval of images using textual descriptions related to images contents. This paper
presents MIRACLE research team experiments and results obtained for this track
MIRACLE Retrieval Experiments with East Asian Languages
This paper describes the participation of MIRACLE in NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages and using Latin and Cyrillic alphabets, this was our first attempt on East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, specially focusing on the similarities and differences with European languages, and carry out research on CLIR tasks which include those languages. The basic idea behind our participation in NTCIR is to test if the same familiar linguisticbased techniques may also applicable to East Asian languages, and study the necessary adaptations
Domain-specific query translation for multilingual access to digital libraries
Accurate high-coverage translation is a vital component of reliable cross language information access (CLIR) systems. This is particularly true of access to archives such as Digital Libraries which are often specific to certain domains. While general machine translation (MT) has been shown to be effective for CLIR tasks in information retrieval evaluation workshops, it is not well suited to specialized tasks where domain specific translations are required. We demonstrate that effective query translation
in the domain of cultural heritage (CH) can be achieved by augmenting a standard MT system with domain-specific phrase dictionaries automatically mined from the online Wikipedia. Experiments using our hybrid translation system with sample query logs from users of CH websites demonstrate a large improvement in the accuracy of domain specific phrase detection and translation
On the Reproducibility and Generalisation of the Linear Transformation of Word Embeddings
Linear transformation is a way to learn a linear relationship between two word embeddings, such that words in the two different embedding spaces can be semantically related. In this paper, we examine the reproducibility and generalisation of the linear transformation of word embeddings. Linear transformation is particularly useful when translating word embedding models in different languages, since it can capture the semantic relationships between two models. We first reproduce two linear transformation approaches, a recent one using orthogonal transformation and the original one using simple matrix transformation. Previous findings on a machine translation task are re-examined, validating that linear transformation is indeed an effective way to transform word embedding models in different languages. In particular, we show that the orthogonal transformation can better relate the different embedding models. Following the verification of previous findings, we then study the generalisation of linear transformation in a multi-language Twitter election classification task. We observe that the orthogonal transformation outperforms the matrix transformation. In particular, it significantly outperforms the random classifier by at least 10% under the F1 metric across English and Spanish datasets. In addition, we also provide best practices when using linear transformation for multi-language Twitter election classification
Recommended from our members
Cognate facilitation effects in bilingual children of varying language dominance
A widely accepted theory is that bilinguals activate both of their languages regardless of which is in use. Though there is abundant research on this phenomenon in bilingual adults, less research has focused on bilingual children. Cognates (i.e., words that share meaning and sound across languages) have frequently been used to explore language co-activation. The present study investigates cognate facilitation effects in child bilinguals of varying language dominance. Spanish-English bilingual children between 6 and 10 years old performed a picture-naming task that included pictures of cognates and non-cognates. Children who were more English-dominant experienced larger cognate facilitation effects when producing words in their non-dominant language but not in their dominant language. In contrast, children with more balanced dominance did not experience cognate facilitation effects in either language. The findings from this study may have implications for the development of the bilingual lexicon.Psycholog
GeoCLEF 2006: the CLEF 2006 Ccross-language geographic information retrieval track overview
After being a pilot track in 2005, GeoCLEF advanced to be a regular track within CLEF 2006. The
purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR): retrieval for
topics with a geographic specification. For GeoCLEF 2006, twenty-five search topics were defined by the
organizing groups for searching English, German, Portuguese and Spanish document collections. Topics were
translated into English, German, Portuguese, Spanish and Japanese. Several topics in 2006 were significantly
more geographically challenging than in 2005. Seventeen groups submitted 149 runs (up from eleven groups and
117 runs in GeoCLEF 2005). The groups used a variety of approaches, including geographic bounding boxes,
named entity extraction and external knowledge bases (geographic thesauri and ontologies and gazetteers)
Frequency drives lexical access in reading but not in speaking: the frequency-lag hypothesis
To contrast mechanisms of lexical access in production versus comprehension we compared the effects of word frequency (high, low), context (none, low constraint, high constraint), and level of English proficiency (monolingual, Spanish-English bilingual, Dutch-English bilingual) on picture naming, lexical decision, and eye fixation times. Semantic constraint effects were larger in production than in reading. Frequency effects were larger in production than in reading without constraining context but larger in reading than in production with constraining context. Bilingual disadvantages were modulated by frequency in production but not in eye fixation times, were not smaller in low-constraint contexts, and were reduced by high-constraint contexts only in production and only at the lowest level of English proficiency. These results challenge existing accounts of bilingual disadvantages and reveal fundamentally different processes during lexical access across modalities, entailing a primarily semantically driven search in production but a frequency-driven search in comprehension. The apparently more interactive process in production than comprehension could simply reflect a greater number of frequency-sensitive processing stages in production
- …