
    MIRACLE evaluation of results for ImageCLEF 2003

    ImageCLEF is a new pilot experiment introduced in CLEF 2003. It is devoted to the cross-language retrieval of images using textual descriptions related to image contents. This paper presents the MIRACLE research team's experiments and the results obtained for this track.

    MIRACLE Retrieval Experiments with East Asian Languages

    This paper describes the participation of MIRACLE in the NTCIR 2005 CLIR task. Although our group has a strong background and long experience in Computational Linguistics and Information Retrieval applied to European languages using the Latin and Cyrillic alphabets, this was our first attempt at East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, focusing especially on their similarities to and differences from European languages, and to carry out research on CLIR tasks that include those languages. The basic idea behind our participation in NTCIR is to test whether the same familiar linguistic-based techniques may also be applicable to East Asian languages, and to study the necessary adaptations.

    Beyond English text: Multilingual and multimedia information retrieval.

    Non

    Domain-specific query translation for multilingual access to digital libraries

    Accurate high-coverage translation is a vital component of reliable cross-language information access (CLIR) systems. This is particularly true of access to archives such as Digital Libraries, which are often specific to certain domains. While general machine translation (MT) has been shown to be effective for CLIR tasks in information retrieval evaluation workshops, it is not well suited to specialized tasks where domain-specific translations are required. We demonstrate that effective query translation in the domain of cultural heritage (CH) can be achieved by augmenting a standard MT system with domain-specific phrase dictionaries automatically mined from the online Wikipedia. Experiments using our hybrid translation system with sample query logs from users of CH websites demonstrate a large improvement in the accuracy of domain-specific phrase detection and translation.

    On the Reproducibility and Generalisation of the Linear Transformation of Word Embeddings

    Linear transformation is a way to learn a linear relationship between two word embeddings, such that words in the two different embedding spaces can be semantically related. In this paper, we examine the reproducibility and generalisation of the linear transformation of word embeddings. Linear transformation is particularly useful when translating word embedding models in different languages, since it can capture the semantic relationships between two models. We first reproduce two linear transformation approaches, a recent one using orthogonal transformation and the original one using simple matrix transformation. Previous findings on a machine translation task are re-examined, validating that linear transformation is indeed an effective way to transform word embedding models in different languages. In particular, we show that the orthogonal transformation can better relate the different embedding models. Following the verification of previous findings, we then study the generalisation of linear transformation in a multi-language Twitter election classification task. We observe that the orthogonal transformation outperforms the matrix transformation. In particular, it significantly outperforms the random classifier by at least 10% under the F1 metric across English and Spanish datasets. In addition, we also provide best practices when using linear transformation for multi-language Twitter election classification.
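
The two kinds of mapping named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NumPy, assumes that the "simple matrix transformation" is an unconstrained least-squares fit and that the "orthogonal transformation" is the orthogonal Procrustes solution, and it uses synthetic vectors in place of real embeddings. The function names and the helper `make_toy_pairs` are hypothetical.

```python
import numpy as np


def fit_matrix_map(X, Y):
    """Unconstrained map W minimising ||X W - Y||_F (ordinary least squares)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def fit_orthogonal_map(X, Y):
    """Orthogonal Procrustes: W = U V^T, where U S V^T is the SVD of X^T Y.

    The result minimises ||X W - Y||_F subject to W^T W = I.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def make_toy_pairs(n=200, dim=5, seed=0):
    """Hypothetical stand-in for a bilingual dictionary of aligned vectors.

    Rows of X play the role of source-language embeddings; rows of Y are the
    same vectors mapped through a random orthogonal matrix Q.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return X, X @ Q, Q


if __name__ == "__main__":
    X, Y, Q = make_toy_pairs()
    W_matrix = fit_matrix_map(X, Y)
    W_orth = fit_orthogonal_map(X, Y)
    # Residuals of each fitted map on the toy pairs.
    print("matrix map residual:    ", np.linalg.norm(X @ W_matrix - Y))
    print("orthogonal map residual:", np.linalg.norm(X @ W_orth - Y))
```

On this noise-free toy data both fits recover the same matrix. The practical difference is the constraint: an orthogonal map preserves vector lengths and angles, whereas the unconstrained map is free to rescale and shear the source space.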

    GeoCLEF 2006: the CLEF 2006 Cross-language geographic information retrieval track overview

    After being a pilot track in 2005, GeoCLEF advanced to be a regular track within CLEF 2006. The purpose of GeoCLEF is to test and evaluate cross-language geographic information retrieval (GIR): retrieval for topics with a geographic specification. For GeoCLEF 2006, twenty-five search topics were defined by the organizing groups for searching English, German, Portuguese and Spanish document collections. Topics were translated into English, German, Portuguese, Spanish and Japanese. Several topics in 2006 were significantly more geographically challenging than in 2005. Seventeen groups submitted 149 runs (up from eleven groups and 117 runs in GeoCLEF 2005). The groups used a variety of approaches, including geographic bounding boxes, named entity extraction and external knowledge bases (geographic thesauri, ontologies and gazetteers).

    Frequency drives lexical access in reading but not in speaking: the frequency-lag hypothesis

    To contrast mechanisms of lexical access in production versus comprehension, we compared the effects of word frequency (high, low), context (none, low constraint, high constraint), and level of English proficiency (monolingual, Spanish-English bilingual, Dutch-English bilingual) on picture naming, lexical decision, and eye fixation times. Semantic constraint effects were larger in production than in reading. Frequency effects were larger in production than in reading without constraining context but larger in reading than in production with constraining context. Bilingual disadvantages were modulated by frequency in production but not in eye fixation times, were not smaller in low-constraint contexts, and were reduced by high-constraint contexts only in production and only at the lowest level of English proficiency. These results challenge existing accounts of bilingual disadvantages and reveal fundamentally different processes during lexical access across modalities, entailing a primarily semantically driven search in production but a frequency-driven search in comprehension. The apparently more interactive process in production than comprehension could simply reflect a greater number of frequency-sensitive processing stages in production.