
    Report of MIRACLE team for the Ad-Hoc track in CLEF 2007

    This paper presents the MIRACLE team's approach to the Ad-Hoc Information Retrieval track at CLEF 2007. The work carried out for this campaign was limited to monolingual experiments, in both the standard and the robust tracks; no new approaches were attempted, following instead the procedures established in our previous participations. Runs were submitted for the following languages and tracks: monolingual Bulgarian, Hungarian and Czech; robust monolingual French, English and Portuguese. There is still some room for improvement around multilingual named entity recognition.

    Semi-Supervised Learning for Neural Machine Translation

    While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems rely only on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data. The central idea is to reconstruct the monolingual corpora using an autoencoder, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively. Our approach can exploit monolingual corpora not only of the target language but also of the source language. Experiments on the Chinese-English dataset show that our approach achieves significant improvements over state-of-the-art SMT and NMT systems.
    Comment: Corrected a typo.
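
    As a concrete illustration of the autoencoder view described above, here is a minimal PyTorch sketch. The TinySeq2Seq class, the greedy decoding helper, the vocabulary sizes, and the toy batches are all illustrative assumptions, not the authors' implementation; the paper additionally samples translations so the "encoder" direction also receives gradients, which this sketch omits.

```python
# Minimal sketch of semi-supervised NMT as an autoencoder over
# monolingual data. Everything here is a toy stand-in, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySeq2Seq(nn.Module):
    """Toy encoder-decoder standing in for a real NMT model."""
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_in):
        # Teacher-forced decoding: logits for every target position.
        _, h = self.encoder(self.src_emb(src_ids))
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), h)
        return self.out(dec_out)

def shift_right(ids, bos):
    # Prepend BOS and drop the last token to build decoder inputs.
    return torch.cat([torch.full_like(ids[:, :1], bos), ids[:, :-1]], dim=1)

def nll(logits, gold):
    return F.cross_entropy(logits.transpose(1, 2), gold)

@torch.no_grad()
def greedy_translate(model, src_ids, bos, length):
    # Greedy decoding, used to invent a pseudo-source for monolingual text.
    _, h = model.encoder(model.src_emb(src_ids))
    tok = torch.full((src_ids.size(0), 1), bos, dtype=torch.long)
    out = []
    for _ in range(length):
        dec, h = model.decoder(model.tgt_emb(tok), h)
        tok = model.out(dec).argmax(-1)
        out.append(tok)
    return torch.cat(out, dim=1)

V_SRC, V_TGT, BOS = 100, 100, 1
s2t = TinySeq2Seq(V_SRC, V_TGT)   # source -> target: the "decoder"
t2s = TinySeq2Seq(V_TGT, V_SRC)   # target -> source: the "encoder"
opt = torch.optim.Adam(list(s2t.parameters()) + list(t2s.parameters()))

# Toy batches: a parallel pair (x, y) and monolingual target text y_mono.
x = torch.randint(2, V_SRC, (8, 10))
y = torch.randint(2, V_TGT, (8, 10))
y_mono = torch.randint(2, V_TGT, (8, 10))

# Supervised loss on parallel data, in both directions.
loss = nll(s2t(x, shift_right(y, BOS)), y) + nll(t2s(y, shift_right(x, BOS)), x)

# Autoencoder loss on monolingual data: t2s "encodes" y_mono into a latent
# source sentence, and s2t must reconstruct y_mono from it. (The paper
# samples translations so both models get gradients; here the pseudo-source
# is treated as fixed data for simplicity.)
x_hat = greedy_translate(t2s, y_mono, BOS, length=10)
loss = loss + nll(s2t(x_hat, shift_right(y_mono, BOS)), y_mono)

opt.zero_grad(); loss.backward(); opt.step()
```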

    Joint Training for Neural Machine Translation Models with Monolingual Data

    Monolingual data have been demonstrated to be helpful in improving the translation quality of both statistical machine translation (SMT) and neural machine translation (NMT) systems, especially in resource-poor or domain adaptation tasks where parallel data are not rich enough. In this paper, we propose a novel approach to better leveraging monolingual data for neural machine translation by jointly learning source-to-target and target-to-source NMT models for a language pair with a joint EM optimization method. The training process starts with two initial NMT models pre-trained on parallel data, one for each direction; these two models are then iteratively updated by incrementally decreasing the translation losses on the training data. In each iteration, both NMT models first translate monolingual data from one language to the other, forming pseudo-training data for the other NMT model. Two new NMT models are then learnt from the parallel data together with the pseudo-training data. Both NMT models are expected to improve, generating better pseudo-training data for the next step. Experimental results on Chinese-English and English-German translation tasks show that our approach can simultaneously improve the translation quality of source-to-target and target-to-source models, significantly outperforming strong baseline systems enhanced with monolingual data, including back-translation.
    Comment: Accepted by AAAI 2018.
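
    The iterative procedure lends itself to a compact sketch of the control flow. The NMTModel class below is a placeholder for a real NMT toolkit (its train_steps and translate methods are stubs, not real training or decoding), and details such as how the translation losses are weighted are omitted; only the joint loop structure comes from the abstract.

```python
# Sketch of the joint training loop: two directional models repeatedly turn
# monolingual text into pseudo-parallel data for each other.
from dataclasses import dataclass, field

@dataclass
class NMTModel:
    name: str
    history: list = field(default_factory=list)

    def train_steps(self, pairs):
        # Stand-in for gradient updates on (input, output) sentence pairs.
        self.history.append(len(pairs))

    def translate(self, sentences):
        # Stand-in for beam-search decoding; a real system would return
        # translations, here we just tag the input.
        return [f"<{self.name}>{s}" for s in sentences]

def joint_training(parallel, mono_src, mono_tgt, iterations=3):
    s2t = NMTModel("s2t")   # source -> target model
    t2s = NMTModel("t2s")   # target -> source model
    # Pre-train both directions on the parallel data.
    s2t.train_steps(parallel)
    t2s.train_steps([(y, x) for x, y in parallel])
    for _ in range(iterations):
        # Each model generates pseudo-sources for the other's training data.
        pseudo_for_s2t = list(zip(t2s.translate(mono_tgt), mono_tgt))
        pseudo_for_t2s = list(zip(s2t.translate(mono_src), mono_src))
        # Retrain each model on parallel + pseudo-parallel data.
        s2t.train_steps(parallel + pseudo_for_s2t)
        t2s.train_steps([(y, x) for x, y in parallel] + pseudo_for_t2s)
    return s2t, t2s

s2t, t2s = joint_training(
    parallel=[("ein haus", "a house")],
    mono_src=["zwei häuser"],
    mono_tgt=["two houses"],
)
```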

    Towards Language-Universal End-to-End Speech Recognition

    Building speech recognizers in multiple languages typically involves replicating a monolingual training recipe for each language, or utilizing a multi-task learning approach where models for different languages have separate output labels but share some internal parameters. In this work, we exploit recent progress in end-to-end speech recognition to create a single multilingual speech recognition system capable of recognizing any of the languages seen in training. To do so, we propose the use of a universal character set that is shared among all languages. We also create a language-specific gating mechanism within the network that can modulate the network's internal representations in a language-specific way. We evaluate our proposed approach on the Microsoft Cortana task across three languages and show that our system outperforms both the individual monolingual systems and systems built with a multi-task learning approach. We also show that this model can be used to initialize a monolingual speech recognizer, and to create a bilingual model for use in code-switching scenarios.
    Comment: Submitted to ICASSP 2018.
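
    A language-specific gating mechanism of the general kind described might look like the following PyTorch sketch: a sigmoid gate conditioned on a learned language embedding modulates hidden activations element-wise. The layer sizes and the exact placement of the gate are assumptions, not the paper's architecture.

```python
# Sketch of a language-conditioned gate over shared hidden activations.
import torch
import torch.nn as nn

class LanguageGate(nn.Module):
    def __init__(self, hidden_dim, num_langs, lang_dim=16):
        super().__init__()
        self.lang_emb = nn.Embedding(num_langs, lang_dim)
        self.gate = nn.Linear(hidden_dim + lang_dim, hidden_dim)

    def forward(self, hidden, lang_id):
        # hidden: (batch, time, hidden_dim); lang_id: (batch,)
        lang = self.lang_emb(lang_id)                       # (batch, lang_dim)
        lang = lang.unsqueeze(1).expand(-1, hidden.size(1), -1)
        g = torch.sigmoid(self.gate(torch.cat([hidden, lang], dim=-1)))
        return hidden * g                                   # gated activations

# Example: gate the output of a shared acoustic encoder layer.
gate = LanguageGate(hidden_dim=320, num_langs=3)
h = torch.randn(4, 50, 320)                 # 4 utterances, 50 frames each
out = gate(h, torch.tensor([0, 2, 1, 0]))   # per-utterance language IDs
```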

    Psychotherapy across languages: beliefs, attitudes and practices of monolingual and multilingual therapists with their multilingual patients

    The present study investigates the beliefs, attitudes and practices of 101 monolingual and multilingual therapists in their interactions with multilingual patients. We adopted a mixed-method approach, using an online questionnaire with 27 closed questions, which were analysed quantitatively and informed the questions asked in interviews with one monolingual and two multilingual therapists. A principal component analysis yielded a four-factor solution accounting for 41% of the variance. The first dimension, which explained 17% of the variance, reflects therapists' attunement towards their bilingual patients (i.e., attunement versus collusion). Further analysis showed that the 18 monolingual therapists differed significantly from their 83 bi- or multilingual peers on this dimension. The follow-up interviews confirmed this result. Based on these findings, recommendations are made for psychotherapy training and supervision to attend to a range of issues, including the psychological and therapeutic functions of multi/bilingualism, practice in making formulations in different languages, and the creative therapeutic potential of the language gap.
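
    For readers unfamiliar with this analysis pattern, the following sketch (on synthetic data, not the study's) shows a PCA over questionnaire items followed by a group comparison on the first component's scores; the Mann-Whitney test here stands in for whatever comparison the authors actually used.

```python
# Illustrative PCA + group-comparison pipeline on synthetic questionnaire data.
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
X = rng.normal(size=(101, 27))          # 101 therapists x 27 closed items
is_mono = np.zeros(101, dtype=bool)
is_mono[:18] = True                     # 18 monolingual, 83 multilingual

pca = PCA(n_components=4).fit(X)        # four-factor solution
print("variance explained:", pca.explained_variance_ratio_.round(3))

scores = pca.transform(X)[:, 0]         # scores on the first dimension
u, p = mannwhitneyu(scores[is_mono], scores[~is_mono])
print(f"group difference on dim 1: U={u:.1f}, p={p:.3f}")
```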

    How much exposure to English is necessary for a bilingual toddler to perform like a monolingual peer in language tests?

    Background: Bilingual children are under-referred due to an ostensible expectation that they lag behind their monolingual peers in their English acquisition. The recommendations of the Royal College of Speech and Language Therapists (RCSLT) state that bilingual children should be assessed in both of the languages known to them. Despite these recommendations, the majority of speech and language professionals report that they assess bilingual children only in English, since bilingual children come from a wide array of language backgrounds and standardized language measures are not available for most of these; even where such measures do exist, they are not tailored for bilingual children.
    Aims: To ask whether a cut-off exists in the proportion of exposure to English at which a bilingual toddler can be expected to perform as well as a monolingual on a test standardized for monolingual English-speaking children.
    Methods & Procedures: Thirty-five bilingual 2;6-year-olds exposed to British English plus an additional language and 36 British monolingual toddlers were assessed on the auditory component of the Preschool Language Scale, the British Picture Vocabulary Scale and an object-naming measure. All parents completed the Oxford Communicative Development Inventory (Oxford CDI) and an exposure questionnaire assessing the proportion of English in the language input. Where a CDI existed in the bilingual's additional language, these data were also collected.
    Outcomes & Results: Hierarchical regression analyses found the proportion of exposure to English to be the main predictor of bilingual toddlers' performance. Bilingual toddlers who received 60% exposure to English or more performed like their monolingual peers on all measures. K-means cluster analyses and Levene variance tests confirmed the estimated English exposure cut-off of 60% for all language measures. Finally, for the one additional language with multiple participants, additional-language CDI production scores were significantly inversely related to the amount of exposure to English.
    Conclusions & Implications: Typically developing 2;6-year-olds who are bilingual in English and an additional language and who hear English 60% of the time or more perform equivalently to their typically developing monolingual peers.
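
    The cut-off analysis can be illustrated with a short sketch on synthetic data (not the study's): regress scores on exposure, cluster exposure values, and check variance homogeneity with Levene's test. All numbers and modelling choices below are illustrative assumptions.

```python
# Illustrative version of the exposure cut-off analysis on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import linregress, levene

rng = np.random.default_rng(1)
exposure = rng.uniform(0.2, 1.0, size=35)      # proportion of English input
# Synthetic scores that plateau above ~60% exposure.
scores = 80 + 25 * np.minimum(exposure, 0.6) + rng.normal(0, 2, 35)

# Exposure as the main predictor of bilingual toddlers' scores.
fit = linregress(exposure, scores)
print(f"slope={fit.slope:.1f}, r^2={fit.rvalue**2:.2f}")

# Two-cluster solution on exposure; the boundary between cluster centres
# approximates the cut-off (reported as 60% in the study).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(exposure.reshape(-1, 1))
centers = sorted(km.cluster_centers_.ravel())
print(f"estimated cut-off near {(centers[0] + centers[1]) / 2:.0%}")

# Levene's test: compare score variance across the two exposure groups.
hi = scores[exposure >= 0.6]
lo = scores[exposure < 0.6]
print("Levene p =", round(levene(hi, lo).pvalue, 3))
```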

    Dublin City University at CLEF 2004: experiments in monolingual, bilingual and multilingual retrieval

    The Dublin City University group participated in the monolingual, bilingual and multilingual retrieval tasks this year. The main focus of our investigation was extending our retrieval system to document languages other than English, and completing the multilingual task comprising four languages: English, French, Russian and Finnish. Results from our French monolingual experiments indicate that working in French directly is more effective for retrieval than translating documents and topics into English. However, comparison of our multilingual retrieval results using different topic and document translations reveals that this result does not extend to retrieved-list merging for the multilingual task in a simple, predictable way.
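
    The merging difficulty the abstract points to is easy to see in a sketch. The two strategies below, raw-score merging and round-robin interleaving, are standard CLIR baselines rather than DCU's specific method; raw-score merging assumes per-language scores are comparable, which is exactly what tends to fail across different translation setups.

```python
# Two baseline strategies for merging per-language ranked lists.
from itertools import zip_longest

def raw_score_merge(runs, k=10):
    """Merge {lang: [(doc_id, score), ...]} lists by raw retrieval score."""
    pooled = [item for docs in runs.values() for item in docs]
    return sorted(pooled, key=lambda d: d[1], reverse=True)[:k]

def round_robin_merge(runs, k=10):
    """Interleave lists rank by rank, ignoring scores entirely."""
    merged = []
    for tier in zip_longest(*runs.values()):
        merged.extend(doc for doc in tier if doc is not None)
    return merged[:k]

runs = {
    "en": [("en_12", 14.2), ("en_7", 11.8)],
    "fr": [("fr_3", 9.4), ("fr_9", 8.1)],   # scores on a different scale
    "ru": [("ru_5", 21.0)],
    "fi": [("fi_2", 13.3)],
}
print(raw_score_merge(runs, k=5))
print(round_robin_merge(runs, k=5))
```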