220 research outputs found

    Seeking health content online: a survey of Internet users’ habits and needs

    This paper describes a small-scale survey conducted among non-native speakers of English living in Ireland. We collected data from 86 respondents by means of an online questionnaire. Our goal was to investigate their health information seeking behaviour, potential comprehension issues with health content, and their adoption of machine translation (MT) systems. We found that the Internet is widely used for health-related searches, that Wikipedia is the most consulted website, and that information on illnesses and public health threats is frequently sought. We observed that the language in which online searches are conducted is influenced by respondents’ self-reported level of English proficiency, with most limited English proficiency (LEP) Internet users looking for health information in their native languages to facilitate comprehension. We also observed that specialised medical vocabulary might hinder comprehension. Finally, most participants reported adopting MT to translate online health content from English into their native languages, and LEP respondents reported using MT more frequently than proficient respondents. This survey highlights the need to increase the accessibility of online health content for non-native speakers of English (especially LEP users) with a view to reducing their vulnerability. We argue that text simplification might lead to the production of more comprehensible and more machine-translatable health-related texts.

    Machine translation evaluation resources and methods: a survey

    We present a survey of Machine Translation (MT) evaluation that covers both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: lexical similarity and the application of linguistic features. The lexical similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features are divided into syntactic and semantic features. The syntactic features include part-of-speech tags, phrase types, and sentence structures, and the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only very recently. We then introduce methods for evaluating MT evaluation itself, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works \cite{GALEprogram2009, EuroMatrixProject2007} in several respects: it introduces some recent developments in MT evaluation measures, the different classifications from manual to automatic evaluation measures, the recent QE tasks for MT, and a concise organisation of the content.

    A deep learning approach to bilingual lexicon induction in the biomedical domain.

    BACKGROUND: Bilingual lexicon induction (BLI) is an important task in the biomedical domain as translation resources are usually available for general language usage, but are often lacking in domain-specific settings. In this article we consider BLI as a classification problem and train a neural network composed of a combination of recurrent long short-term memory and deep feed-forward networks in order to obtain word-level and character-level representations. RESULTS: The results show that the word-level and character-level representations each improve state-of-the-art results for BLI and biomedical translation mining. The best results are obtained by exploiting the synergy between these word-level and character-level representations in the classification model. We evaluate the models both quantitatively and qualitatively. CONCLUSIONS: Translation of domain-specific biomedical terminology benefits from the character-level representations compared to relying solely on word-level representations. It is beneficial to take a deep learning approach and learn character-level representations rather than relying on handcrafted representations that are typically used. Our combined model captures the semantics at the word level while also taking into account that specialized terminology often originates from a common root form (e.g., from Greek or Latin)

    Creación de un motor de traducción automática estadístico (EN>ES) para textos del ámbito farmacéutico. Comparación con otros motores de traducción automática neuronal existentes

    The aim of this Master's dissertation is to create a statistical machine translation engine (EN>ES) specialised in the pharmaceutical field using the KantanMT platform. It outlines the key details of the most popular machine translation systems and discusses the importance of post-editing in the world of MT and the use of statistical MT systems in small and medium-sized Spanish translation companies. It also shows the steps to follow when training a custom statistical machine translation engine in the cloud, including the corpus search and creation processes. The aim is to check whether a statistical machine translation engine specialised in the pharmaceutical field offers better results in this area of specialisation than some of the generic neural machine translation engines available on the web. This comparison is motivated by the growing popularity of neural machine translation in recent years.