42 research outputs found

    Customizing an English-Korean Machine Translation System for Patent/Technical Documents Translation

    Get PDF
    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Customizing an English-Korean Machine Translation System for Patent Translation

    Get PDF
    PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200

    JTEC panel report on machine translation in Japan

    Get PDF
    The goal of this report is to provide an overview of the state of the art of machine translation (MT) in Japan and to provide a comparison between Japanese and Western technology in this area. The term 'machine translation' as used here, includes both the science and technology required for automating the translation of text from one human language to another. Machine translation is viewed in Japan as an important strategic technology that is expected to play a key role in Japan's increasing participation in the world economy. MT is seen in Japan as important both for assimilating information into Japanese as well as for disseminating Japanese information throughout the world. Most of the MT systems now available in Japan are transfer-based systems. The majority of them exploit a case-frame representation of the source text as the basis of the transfer process. There is a gradual movement toward the use of deeper semantic representations, and some groups are beginning to look at interlingua-based systems

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    Idiom treatment experiments in machine translation

    Get PDF
    Idiomatic expressions pose a particular challenge for the today\u27;s Machine Translation systems, because their translation mostly does not result literally, but logically. The present dissertation shows, how with the help of a corpus, and morphosyntactic rules, such idiomatic expressions can be recognized and finally correctly translated. The work leads the reader in the first chapter generally to the field of Machine Translation and following that, it focuses on the special field of Example-based Machine Translation. Next, an important part of the doctoral thesis dissertation is devoted to the theory of idiomatic expressions. The practical part of the thesis describes how the hybrid Example-based Machine Translation system METIS-II, with the help of morphosyntactic rules, is able to correctly process certain idiomatic expressions and finally, to translate them. The following chapter deals with the function of the transfer system CAT2 and its handling of the idiomatic expressions. The last part of the thesis includes the evaluation of three commercial systems, namely SYSTRAN, T1 Langenscheidt, and Power Translator Pro, with respect to continuous and discontinuous idiomatic expressions. For this, both small corpora and a part of the extensive corpus Europarl and the Digital Lexicon of the German Language in 20th century were processed, firstly manually and then automatically. The dissertation concludes with results from this evaluation.Idiomatische Redewendungen stellen für heutige maschinelle Übersetzungssysteme eine besondere Herausforderung dar, da ihre Übersetzung nicht wörtlich, sondern stets sinngemäß erfolgen muss. Die vorliegende Dissertation zeigt, wie mit Hilfe eines Korpus sowie morphosyntaktischer Regeln solche idiomatische Redewendungen erkannt und am Ende richtig übersetzt werden können. Die Arbeit führt den Leser im ersten Kapitel allgemein in das Gebiet der Maschinellen Übersetzung ein und vertieft im Anschluss daran das Spezialgebiet der Beispielbasierten Maschinellen Übersetzung. Im Folgenden widmet sich ein wesentlicher Teil der Doktorarbeit der Theorie über idiomatische Redewendungen. Der praktische Teil der Arbeit beschreibt wie das hybride Beispielbasierte Maschinelle Übersetzungssystem METIS-II mit Hilfe von morphosyntaktischen Regeln befähigt wurde, bestimmte idiomatische Redewendungen korrekt zu bearbeiten und am Ende zu übersetzen. Das nachfolgende Kapitel behandelt die Funktion des Transfersystems CAT2 und dessen Umgang mit idiomatischen Wendungen. Der letzte Teil der Arbeit beinhaltet die Evaluation von drei kommerzielle Systemen, nämlich SYSTRAN, T1 Langenscheidt und Power Translator Pro, in Bezug auf deren Umgang mit kontinuierlichen und diskontinuierlichen idiomatischen Redewendungen. Hierzu wurden sowohl kleine Korpora als auch ein Teil des umfangreichen Korpus Europarl und des Digatalen Wörterbuchs der deutschen Sprache des 20. Jh. erst manuell und dann maschinell bearbeitet. Die Dissertation wird mit Folgerungen aus der Evaluation abgeschlossen

    Low-Resource Unsupervised NMT:Diagnosing the Problem and Providing a Linguistically Motivated Solution

    Get PDF
    Unsupervised Machine Translation hasbeen advancing our ability to translatewithout parallel data, but state-of-the-artmethods assume an abundance of mono-lingual data. This paper investigates thescenario where monolingual data is lim-ited as well, finding that current unsuper-vised methods suffer in performance un-der this stricter setting. We find that theperformance loss originates from the poorquality of the pretrained monolingual em-beddings, and we propose using linguis-tic information in the embedding train-ing scheme. To support this, we look attwo linguistic features that may help im-prove alignment quality: dependency in-formation and sub-word information. Us-ing dependency-based embeddings resultsin a complementary word representationwhich offers a boost in performance ofaround 1.5 BLEU points compared to stan-dardWORD2VECwhen monolingual datais limited to 1 million sentences per lan-guage. We also find that the inclusion ofsub-word information is crucial to improv-ing the quality of the embedding

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-­‐it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted into its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-­‐it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Information retrieval and text mining technologies for chemistry

    Get PDF
    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo Garciá -Yoldi for useful feedback and discussions during the preparation of the manuscript.info:eu-repo/semantics/publishedVersio
    corecore