163 research outputs found

    Empirical studies on word representations

    One of the most fundamental tasks in natural language processing is representing words with mathematical objects such as vectors. Word representations, which are most often estimated from data, capture the meaning of words. They make it possible to compare words by their semantic similarity, and they have been shown to work extremely well when included in complex real-world applications. A large part of our work deals with ways of estimating word representations directly from large quantities of text. Our methods exploit the idea that words which occur in similar contexts have similar meanings. How we define the context is an important focus of our thesis. The context can consist of a number of words to the left and to the right of the word in question, but, as we show, obtaining context words via syntactic links (such as the link between a verb and its subject) often works better. We further investigate word representations that accurately capture multiple meanings of a single word. We show that the translation of a word in context contains information that can be used to disambiguate the meaning of that word.
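To make the distributional idea concrete, here is a minimal Python sketch of the window-based variant only, assuming raw co-occurrence counts within a symmetric window of two words on each side and cosine similarity as the comparison. The function names `window_contexts` and `cosine` and the toy corpus are hypothetical illustrations; this is not the thesis's own method, which also explores syntactic contexts and multiple word senses.

```python
from collections import Counter, defaultdict
import math


def window_contexts(sentences, window=2):
    """Count, for each word, the words occurring up to `window` positions to its left and right."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "cat" and "dog" share contexts, "car" shares none with "cat".
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
    "a car needs fuel".split(),
]
vectors = window_contexts(corpus, window=2)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))  # → 0.873
print(round(cosine(vectors["cat"], vectors["car"]), 3))  # → 0.0
```

Because "cat" and "dog" both co-occur with "the", "drinks" and "chases", their count vectors point in similar directions, while "cat" and "car" share no context words and score zero. A syntactic variant would replace the window with contexts read off dependency links, which requires a parser and is not shown here.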

    Creation of multilingual data for various corpus-based approaches in the field of translation and interpreting

    Corpora are playing an increasingly important role in our multilingual society. High-quality parallel corpora are a preferred resource in the language engineering and linguistics communities. Nevertheless, the lack of sufficient and up-to-date parallel corpora, especially for narrow domains and poorly resourced languages, is currently one of the major obstacles to further advancement in areas such as translation, language learning, and automatic and assisted translation. An alternative is the use of comparable corpora, which are easier and faster to compile. Corpora in general are extremely important for tasks such as translation, extraction, inter-linguistic comparisons and discoveries, and even the creation of lexicographical resources. Their objectivity, reusability, the multiplicity and applicability of their uses, easy handling, and quick access to large volumes of data are just some of their advantages over more limited resources such as thesauri or dictionaries. For example, new terms are coined daily, and dictionaries cannot keep up with the rate at which they emerge. Accordingly, this research work aims to exploit and develop new technologies and methods to better ascertain the needs not only of translators and interpreters, but also of other professionals and ordinary people in their daily tasks, such as compiling and managing corpora and terminology. The main topics covered by this work relate to Computational Linguistics (CL), Natural Language Processing (NLP), Machine Translation (MT), Comparable Corpora, Distributional Similarity Measures (DSM), Terminology Extraction Tools (TET) and Terminology Management Tools (TMT). In particular, this work examines three main questions: 1) Is it possible to create a simpler, more user-friendly tool for compiling comparable corpora? 2) How can the most suitable TMT and TET be identified for a given translation or interpreting task? 3) How can the internal degree of relatedness in comparable corpora be automatically assessed and measured? This work comprises thirteen peer-reviewed scientific publications, which are included in Appendix A, while the methodology used and the results obtained in these studies are summarised in the main body of this document. Date of doctoral thesis defence: 22 November 2019.

    Collaborative interdisciplinary publication skills education: implementation and implications in international science research contexts.

    This portfolio of three research projects addresses, at an educational level, the increasing pressure on scientists internationally to publish research in highly ranked, peer-reviewed journals, and thus in English. Building on a tradition of collaboration between language- and content-based expertise in English for Specific/Academic Purposes, the portfolio examines the contribution of a pedagogical approach dubbed Collaborative Interdisciplinary Publication Skills Education (CIPSE) to teaching novice scientist authors who use English as a first or additional language. Project 1 examines CIPSE development from its antecedents in content-based learning and genre analysis, culminating in the production of a teaching text/website package, Writing Scientific Research Articles: Strategy and Steps (WSRA), by a collaborative team of the candidate, an applied linguist, and a publishing, refereeing scientist. The aim was to redress the incomplete coverage of existing approaches by producing a resource accessible to novice authors of all language backgrounds and to teachers/mentors in both science and language contexts. The research questions driving Projects 2 and 3 emerged from the initial implementation of CIPSE, and were addressed by analysing evaluative data from selected implementation sites. Project 2 investigates interdisciplinary teams for publication skills development. Part A, framed within the constructs of interdisciplinary higher education, demonstrates that the CIPSE structure, led by an applied linguist working with interdisciplinary collaborators as appropriate/available in each presentation context, was effective at all levels of collaboration. It was important that CIPSE outcomes were 'core business' for collaborators, and a need was identified for terminology that intersects with the agendas of those with the power to implement it. Part B, framed within English for Specific Purposes, focuses on challenges to interdisciplinary collaboration in China.
Recommended strategies for developing collaboration between Chinese scientists and English-language professionals, rather than foreign visitors, include institutional support for collaboration, and training to enhance the ability of English professionals to present themselves as bringing valuable expertise to publication skills education. Project 3 investigates CIPSE effectiveness for Chinese scientists at different career stages. Part A, addressing academic writing instruction, highlights challenges to publication success for EFL (English as a Foreign Language) science researchers as identified by CIPSE workshop participants. Introducing the WSRA package to Chinese scientists who train/mentor students resulted in significantly increased confidence both to write/publish their own articles and to teach others, and in a shift in the training methods deemed appropriate. Part B analyses a 4-cycle action research study at the Graduate University of the Chinese Academy of Sciences, Beijing, 2006-9, investigating the use of CIPSE in an EFL university with early-candidature students from mixed disciplines. The resulting adapted, CIPSE-based course shows potential for use by Chinese teachers. Taken together, the three projects provide a theorised basis and practical steps for building effective training regimes for publication skill development in a wide range of science research contexts. Overall findings are summarised as a matrix of descriptor scales for analysing training contexts to identify cost-effective levels of collaboration: client training goals, trainee research experience, training program type, and English language context. The portfolio findings thus contribute to knowledge of interdisciplinary collaboration in education and of the context-sensitive implementation of educational innovation.
    Thesis (D.Ed.) -- University of Adelaide, School of Education, 201