856 research outputs found

    Uvid u automatsko izlučivanje metaforičkih kolokacija

    Get PDF
    Collocations have been the subject of much scientific research over the years. The focus of this research is on a subset of collocations, namely metaphorical collocations. In metaphorical collocations, a semantic shift has taken place in one of the components, i.e., one of the components takes on a transferred meaning. The main goal of this paper is to review the existing literature and provide a systematic overview of the existing research on collocation extraction, as well as the overview of existing methods, measures, and resources. The existing research is classified according to the approach (statistical, hybrid, and distributional semantics) and presented in three separate sections. The insights gained from existing research serve as a first step in exploring the possibility of developing a method for automatic extraction of metaphorical collocations. The methods, tools, and resources that may prove useful for future work are highlighted.Kolokacije su već dugi niz godina tema mnogih znanstvenih istraživanja. U fokusu ovoga istraživanja podskupina je kolokacija koju čine metaforičke kolokacije. Kod metaforičkih je kolokacija kod jedne od sastavnica došlo do semantičkoga pomaka, tj. jedna od sastavnica poprima preneseno značenje. Glavni su ciljevi ovoga rada istražiti postojeću literaturu te dati sustavan pregled postojećih istraživanja na temu izlučivanja kolokacija i postojećih metoda, mjera i resursa. Postojeća istraživanja opisana su i klasificirana prema različitim pristupima (statistički, hibridni i zasnovani na distribucijskoj semantici). Također su opisane različite asocijativne mjere i postojeći načini procjene rezultata automatskoga izlučivanja kolokacija. Metode, alati i resursi koji su korišteni u prethodnim istraživanjima, a mogli bi biti korisni za naš budući rad posebno su istaknuti. Stečeni uvidi u postojeća istraživanja čine prvi korak u razmatranju mogućnosti razvijanja postupka za automatsko izlučivanje metaforičkih kolokacija

    Computer Assisted Language Learning Based on Corpora and Natural Language Processing : The Experience of Project CANDLE

    Get PDF
    This paper describes Project CANDLE, an ongoing 3-year project which uses various corpora and NLP technologies to construct an online English learning environment for learners in Taiwan. This report focuses on the interim results obtained in the first eighteen months. First, an English-Chinese parallel corpus, Sinorama, was used as the main course material for reading, writing, and culture-based learning courses. Second, an online bilingual concordancer, TotalRecall, and a collocation reference tool, TANGO, were developed based on Sinorama and other corpora. Third, many online lessons, including extensive reading, verb-noun collocations, and vocabulary, were designed to be used alone or together with TotalRecall and TANGO. Fourth, an online collocation check program, MUST, was developed for detecting V-N miscollocation and suggesting adequate collocates in student’s writings based on the hypothesis of L1 interference and the database of BNC and the bilingual Sinorama Corpus. Other computational scaffoldings are under development. It is hoped that this project will help intermediate learners in Taiwan enhance their English proficiency with effective pedagogical approaches and versatile language reference tools

    Comparing collocations in translated and learner language

    Get PDF
    This paper compares use of collocations by Italian learners writing in and translating into English, conceptualising the two tasks as different modes of constrained language production and adopting Halverson’s (2017) Revised Gravitational Pull hypothesis as a theoretical model. A particular focus is placed on identifying a method for comparing datasets containing translations and essays, assembled opportunistically and varying in size and structure. The study shows that lexical association scores for dependency-defined word pairs are significantly higher in translations than essays. A qualitative analysis of a subset of collocations shared and unique to either mode shows that the former set features more collocations with direct cross-linguistic links (connectivity), and that the source/first language seems to affect both modes similarly. We tentatively conclude that second/target language salience effects are more visible in translation than second language use, while connectivity and source language salience affect both modes of bilingual processing similarly, regardless of the mediation variable

    Representation and parsing of multiword expressions

    Get PDF
    This book consists of contributions related to the definition, representation and parsing of MWEs. These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs such as verbal, adverbial and nominal MWEs, various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages including English, French, Modern Greek, Hebrew, Norwegian), and various applications (namely MWE detection, parsing, automatic translation) using both symbolic and statistical approaches

    Current trends

    Get PDF
    Deep parsing is the fundamental process aiming at the representation of the syntactic structure of phrases and sentences. In the traditional methodology this process is based on lexicons and grammars representing roughly properties of words and interactions of words and structures in sentences. Several linguistic frameworks, such as Headdriven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining Grammar (TAG), Combinatory Categorial Grammar (CCG), etc., offer different structures and combining operations for building grammar rules. These already contain mechanisms for expressing properties of Multiword Expressions (MWE), which, however, need improvement in how they account for idiosyncrasies of MWEs on the one hand and their similarities to regular structures on the other hand. This collaborative book constitutes a survey on various attempts at representing and parsing MWEs in the context of linguistic theories and applications

    Multiword expressions at length and in depth

    Get PDF
    The annual workshop on multiword expressions takes place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point point of reference for future work

    Translating English verbal collocations into Spanish: On distribution and other relevant differences related to diatopic variation

    Get PDF
    Language varieties should be taken into account in order to enhance fluency and naturalness of translated texts. In this paper we will examine the collocational verbal range for prima-facie translation equivalents of words like decision and dilemma, which in both languages denote the act or process of reaching a resolution after consideration, resolving a question or deciding something. We will be mainly concerned with diatopic variation in Spanish. To this end, we set out to develop a giga-token corpus-based protocol which includes a detailed and reproducible methodology sufficient to detect collocational peculiarities of transnational languages. To our knowledge, this is one of the first observational studies of this kind. The paper is organised as follows. Section 1 introduces some basic issues about the translation of collocations against the background of languages’ anisomorphism. Section 2 provides a feature characterisation of collocations. Section 3 deals with the choice of corpora, corpus tools, nodes and patterns. Section 4 covers the automatic retrieval of the selected verb + noun (object) collocations in general Spanish and the co-existing national varieties. Special attention is paid to comparative results in terms of similarities and mismatches. Section 5 presents conclusions and outlines avenues of further research.Published versio

    Translating Collocations for Bilingual Lexicons: A Statistical Approach

    Get PDF
    Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and cannot be translated on a word-by-word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations. Our goal is to provide a tool for compiling bilingual lexical information above the word level in multiple languages, for different domains. The algorithm we use is based on statistical methods and produces p-word translations of n-word collocations in which n and p need not be the same. For example, Champollion translates make...decision, employment equity, and stock market into prendre...décision, équité en matière d'emploi, and bourse respectively. Testing Champollion on three years' worth of the Hansards corpus yielded the French translations of 300 collocations for each year, evaluated at 73% accuracy on average. In this paper, we describe the statistical measures used, the algorithm, and the implementation of Champollion, presenting our results and evaluation
    corecore