    Taking on new challenges in multi-word unit processing for Machine Translation

    This paper discusses the qualitative comparative evaluation performed on the results of two machine translation systems with different approaches to the processing of multi-word units. It proposes a solution for overcoming the difficulties multi-word units present to machine translation by adopting a methodology that combines the lexicon grammar approach with OpenLogos ontology and semantico-syntactic rules. The paper also discusses the importance of qualitative evaluation metrics for correctly evaluating the performance of machine translation engines with regard to multi-word units.


    The aim of this paper is to provide an overview and analysis of collocations, one of the most significant aspects of the idiomatic use of language. Special emphasis is placed on a comparative review of the most common Light Verb Constructions, consisting of light verbs (Cro. lagani glagoli, Ital. verbi supporto) and nouns, in Croatian, English and Italian. This construction was chosen because it is extremely common in the early stages of language acquisition. Moreover, the aim of the contrastive analysis was to determine overlaps, so that examples of positive transfer can be used in teaching lexis (English/Italian – L2) and negative interference such as false analogies can be prevented. The research is based on the assumption that the number of completely concordant collocations taught in the early stages of foreign language acquisition is limited. Thus, detecting them promptly and emphasizing their relevance is essential. Following the discussion of the results of the contrastive analysis, the relevance of teaching collocations, i.e. presenting the most common collocations simultaneously with new vocabulary, is stressed. In accordance with the above, we believe that the collocational approach is the most useful and effective in teaching languages.

    Mixed up with machine Translation: Multi-word Units Disambiguation Challenge.

    With the rapid evolution of the Internet, translation has become part of the daily life of ordinary users, not only of professional translators. Machine translation has evolved along with different types of computer-assisted translation tools. Qualitative progress has been made in the field of machine translation, but not all problems have been solved. The time is right for the development of more sophisticated evaluation tools that measure performance on specific linguistic phenomena. One problem in particular, namely the poor analysis and translation of multi-word units, is an area where investment in linguistic knowledge systems with the goal of improving machine translation would be beneficial. This paper addresses the difficulties multi-word units present to machine translation by comparing translations performed by systems adopting different approaches to machine translation. It proposes a solution for improving the quality of the translation of multi-word units by adopting a methodology that combines Lexicon Grammar resources with OpenLogos lexical resources and semantico-syntactic rules. Finally, it discusses how an ideal machine translation evaluation tool might look in order to correctly evaluate the performance of machine translation engines with regard to multi-word units and thus contribute to the improvement of translation quality.


    This paper addresses one of the most important areas of idiomatic language, collocations. Particular attention is paid to a comparative overview of the most frequent constructions consisting of a light verb (Ital. "verbi supporto", Engl. "light verbs") and a noun in Croatian, English and Italian. This construction was chosen for analysis because it is extremely frequent in the earliest stage of language acquisition, and the aim of the comparison was to identify correspondences so that positive transfer can be used in teaching vocabulary in Italian and English as foreign languages, and to prevent negative transfer in the form of false analogies. The research is based on the assumption that the number of fully matching collocations taught in the early stages of foreign language learning is limited, so they need to be noticed in good time and attention drawn to them. Accordingly, following the discussion of the results of the comparative analysis, the importance of teaching collocations in class is highlighted, i.e. the importance of presenting the most frequent collocations alongside the introduction of a new word. In line with the above, the position taken is that it would be useful and effective to adopt a collocational approach in language learning.

    Essays on Technology in Presence of Globalization

    Technology has long been known to enable globalization in ways previously not thought possible, with instantaneous communication allowing members of organizations all across the globe to communicate and share information with little to no delay. However, as the effects of globalization have become more prominent, they have in turn helped to shape the very technologies that enable these processes. These three essays analyze three examples of how these two processes – globalization and technological development – impact one another. The first looks at the national policy level, attempting to understand how increased possibilities for inside leakers can force governments to consider asylum requests. The second analyzes the issue at the level of corporations, attempting to understand how and why business leaders choose to hire individuals from other countries. The third and final essay analyzes the issue at the most micro level, studying a potential application that could help analyze linguistic factors that have taken on a more prominent role in a more globalized society.

    The role of aspect in paraphrase operations

    Thesis digitized by the Direction des bibliothèques of the Université de Montréal.

    Un environnement générique et ouvert pour le traitement des expressions polylexicales

    The treatment of multiword expressions (MWEs), like take off, bus stop and big deal, is a challenge for NLP applications. This kind of linguistic construction is not only arbitrary but also much more frequent than one would initially guess. This thesis investigates the behaviour of MWEs across different languages, domains and construction types, proposing and evaluating an integrated methodological framework for their acquisition. There have been many theoretical proposals to define, characterise and classify MWEs. We adopt a generic definition stating that MWEs are word combinations which must be treated as a unit at some level of linguistic processing. They present a variable degree of institutionalisation, arbitrariness, heterogeneity and limited syntactic and semantic variability. There has been much research on automatic MWE acquisition in recent decades, and the state of the art covers a large number of techniques and languages. Other tasks involving MWEs, namely disambiguation, interpretation, representation and applications, have received less emphasis in the field. The first main contribution of this thesis is the proposal of an original methodological framework for automatic MWE acquisition from monolingual corpora. This framework is generic, language independent and integrated, and it comes with a freely available implementation, the mwetoolkit. It is composed of independent modules which may themselves use multiple techniques to solve a specific sub-task in MWE acquisition. The evaluation of MWE acquisition is modelled using four independent axes. We underline that the evaluation results depend on parameters of the acquisition context, e.g., nature and size of corpora, language and type of MWE, analysis depth, and existing resources. The second main contribution of this thesis is the application-oriented evaluation of our methodology proposal in two applications: computer-assisted lexicography and statistical machine translation. For the former, we evaluate the usefulness of automatic MWE acquisition with the mwetoolkit for creating three lexicons: Greek nominal expressions, Portuguese complex predicates and Portuguese sentiment expressions. For the latter, we test several integration strategies in order to improve the treatment given to English phrasal verbs when translated by a standard statistical MT system into Portuguese. Both applications can benefit from automatic MWE acquisition, as the expressions acquired automatically from corpora can both speed up the work and improve the quality of the results. The promising results of previous and ongoing experiments encourage further investigation into the optimal way to integrate MWE treatment into other applications. Thus, we conclude the thesis with an overview of past, ongoing and future work.

    Automatic translation of support verb constructions

