    Idiom treatment experiments in machine translation

    Idiomatic expressions pose a particular challenge for today's machine translation systems, because they must be translated by sense rather than literally. This dissertation shows how such idiomatic expressions can be recognized, and ultimately translated correctly, with the help of a corpus and morphosyntactic rules. The first chapter introduces the field of Machine Translation in general and then focuses on the subfield of Example-based Machine Translation. A substantial part of the dissertation is then devoted to the theory of idiomatic expressions. The practical part describes how the hybrid Example-based Machine Translation system METIS-II was enabled, by means of morphosyntactic rules, to process certain idiomatic expressions correctly and ultimately to translate them. The following chapter deals with how the transfer system CAT2 works and how it handles idiomatic expressions. The last part evaluates three commercial systems, namely SYSTRAN, T1 Langenscheidt, and Power Translator Pro, with respect to their handling of continuous and discontinuous idiomatic expressions. For this purpose, small corpora as well as part of the extensive Europarl corpus and of the Digital Dictionary of the German Language of the 20th Century (Digitales Wörterbuch der deutschen Sprache des 20. Jh.) were processed, first manually and then automatically. The dissertation concludes with findings from this evaluation.
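
As an illustration only, recognizing continuous and discontinuous idiomatic expressions can be sketched as a lemma-sequence match that tolerates a bounded number of intervening tokens. This is not the METIS-II or CAT2 implementation. The function name `find_idiom`, the `max_gap` parameter, and the assumption that the input is already lemmatized are all hypothetical.

```python
def find_idiom(lemmas, idiom, max_gap=3):
    """Return the token indices of the first in-order match of `idiom`
    in `lemmas`, or None if there is no match.

    Hypothetical sketch: `lemmas` is assumed to be a list of lemmatized
    tokens, and `max_gap` is the maximum number of tokens allowed between
    two consecutive idiom components (0 = continuous idioms only).
    """
    if not idiom:
        return None

    def extend(pos, k):
        # idiom[k - 1] matched at index `pos`; try to place idiom[k:].
        if k == len(idiom):
            return []
        # Look at most `max_gap` tokens past the next position.
        stop = min(pos + 2 + max_gap, len(lemmas))
        for j in range(pos + 1, stop):
            if lemmas[j] == idiom[k]:
                rest = extend(j, k + 1)
                if rest is not None:
                    return [j] + rest
        return None

    for i, lemma in enumerate(lemmas):
        if lemma == idiom[0]:
            rest = extend(i, 1)
            if rest is not None:
                return [i] + rest
    return None


# Continuous idiom: components are adjacent.
continuous = find_idiom(["he", "kick", "the", "bucket"], ["kick", "the", "bucket"])
# Discontinuous idiom: one token intervenes between the components.
discontinuous = find_idiom(["she", "pull", "my", "leg"], ["pull", "leg"])
```

With `max_gap=0` the same function accepts only continuous occurrences, so the second example would no longer match. A real system would additionally need morphosyntactic constraints (for instance on inflection and word order), which this sketch leaves out.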

    L2 Influence on L1 : Chinese subject realisation in Chinese-English bilinguals

    This study investigates the influence of the second language (L2) on the use of the first language (L1) in late bilinguals within an L1-dominant environment. Cross-linguistic influence (Kellerman & Smith, 1986) has usually been studied in the forward direction, that is, how bilinguals’ L1 influences the acquisition and use of their L2. The other direction (i.e., the influence of L2 on L1) has not been sufficiently investigated. The current study looks at Chinese-speaking learners who acquire their L2 English through instruction in an L1-dominant environment. It does so by examining ‘subject realisation’, an area where Chinese and English exhibit substantial typological contrasts: Chinese allows both overt and null arguments under certain discourse-pragmatic conditions, whereas subjects in English are, under most circumstances, obligatorily expressed (Huang, 1984). It is hypothesized that long-term learning and regular use of English as an L2 increase the use of overt subjects in the bilingual’s first language, Chinese, with a corresponding decrease in null subjects. In addition, following Grosjean (1998), the interaction between the bilingual’s two languages is expected to be stronger when bilinguals produce language in the so-called ‘bilingual mode’, i.e., when both languages are highly activated, than in a ‘monolingual mode’, i.e., when only one language is predominantly activated. This ‘language mode’ factor leads naturally to a further hypothesis: Chinese-English bilinguals realise fewer null subjects in speech produced in a bilingual mode than in a monolingual mode.

    An Unsolicited Soliloquy on Dependency Parsing

    Programa Oficial de Doutoramento en Computación. 5009V01. [Abstract] This thesis presents work on dependency parsing covering two distinct lines of research. The first aims to develop efficient parsers that are fast enough to parse large amounts of data while still maintaining decent accuracy. We investigate two techniques to achieve this: the first is a cognitively-inspired method and the second uses model distillation. The first technique proved to be utterly dismal, while the second was somewhat of a success. The second line of research evaluates parsers, also in two ways. First, we evaluate what causes variation in parsing performance across different algorithms and different treebanks. This evaluation is grounded in dependency displacements (the directed distance between a dependent and its head) and in the displacement distributions associated with algorithms and found in treebanks; it also considers the difference between the displacement distributions in the training and test data. This work sheds some light on the variation in performance for both different algorithms and different treebanks. Second, we examine the utility of part-of-speech tags when used with parsing systems and question the standard assumption that they might help but certainly won’t hurt.
This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150) and from the Centro de Investigación de Galicia (CITIC), which is funded by the Xunta de Galicia and the European Union (ERDF - Galicia 2014-2020 Program) by grant ED431G 2019/01.
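
To make the notion of dependency displacement concrete, the following is a minimal sketch of how displacements and their relative-frequency distribution could be computed from head indices. It is an illustration under stated assumptions, not the thesis's code: the function names are hypothetical, heads are assumed to be 1-based with 0 marking the root (as in CoNLL-U), and the sign convention (head position minus dependent position) is an assumption, since the abstract only says "directed distance".

```python
from collections import Counter


def displacements(heads):
    """Directed distances for one sentence.

    `heads[i]` is the 1-based index of the head of token i + 1, with 0
    marking the root (assumed CoNLL-U-style input). The root arc is
    skipped. Sign convention (an assumption): head position minus
    dependent position, so a head to the right gives a positive value.
    """
    return [head - dep
            for dep, head in enumerate(heads, start=1)
            if head != 0]


def displacement_distribution(treebank):
    """Relative frequency of each displacement over a list of sentences."""
    counts = Counter()
    for heads in treebank:
        counts.update(displacements(heads))
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}


# "She reads books": "reads" (token 2) is the root and heads both nouns.
example = displacements([2, 0, 2])
# Toy two-sentence treebank, for illustration only.
dist = displacement_distribution([[2, 0, 2], [0, 1]])
```

Distributions computed this way for, say, a training split and a test split, or for gold trees and a parser's output, could then be compared with any standard divergence measure. Which measure the thesis uses is not stated in the abstract.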

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Using eye tracking to examine a single word copying paradigm

    Classroom learning, the bedrock of school education, relies heavily on written information transfer. The seemingly simple task of copying text from a board is psychologically complex and involves sequential visual and cognitive processes: visual encoding, constructing and maintaining mental representations, and written production. To date, most research in this area has focused on written production. This thesis aimed to quantify what linguistic units copiers activated during visual encoding; whether similar units were used during encoding and production; and whether copiers whose reading ability was still developing encoded and produced words in a similar fashion to copiers with fully developed reading ability. New mobile eye-tracking technology enabled recording of eye-movement behaviour as an indicator of cognitive processing over both visual encoding and written production. In two experiments, both adults’ and children’s eye movements were recorded as they made handwritten copies of single words presented on a classroom board. Gaze time measures showed that both adults and children encoded whole-word and syllable units, though this was not consistent for children processing long words. For all copiers, written production was often based on comparatively smaller units than encoding. Also, children needed more gaze lifts between the written copy and the board than adults, suggesting they relied more on piecemeal linguistic representations of subword units, perhaps because of forgetting. An additional lexical decision experiment showed that children could encode long words as whole-word units, suggesting that piecemeal encoding of subword units might be restricted to a copying task that includes additional task demands associated with mental representation and written production processes as well as visual encoding.
Word copying relied on systematic linguistic units, but the size of a unit appeared to modulate its functionality differently for encoding and production, even for skilled readers. The findings guided the development of a theoretical framework for the copying process.

    Contextualising empowerment practice: negotiating the path to becoming using participatory video processes

    Participation and empowerment are major drivers of social policy, but participatory projects often happen within contested territory. This research interrogates the assumed participation-empowerment link through the example of participatory video. Fieldwork unpacks the particular approach of Real Time, an established UK project provider. Disrupting representational framing, the emergent relational processes catalysed were explored in context, to address not whether participatory video can increase participants’ influence, but how and in what circumstances. This thesis therefore builds a more nuanced understanding of empowerment practice as the negotiated (rhizomic) pathway between social possibility and limitation. Following Deleuze, a becoming ontology underpinned the study of project actors’ experiences of the evolving group processes that occurred. An action research design incorporated both collaborative sense-making and disruptive gaze. Analysis draws on interpersonal and observational data gathered purposively from multiple perspectives in 11 Real Time projects between 2006 and 2008. Five were youth projects and six were with adults; two were women-only and one men-only; two were with learning-disabled adults and four were aimed at minority-ethnic participants. Participatory video as facilitated empowerment practice led to new social becoming by opening conducive social spaces, mediating interactions, catalysing group action and re-positioning participants. Videoing as performance context had a structuring and intensifying function, but there were parallel risks such as inappropriate exposure when internal and external dialogical space was confused. A rhizomic map of Real Time’s non-linear practice territory identifies eight key practice balances, and incorporates process possibilities, linked tensions, and enabling and hindering factors at four main sequential stages.
Communicative action through iteratively progressing video activities unfolded through predictable transitions to generate a diversifying progression from micro to mezzo level when supported. This thesis thus shows how participatory video is constituted afresh in each new context, with the universal and particular in ongoing dynamic interchange during the emergent empowerment journey.

    Darwinizing the philosophy of music education.

    Thesis (Ph.D.)-University of KwaZulu-Natal, Durban, 2011. Educational philosophy generally and the Philosophy of Music Education in particular have been slow to consider in any real depth the findings of those sciences most concerned with explaining human nature, that is, the attributes (capacities, aptitudes, predilections, appetites) we have in common because we share the same genome, much of which we also share with other species. There are several such sciences which may collectively be called Darwinian Science in that they all take as axiomatic Darwin’s explanation for how life evolves according to the law of natural selection – a simple, mindless and purposeless algorithm that has played out for over four billion years and which continues to do so, driving not only biological evolution but, as this study argues, cultural evolution as well. Evolutionary Psychology (including Biomusicology and Evolutionary Aesthetics), Cognitive Neuroscience and Gene-Culture Coevolution Theory are the overlapping fields that this study draws from in developing an understanding of the adapted mind useful for engaging with questions germane to the Philosophy of Music Education, principally those concerning the nature and value of music and how best it should feature in general education. These are questions that have not hitherto been addressed from a Darwinian perspective. This study develops such a perspective and applies it not only to questions around music’s educational values and possibilities, but to more encompassing philosophical questions, wherein the goals of music education are made accountable in relation both to Dewey’s ideal of society as a function of education, and to an ecozoic vision of a sustainable planetary habitat of interdependent and interconnected life forms.

    Concept Mapping Strategy For Academic Writing Tutorial In Open And Distant Learning Higher Institution

    Universitas Terbuka (UT), an open and distance higher education institution in Indonesia, conducts an in-service teacher education program. In order to complete the program, the students, most of whom are teachers, have to submit a final academic paper. In practice, most UT students have difficulty writing this paper. UT offers an academic writing course to address this problem, yet most students still view academic writing as a difficult assignment to complete. UT therefore has to find an appropriate instructional strategy that can help students write the academic writing assignment. One instructional strategy that can be selected to address these academic writing problems is concept mapping. The aim of this study is to elaborate on the implementation of concept maps as an instructional strategy to help open and distance learning students complete academic writing assignments. A design-based research approach was applied to measure the effectiveness of the concept mapping strategy in helping students gain academic writing skills. The steps of the research and development model of Borg, Gall and Gall, which consist of instructional design and development phases, were implemented in this study. The results indicated that students were supported by, and enjoyed, the process of academic writing using the concept map strategy.