    IQMT: A framework for automatic machine translation evaluation based on human likeness: Technical manual 2.0l

    This report presents a description of, and tutorial on, the IQMT Framework for Machine Translation Evaluation based on 'Human Likeness'. IQMT aims to offer a common workbench on which MT evaluation metrics can be robustly used and combined for the purpose of MT system development. The current version includes a rich set of metrics operating at different linguistic levels (lexical, shallow syntactic, syntactic, and shallow semantic).

    Empirical machine translation and its evaluation

    In this thesis we have exploited current Natural Language Processing technology for Empirical Machine Translation and its Evaluation. On the one hand, we have studied the problem of automatic MT evaluation. We have analyzed the main deficiencies of current evaluation methods, which arise, in our opinion, from the shallow quality principles upon which they are based. Instead of relying on the lexical dimension alone, we suggest a novel path towards heterogeneous evaluations. Our approach is based on the design of a rich set of automatic metrics devoted to capturing a wide variety of translation quality aspects at different linguistic levels (lexical, syntactic and semantic). These linguistic metrics have been evaluated over different scenarios. The most notable finding is that metrics based on deeper linguistic information (syntactic/semantic) are able to produce more reliable system rankings than metrics which limit their scope to the lexical dimension, especially when the systems under evaluation belong to different translation paradigms. However, at the sentence level, some of these metrics suffer a significant decrease in performance relative to lexical metrics, which is mainly attributable to parsing errors. In order to improve sentence-level evaluation, apart from backing off to lexical similarity in the absence of parsing, we have also studied the possibility of combining the scores conferred by metrics at different linguistic levels into a single measure of quality. Two valid non-parametric strategies for metric combination have been presented. These offer the important advantage of not having to adjust the relative contribution of each metric to the overall score. As a complementary issue, we show how to use the heterogeneous set of metrics to obtain automatic and detailed linguistic error analysis reports. On the other hand, we have studied the problem of lexical selection in Statistical Machine Translation. For that purpose, we have constructed a Spanish-to-English baseline phrase-based Statistical Machine Translation system and iterated across its development cycle, analyzing how to improve its performance through the incorporation of linguistic knowledge. First, we have extended the system by combining shallow-syntactic translation models based on linguistic data views. A significant improvement is reported. This system is further enhanced using dedicated discriminative phrase translation models based on Machine Learning techniques. These models allow for a better representation of the translation context in which phrases occur, effectively yielding an improved lexical choice. However, based on the proposed heterogeneous evaluation methods and the manual evaluations conducted, we have found that improvements in lexical selection do not necessarily imply an improved overall syntactic or semantic structure. The incorporation of dedicated predictions into the statistical framework therefore requires further study. As a side question, we have studied one of the main criticisms against empirical MT systems, i.e., their strong domain dependence, and how its negative effects may be mitigated by properly combining external knowledge sources when porting a system into a new domain. We have successfully ported an English-to-Spanish phrase-based Statistical Machine Translation system trained on the political domain to the domain of dictionary definitions. The two parts of this thesis are tightly connected, since the hands-on development of an actual MT system has allowed us to experience first-hand the role of the evaluation methodology in the development cycle of MT systems.
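
    The abstract does not say which two non-parametric combination strategies were used. As a minimal sketch of what a weight-free combination can look like, the Python below min-max normalizes each metric's scores across systems and averages them with equal weight, so no relative contribution has to be tuned. The function names, metric names, system names, and scores are all hypothetical and are not taken from the thesis.

```python
from statistics import mean


def normalize(scores):
    """Min-max scale one metric's scores to [0, 1] across systems."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All systems tie on this metric, so it carries no ranking signal.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def uniform_combination(metric_scores):
    """Average the normalized scores of all metrics with equal weight."""
    normalized = [normalize(scores) for scores in metric_scores.values()]
    systems = list(next(iter(metric_scores.values())))
    return {s: mean(n[s] for n in normalized) for s in systems}


# Hypothetical system-level scores at three linguistic levels.
metric_scores = {
    "lexical": {"sys_a": 0.30, "sys_b": 0.25, "sys_c": 0.20},
    "syntactic": {"sys_a": 0.50, "sys_b": 0.60, "sys_c": 0.40},
    "semantic": {"sys_a": 0.40, "sys_b": 0.70, "sys_c": 0.10},
}

combined = uniform_combination(metric_scores)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

    In this toy data the lexical metric alone prefers sys_a, while the combined score prefers sys_b. This illustrates, under the stated assumptions only, how metrics at deeper linguistic levels can change a system ranking.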

    A Proposed Methodology for Subjective Evaluation of Video and Text Summarization

    To evaluate a system that automatically summarizes video files (image and audio), one should take into account how the system works and which parts of the process should be evaluated, since two main outputs can be distinguished: the video summary and the text summary. This article therefore presents a complete method for evaluating this type of system efficiently. To this end, the authors have performed two types of evaluation: objective and subjective (the main focus of this paper). The objective evaluation is mainly done automatically, using established and proven metrics or frameworks, although it may require some human participation, while the subjective evaluation is based directly on the opinions of people, who evaluate the system by answering a set of questions; the answers are then processed to reach the targeted conclusions. The general results obtained from both evaluations will provide valuable information about the completeness, coherence, and correctness of the generated summaries from different points of view, such as the lexical and semantic perspectives, among others. Apart from providing information about the state of the art, the article also presents an experimental proposal, including the parameters of the experiment and the evaluation methods to be applied.

    An autoethnographic account of a piano teacher's professional growth in the piano lab: Improving beginner pianists' musicianship by teaching them to play by ear

    This study discusses the researcher’s professional growth and the challenges she faced when deviating from teaching how she had been taught. Difficulties and successes arose as she attempted to implement strategies of playing by ear, rather than continuing an exclusive emphasis on reading notation in instrumental teaching. Playing by ear has been identified extensively in recent music education scholarship as important for supporting young children’s musicianship, aural development, motivation, and engagement (e.g., Baker & Green, 2013; McPherson, 1993). This thesis examines the researcher’s development as a piano teacher, charting the adoption of this strategy through an autoethnographic research approach. It does so in relation to cycles of action research she implemented with groups of beginner pianists in a piano lab at a music school in Ireland. It also examines the impacts on beginner pianists. Four groups of five children aged 6–10 years participated from September 2015 to May 2018. Data were collected through focus group interviews with pupils and parents, videos of the teaching, and teacher-reflective field notes. Key findings of the autoethnographic work relate to how life events, childhood musical experiences, pedagogical training, and teaching career shaped the perspectives she brought to bear in her changing practice. Parents also became involved and musically educated. They contributed to the research while supporting their children’s progress. The research illustrates how group dynamics, parental involvement, musicianship, and differentiation shift, and how practice changes, as teachers negotiate situations in the piano lab. Playing by ear contributed positively over time to these youngsters’ musicianship, which might prove useful in later life for retaining their musical enjoyment. 
Whilst the sample was limited, these findings contribute to an improved understanding of how beginner pianists might be supported more effectively in their earlier years of music studies.

    An Investigation into Automatic Translation of Prepositions in IT Technical Documentation from English to Chinese

    Machine Translation (MT) technology has been widely used in the localisation industry to boost the productivity of professional translators. However, given the high translation quality expected, the performance of an MT system in isolation is less than satisfactory because of the various errors it generates. This study focuses on the translation of prepositions from English into Chinese within technical documents in an industrial localisation context. The aim of the study is to reveal the salient errors in the translation of prepositions and to explore possible methods to remedy these errors. This study proposes three new approaches to improve the translation of prepositions. All approaches attempt to make use of the strengths of the two most popular MT architectures at the moment: Rule-Based MT (RBMT) and Statistical MT (SMT). The approaches are: firstly, building an automatic preposition dictionary for the RBMT system; secondly, exploring and modifying the process of Statistical Post-Editing (SPE); and thirdly, pre-processing the source texts to better suit the RBMT system. Overall evaluation results (both human and automatic) show the potential of our new approaches for improving the translation of prepositions. In addition, the study reveals a new function of automatic metrics: assisting researchers in obtaining more valid or purpose-specific human evaluation results.

    Investigating translation teaching methods through classroom interaction analysis: a case-study of Arabic-English teaching situation

    The purpose of this study is to investigate translation teaching methods as practised in the classroom. Its content falls into two parts. Part One is a review of the literature on translation teaching in general, in which the main issues, such as the formal academic training of translators, are identified and the curriculum content described (Chapter 1). This is followed by a review of the theoretical aspects of translation teaching methods and their relation to language studies and translation theory (Chapter 2), the main purpose of which is to gain an overall understanding of the mechanism of translation and its techniques so as to facilitate the execution of the research. Part Two is the design of the research and its execution. The research is data-based. The data are tape-recorded translation lessons collected from three different classes in three different universities. The procedure adopted for data collection, the subjects who participated in this study, and the Sinclair system of classroom interaction analysis which was applied to the data are described in Chapter 3. This is followed by the application of Sinclair's system to the data, on the basis of which a coding system was set up (Chapter 4). The data analysis revealed the existence of three different translation methods, namely the grammatical, the text-linguistic and the interpretive. The characteristics of each method are described and their implications analysed (Chapter 5). The thesis ends with a critical assessment of translation teaching in general and translation teaching methods in particular, and proposes guidelines for an experiment for a unified teaching method.

    A discourse perspective on figurative expression in literary works with reference to English/Arabic translation

    This dissertation is intended to fulfil two main objectives: firstly, to examine the function of figures of speech or figurative expression from a discourse point of view, and secondly, to assess whether English/Arabic or Arabic/English translators take this discourse aspect into consideration, and if they do, to what extent. The division of figures of speech is based on Arabic (Barigha) Rhetoric. The dissertation develops along the following lines. It examines the "anatomy" of each individual figure of speech with the aim of establishing their respective merits. It then highlights their collective, social function in a wider sense. The research narrows down their social role, concentrating on one main role: creating a bond of intimacy between the speaker and the audience. It further examines the mechanism on which intimacy is based, i.e. politeness. Politeness is a strategy adopted and exacted by a rational speaker on a rational audience, and it enables him to persuade them. It is concluded that each figure of speech presents the speaker with an ideal tool for addressing a particular audience. It follows, therefore, that recourse to a particular figure of speech is a stance or an attitude taken by the speaker towards his audience. Meanwhile, figures of speech collectively present the speaker with a tool which enables him to express a mobility of (discoursal) tones and attitudes. The dissertation develops the theme of attitude through "critical" discourse. Critical discourse fleshes out the attitude of the speaker by denaturalizing the orderliness of talk and by providing social accounts which are intended to probe the social roots of language. Critical discourse also accounts for why things happen the way they do, who brings them about, and with what motive. It therefore establishes a link between verbal interaction and three social phenomena which determine and are determined by verbal interaction. 
These factors are: action, institution and higher social formation. Action is at the social base and is presupposed by social structure; institution is the locus of power and provides its subjects with motive and with a framework within which to act; and higher social formation stands for a series of elements and their interrelations which conjointly define the persistence of a social formation and distinguish one society from another. The study develops an integrated model of critical discourse for the analysis of figurative expression. The model is composed of three components: (i) syntax, (ii) an interpretative guideline, and (iii) an explanatory framework. Finally, figurative expression is examined and a translation assessment based on an empirical approach is made. The dissertation examines figures of speech in literary works, where they abound. Nevertheless, its findings may apply to other discourse types. This is because it deals with figurative expression as a transaction that is negotiated between the two parties to the verbal interaction. The implication of the critical approach towards the study of discourse for the translator-trainee is two-fold. First, he should make a thorough linguistic analysis of figurative expression before he embarks on translating, and second, he should consider language as a social practice that has its roots in the society from which it emanates. He therefore has to try to account for all factors that might have a bearing on the meaning of the text he is going to handle. The findings of this study are summed up as follows. First, figures of speech are functional in that they specifically help discourse to emerge and help to distinguish one discourse type from another. Second, figures of speech form an ensemble of thought which can express a body of (discoursal) attitudes and tones. 
Third, the dissertation corroborates that neglect or unawareness of the discourse aspect weakens the effect of figures of speech and sometimes distorts the meaning. Fourth, current studies of figurative expression by both theorists and experimenters are not sensitive enough to its discourse function, nor are the translators of the two novels which form the data for this study. University of Jorda