    The article presents an analysis of masculine and feminine nouns denoting professions in English and Slovenian. The researched expressions are first discussed from the point of view of word-formation (derivation and compounding), then corpora are employed to examine the frequency of the forms in both languages. The corpus data are complemented with an analysis of collocators with the purpose of identifying the semantic preferences and associative meanings of gender-marked expressions for professions. The results reveal that some feminine nouns for professions are stylistically unmarked (especially in Slovenian), whereas others co-occur (in English and Slovenian) with words that reflect societal attitudes to gender roles, appearance and character.

    Methodology for the Corpus-based English-German-Ukrainian Dictionary of Collocations

    Master's thesis in Lexicography, academic year 2021-2022. This Master’s thesis presents the vision of the multilingual collocations dictionary project for the English, German, and Ukrainian languages (“Corpus-based English-German-Ukrainian Dictionary of Collocations” or EDU-Col) and elaborates on the methodology for compiling the dictionary and its key dictionary structures. The dictionary will cater to the needs of language learners, translators, text producers (journalists, copywriters), and native speakers. Tapping into the latest developments in NLP and the capabilities of corpora, the methodology for creating the proposed dictionary relies on the automatic extraction of dictionary information types, namely collocation candidates, example sentences, and translation equivalents for collocations. The automatic extraction is followed by manual validation in order to maintain the quality of the obtained lexicographic data.

    A translation robot for each translator? : a comparative study of manual translation and post-editing of machine translations: process, quality and translator attitude

    To keep up with the growing need for translation in today's globalised society, post-editing of machine translation is increasingly being used as an alternative to regular human translation. While presumably faster than human translation, it is still unclear whether the quality of a post-edited text is comparable to the quality of a human translation, especially for general text types. In addition, there is a lack of understanding of the post-editing process, the effort involved, and the attitude of translators towards it. This dissertation contains a comparative analysis of post-editing and human translation by students and professional translators for general text types from English into Dutch. We study process, product, and translators' attitude in detail. We first conducted two pretests with student translators to try possible experimental setups and to develop a translation quality assessment approach suitable for a fine-grained comparative analysis of machine-translated texts, post-edited texts, and human translations. For the main experiment, we examined students and professional translators, using a combination of keystroke logging tools, eye tracking, and surveys. We used both qualitative analyses and advanced statistical analyses (mixed effects models), allowing for a multifaceted analysis. For the process analysis, we looked at translation speed, cognitive processing by means of eye fixations, the usage of external resources and its impact on overall time. For the product analysis, we looked at overall quality, frequent error types, and the impact of using external resources on quality. The attitude analysis contained questions about perceived usefulness, perceived speed, perceived quality of machine translation and post-editing, and the translation method that was perceived as least tiring. One survey was conducted before the experiment, the other after, so we could detect changes in attitude after participation.
In two more detailed analyses, we studied the impact of machine translation quality on various types of post-editing effort indicators, and on the post-editing of multi-word units. We found that post-editing is faster than human translation, and that both translation methods lead to products of comparable overall quality. The more detailed error analysis showed that post-editing leads to somewhat better results regarding adequacy, and human translation leads to better results regarding acceptability. The most common errors for both translation methods are meaning shifts, logical problems, and wrong collocations. Fixation data indicated that post-editing was cognitively less demanding than human translation, and that more attention was devoted to the target text than to the source text. We found that fewer resources are consulted during post-editing than during human translation, although the overall time spent in external resources was comparable. The most frequently used external resources were Google Search, concordancers, and dictionaries. Spending more time in external resources, however, did not lead to an increase in quality. Translators indicated that they found machine translation useful, but they preferred human translation and found it more rewarding. Perceptions about speed and quality were mixed. Most participants believed post-editing to be at least as fast and as good as human translation, but barely ever better. We further discovered that different types of post-editing effort indicators were impacted by different types of machine translation errors, with coherence issues, meaning shifts, and grammatical and structural issues having the greatest effect. HTER, though commonly used, does not correlate well with more process-oriented post-editing effort indicators. 
Regarding the post-editing of multi-word units, we suggest 'contrast with the target language' as a useful new way of classifying multi-word units, as contrastive multi-word units were much harder to post-edit. In addition, we noticed that research strategies for post-editing multi-word units lack efficiency. Consulting external resources did lead to an increased quality of post-edited multi-word units, but a lot of time was spent in external resources when this was not necessary. Interestingly, the differences between human translation and post-editing usually outweighed the differences between students and professionals. Students did cognitively process texts differently, having longer fixation durations on the source text during human translation, and more fixations on the target text during post-editing, whereas professional translators' fixation behaviour remained constant. For the usage of external resources, only the time spent in dictionaries was higher for students than for professional translators; the usage of other resources was comparable. Overall quality was comparable for students and professionals, but professionals made fewer adequacy errors. Deletions were more noticeable for students than for professional translators in both methods of translation, and word sense issues were more noticeable for professional translators than for students when translating from scratch. Surprisingly, professional translators were often more positive about post-editing than students, believing they could produce products of comparable quality with both methods of translation. Students in particular struggled with the cognitive processing of meaning shifts, and they spent more time in pauses than professional translators.
Some of the key contributions of this dissertation to the field of translation studies are the fact that we compared students and professional translators, developed a fine-grained translation quality assessment approach, and used a combination of state-of-the-art logging tools and advanced statistical methods. The effects of experience in our study were limited, and we suggest looking at specialisation and translator confidence in future work. Our guidelines for translation quality assessment can be found in the appendix, and contain practical instructions for use with brat, an open-source annotation tool. The experiment described in this dissertation is also the first to integrate Inputlog and CASMACAT, making it possible to include information on external resources in the CASMACAT logging files, which can be added to the CRITT Translation Process Research Database. Moving beyond the methodological contributions, our findings can be integrated in translation teaching, machine translation system development, and translation tool development. Translators need hands-on post-editing experience to get acquainted with common machine translation errors, and students in particular need to be taught successful strategies to spot and solve adequacy issues. Post-editors would greatly benefit from machine translation systems that made fewer coherence errors, meaning shift errors, and grammatical and structural errors. If visual clues are included in a translation tool (e.g., potentially problematic passages or polysemous words), these should be added to the target text. Tools could further benefit from integration with commonly used external resources, such as dictionaries. In the future, we wish to study the translation and post-editing process in even more detail, taking pause behaviour and regressions into account, as well as look at the passages participants perceived as the most difficult to translate and post-edit. 
We further wish to gain an even better understanding of the usage of external resources, by looking at the types of queries and by linking queries back to source and target text words. While our findings are limited to the post-editing and human translation of general text types from English into Dutch, we believe our methodology can be applied to different settings, with different language pairs. It is only by studying both processes in many different situations and by comparing findings that we will be able to develop tools and create courses that better suit translators' needs. This, in turn, will make for better, and happier, future generations of translators.

    The knowing ear : an Australian test of universal claims about the semantic structure of sensory verbs and their extension into the domain of cognition

    In this paper we test previous claims concerning the universality of patterns of polysemy and semantic change in perception verbs. Implicit in such claims are two elements: firstly, that the sharing of two related senses A and B by a given form is cross-linguistically widespread, and matched by a complementary lack of some rival polysemy, and secondly that the explanation for the ubiquity of a given pattern of polysemy is ultimately rooted in our shared human cognitive make-up. However, in comparison to the vigorous testing of claimed universals that has occurred in phonology, syntax and even basic lexical meaning, there has been little attempt to test proposed universals of semantic extension against a detailed areal study of non-European languages. To address this problem we examine a broad range of Australian languages to evaluate two hypothesized universals: one by Viberg (1984), concerning patterns of semantic extension across sensory modalities within the domain of perception verbs (i.e. intra-field extensions), and the other by Sweetser (1990), concerning the mapping of perception to cognition (i.e. trans-field extensions). Testing against the Australian data allows one claimed universal to survive, but demolishes the other, even though both assign primacy to vision among the senses.

    Lexical development in language acquisition and learning

    This chapter is devoted to issues related to the acquisition of foreign-language vocabulary. It discusses the various meanings of such notions as lexicon, mental lexicon, and word. It analyses the answer to the question "What does it mean to know a word?" with reference to such aspects of lexical knowledge as pronunciation, spelling, awareness of word structure, linking the form of a word with its meaning, associations and relations between words, grammatical functions, collocations, and constraints on the use of a word. The mental lexicon is presented in the context of selected models of lexical processing. The question of the integration and separation of bilingual mental lexicons is also raised. The chapter discusses the stages of first-language vocabulary acquisition as well as issues concerning second-language vocabulary acquisition in relation to the aspects of lexical knowledge mentioned above.

    Multiword expressions at length and in depth

    The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point of reference for future work.

    An outline of English lexicology

    xi, 212 pages; 26.2 cm

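    El tratamiento y la representación de las colocaciones verbales en el lenguaje especializado del turismo de aventura [The treatment and representation of verb collocations in the specialized language of adventure tourism]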

    A collocation is considered a frequent co-occurrence of two words which hold a syntactic relationship and whose elements enjoy a different status. Given their perception as a unit in language, access to the prominent word (base) involves immediate access to the other item (collocate). In terms of meaning, some combinations tend to be more transparent than others. The pervasiveness of these word associations in language has sparked a strong research interest in recent decades. A compelling reason for this approach may be the fact that they are naturally produced by native speakers but must be actively learned by non-native individuals. Not only has this reality led to their treatment in the general language, but it has also become a legitimate field of study in a wide range of specialized languages, such as the environment, computing, law or tourism, which is our object of study. As a consequence, specialized knowledge resources covering this type of word combination have appeared with the primary purpose of offering some extra help to people who deal with this type of language, for example, translators, linguists or other professionals. Nevertheless, there is still much to do in this respect. Taking this into account, it is hypothesized that verb collocations in the specialized language of adventure tourism convey specialized meaning that is worth being collected in terminological products. Therefore, this work endeavors, as its main purpose, to perform a deep analysis of verb collocations in this specialized domain and their implementation in the entries for motion verbs in DicoAdventure, a specialized dictionary of adventure tourism, whose inspirational idea was to highlight the significant role of verbs in the linguistic expression of concepts.
Accordingly, the following theoretical objectives were set: first, to cover the linguistic branches which influence specialized lexicography; second, to define the concept of specialized collocation; and third, to examine a vast number of lexicographical and terminological resources so as to discover the items of information that would make an adequate representation of collocations in a specialized dictionary and, then, design a model for such a task. Furthermore, the following practical objectives were formulated: first, to extract the motion verbs which would be the bases of the collocations implemented; second, to retrieve the lexical collocations of these verbs; and third, to classify the resulting list of collocations according to the meaning expressed, that is, actual motion or fictive (or metaphorical) motion. The practical steps taken in this research were based on the English monolingual specialized corpus ADVENCOR, which contains promotional texts about adventure tourism, and the use of corpus management software. The results of the theoretical work can be summarized as follows: (1) the specialized language of adventure tourism must be considered as specialized as any other; (2) collocations are not usually encoded in verb entries in dictionaries; and (3) a specialized collocation carries specialized knowledge which must be covered in terminological products. On the other hand, regarding the practical work, 12% of the verbs extracted were selected, as they were the ones expressing motion. However, only 46.61% of them produced collocations according to the extraction criteria established. Last, after applying stricter criteria for the collocation classification, only 25.42% of the verbs along with their collocations were collected in the dictionary. In addition to these results, the theory of Frame Semantics proved useful to understand the meaning of the verbs and their collocates.
As for their implementation, which was the primary objective of this doctoral dissertation, the inclusion of verb collocations was of paramount importance for the identification of distinct meanings expressed by one verb in different contexts, as collocates conveyed subtle nuances of meaning. Finally, it was concluded that the incorporation of explanations about the combinations in lay terms facilitates the comprehension of the entries for any type of user, from experts to laypersons, which makes DicoAdventure a terminological product that can render valuable assistance to individuals with distinct specialized expertise.