369 research outputs found

    Comparing the production of a formula with the development of L2 competence

    This pilot study investigates the production of a formula alongside the development of L2 competence across the proficiency levels of a spoken learner corpus. The results show that the formula in beginner production data is likely being recalled holistically from learners’ phonological memory rather than generated online, identifiable by virtue of its fluent production in the absence of any other surface-structure evidence of the formula’s syntactic properties. As learners’ L2 competence increases, the formula becomes sensitive to modifications which show structural conformity at each proficiency level. The transparency between the formula’s modification and learners’ corresponding L2 surface-structure realisations suggests that it is the independent development of L2 competence which integrates the formula into compositional language, and ultimately drives the SLA process forward.

    Mind the source data! : Translation equivalents and translation stimuli from parallel corpora

    Statements like ‘Word X of language A is translated with word Y of language B’ are incorrect, although they are quite common: words cannot be translated, as translation takes place on the level of sentences or higher. A better term for the correspondence between lexical items of source texts and their matches in target texts would be translation equivalence (Teq). In addition to Teq, there exists a reverse relation, translation stimulation (Tst), which is a correspondence between the lexical items of target texts and their matches (= stimuli) in source texts. Translation equivalents and translation stimuli must be studied separately and on the basis of natural direct translations. It is not advisable to use pseudo-parallel texts, i.e. aligned pairs of translations from a ‘hub’ language, because such data do not reflect real translation processes. Both Teq and Tst are lexical functions, and they are not applicable to function words like prepositions, conjunctions, or particles, although it is technically possible to find Teq and Tst candidates for such words as well. The process of choosing function words when translating does not proceed in the same way as choosing lexical units: first, a relevant construction is chosen, and next, it is filled with relevant function words. In this chapter, the difference between Teq and Tst will be shown in examples from Russian–Finnish and Finnish–Russian parallel corpora. The use of Teq and Tst for translation studies and contrastive semantic research will be discussed, along with the importance of paying attention to the nature of the texts when analysing corpus findings. Accepted version. Peer reviewed.

    Design and Evaluation of Machine Translation Systems Oriented Toward Practical Application

    Doctoral thesis (Information Sciences), Tohoku University

    What do rendering options tell us about the translating mind? Testing the choice network analysis hypothesis

    Frame. Assessing the difficulty of source texts and parts thereof is important in Cognitive Translation and Interpreting Studies (CTIS), whether for research comparability, for didactic purposes, or for setting price differences in the market. In order to measure it empirically, Campbell & Hale (1999) and Campbell (2000) developed the Choice Network Analysis (CNA) framework. The CNA’s main hypothesis is that the more translation options (a group of) translators have to render a given source text stretch, the higher the difficulty of that text stretch will be. We will call this the CNA hypothesis. This research project puts the CNA hypothesis to the test and studies whether it actually measures difficulty. Data collection. Two groups of participants (n=29) of different profiles and from two universities in different countries had three translation tasks keylogged with Inputlog, and filled in pre- and post-translation questionnaires. Participants translated from English (L2) into their L1s (Spanish or Italian), and worked, first in class and then at home, using their own computers, on texts ca. 800–1000 words long. Each text was translated in approximately equal halves in two 1-hour sessions, in three consecutive weeks. Only the parts translated at home were considered in the study. Results. A very different picture emerged from the data than that which the CNA hypothesis might predict: there was no prevalence of disfluent task segments when there were many translation options, nor was a prevalence of fluent task segments associated with fewer translation options. Instead, in both cases segment fluency tended to remain around average. Indeed, there was no correlation between the number of translation options (many and few) and behavioral fluency. Additionally, there was no correlation between pauses and either behavioral fluency or typing speed. The discussed theoretical flaws and the empirical evidence lead to the conclusion that the CNA framework does not and cannot measure text and translation difficulty.

    Dependency-based Bilingual Word Embeddings and Neural Machine Translation

    Bilingual word embeddings, which represent lexicons from various languages in a common embedding space, are critical for facilitating semantic and knowledge transfers in a wide range of cross-lingual NLP applications. The significance of learning bilingual word embedding representations in many Natural Language Processing (NLP) tasks motivates us to investigate the effect of many factors, including syntactic information, on the learning process for different languages with varying levels of structural complexity. By analysing the components that influence the learning process of bilingual word embeddings (BWEs), this thesis examines some factors for learning bilingual word embeddings effectively. Our findings demonstrate that increasing the embedding size for language pairs has a positive impact on the learning process for BWEs. The effect of sentence length, however, depends on the language: short sentences perform better than long ones in the En-Es experiment, whereas increasing the sentence length improves model accuracy in the En-Ar and En-De experiments. Arabic segmentation, according to the En-Ar experiments, is essential to the learning process for BWEs and can boost model accuracy by up to 10%. Incorporating dependency features into the learning process enhances the trained models' performance and results in improved BWEs in all language pairs. Finally, we investigated how the dependency-based pretrained BWEs affected the neural machine translation (NMT) model. The findings indicate that, across various MT evaluation metrics, the trained dependency-based NMT models outperform the baseline NMT model.

    Pre-Trained Language-Meaning Models for Multilingual Parsing and Generation

    Pre-trained language models (PLMs) have achieved great success in NLP and have recently been used for tasks in computational semantics. However, these tasks do not fully benefit from PLMs since meaning representations are not explicitly included in the pre-training stage. We introduce multilingual pre-trained language-meaning models based on Discourse Representation Structures (DRSs), including meaning representations besides natural language texts in the same model, and design a new strategy to reduce the gap between the pre-training and fine-tuning objectives. Since DRSs are language neutral, cross-lingual transfer learning is adopted to further improve the performance of non-English tasks. Automatic evaluation results show that our approach achieves the best performance on both the multilingual DRS parsing and DRS-to-text generation tasks. Correlation analysis between automatic metrics and human judgements on the generation task further validates the effectiveness of our model. Human inspection reveals that out-of-vocabulary tokens are the main cause of erroneous results. Comment: Accepted by ACL 2023 Findings.

    A distributional investigation of German verbs

    This dissertation provides an empirical investigation of German verbs conducted on the basis of statistical descriptions acquired from a large corpus of German text. In a brief overview of the linguistic theory pertaining to the lexical semantics of verbs, I outline the idea that verb meaning is composed of argument structure (the number and types of arguments that co-occur with a verb) and aspectual structure (properties describing the temporal progression of an event referenced by the verb). I then produce statistical descriptions of verbs according to these two distinct facets of meaning; in particular, I examine verbal subcategorisation, selectional preferences, and aspectual type. All three of these modelling strategies are evaluated on a common task, automatic verb classification. I demonstrate that automatically acquired features capturing verbal lexical aspect are beneficial for an application that concerns argument structure, namely semantic role labelling. Furthermore, I demonstrate that features capturing verbal argument structure perform well on the task of classifying a verb for its aspectual type. These findings suggest that these two facets of verb meaning are related in an underlying way.

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)


    The Translation of Lexicalized Metaphors in Interlinguistic and Intercultural Communication of Financial Security Discourse: A Corpus-Based Analysis of English and Spanish Texts about Money Laundering

    Financial crime is a significant factor in most transnational crime in general and is wide-reaching. Many critical stakeholders use specific metaphors in their communications to convey security threats. Metaphors are often idiomatic expressions that do not transfer easily from one language to another because they originate in cultural concepts. Within the public safety, regulatory and compliance community, key stakeholders from different linguistic backgrounds use English as a contact language to interact with their counterparts, the media, the public, and stakeholders to ensure regulatory compliance. Translating metaphors requires a special set of skills acquired through deep cultural knowledge and experience in both source and target cultures. Our research began with the observation that language played a crucial role in relationships between everyone involved in the criminal justice process, not only in the United States but also in a multitude of Spanish-speaking countries and geographical regions. Highly effective communication is critical for those who regulate against money laundering, those involved in compliance initiatives, law enforcement, and the general public to better recognize and prevent it. This project’s genesis came from interpreting criminal cases, translating documents in United States federal court cases, and observing how investigators followed the money trail to uncover illegal activity. The first-hand view of communications in that realm revealed how language played a crucial role in relationships between everyone involved in the criminal justice process, not only in the United States but also in many Spanish-speaking countries and geographical regions. Before this study, there had been little to no research on translating metaphors in the specialized language of financial regulatory compliance and enforcement. The present study begins to fill that gap in research by providing a synchronic X-ray view of the current language spoken in that field through a corpus-based translation analysis of anti-money laundering texts. We developed a bilingual English-to-Spanish unidirectional corpus which we uploaded to Sketch Engine for analysis. We then analyze and discuss translation techniques from English to Spanish and terminological findings. We found instances of intensifying metaphors from the source to target texts and adding or inserting metaphorical expressions in the target text where none were present in the source. We also found an ideological presence in translated expressions, consistent with other investigations involving security discourse. Finally, we found terminological inconsistencies in the metaphors for money laundering, tax haven, and shell company. We suggest practical implications for translators and stakeholders in the anti-money laundering discipline. We also provide pedagogical applications from custom-building corpora and teaching translation of metaphors in the specialized financial regulation and compliance language. Developing specialized corpora and learning to use corpus-based translation analysis software will help translation students be better prepared and will improve the future of translation studies and their applications in specialized areas and beyond. Providing students with experience using linguistic analysis software will also help build critical technology skills that they will be able to apply across disciplines in the humanities and beyond, such as intelligence analysis and computer science.

    X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs

    Understanding when two pieces of text convey the same information is a goal touching many subproblems in NLP, including textual entailment and fact-checking. This problem becomes more complex when those two pieces of text are in different languages. Here, we introduce X-PARADE (Cross-lingual Paragraph-level Analysis of Divergences and Entailments), the first cross-lingual dataset of paragraph-level information divergences. Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language, indicating whether a given piece of information is the same, new, or new but can be inferred. This last notion establishes a link with cross-language NLI. Aligned paragraphs are sourced from Wikipedia pages in different languages, reflecting real information divergences observed in the wild. Armed with our dataset, we investigate a diverse set of approaches for this problem, including classic token alignment from machine translation, textual entailment methods that localize their decisions, and prompting of large language models. Our results show that these methods vary in their capability to handle inferable information, but they all fall short of human performance.