    Language models, surprisal and fantasy in Slavic intercomprehension

    In monolingual human language processing, the predictability of a word given its surrounding sentential context is crucial. With regard to receptive multilingualism, it is unclear to what extent predictability in context interacts with other linguistic factors in understanding a related but unknown language, a process called intercomprehension. We distinguish two dimensions influencing processing effort during intercomprehension: surprisal in sentential context and linguistic distance. Based on this hypothesis, we formulate expectations regarding the difficulty of designed experimental stimuli and compare them to the results from think-aloud protocols of experiments in which Czech native speakers decode Polish sentences by agreeing on an appropriate translation. Orthographic and lexical distances serve as reliable predictors of linguistic similarity, while the predictability of words in a sentence is obtained with trigram language models. We find that linguistic distance (encoding similarity) and in-context surprisal (predictability in context) appear to be complementary, with neither factor outweighing the other, and that distinguishing these two measurable dimensions helps explain certain unexpected effects in human behaviour.
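
    The surprisal of a word under a trigram model is the negative log probability of that word given its two predecessors, -log2 P(w_i | w_{i-2}, w_{i-1}). The sketch below illustrates this under stated assumptions: whitespace-tokenised sentences, "<s>"/"</s>" boundary padding and add-k smoothing are all illustrative choices, not the smoothing or tokenisation used in the study.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # assumed sentence-boundary padding symbols


def train_trigram(sentences):
    """Count trigrams and their two-word histories over boundary-padded sentences."""
    trigrams, histories = Counter(), Counter()
    vocab = set()
    for tokens in sentences:
        padded = [BOS, BOS] + list(tokens) + [EOS]
        vocab.update(padded)
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1
            histories[tuple(padded[i - 2:i])] += 1
    return trigrams, histories, vocab


def sentence_surprisal(tokens, model, k=1.0):
    """Return (word, surprisal in bits) pairs, where surprisal is
    -log2 P(w_i | w_{i-2}, w_{i-1}) under add-k smoothing (an assumption)."""
    trigrams, histories, vocab = model
    padded = [BOS, BOS] + list(tokens)
    scores = []
    for i in range(2, len(padded)):
        history = tuple(padded[i - 2:i])
        word = padded[i]
        p = (trigrams[history + (word,)] + k) / (histories[history] + k * len(vocab))
        scores.append((word, -math.log2(p)))
    return scores
```

    For example, after training on a toy corpus such as [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]], the word "cat" after "the" receives lower surprisal than "dog", because that continuation was seen more often.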

    On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers

    This contribution seeks to provide a rational probabilistic explanation for the intelligibility of words in a genetically related language that is unknown to the reader, a phenomenon referred to as intercomprehension. In this research domain, linguistic distance, among other factors, has been shown to correlate well with the mutual intelligibility of individual words. However, the role of context in the intelligibility of target words in sentences has been the subject of very few studies. To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words in the final position of Polish sentences. We compare the correlations of target-word intelligibility with data from trigram (3-gram) language models (LMs) to its correlations with data obtained from context-aware LMs. More specifically, we evaluate two context-aware LM architectures: Long Short-Term Memory networks (LSTMs), which can in theory take arbitrarily long-distance dependencies into account, and Transformer-based LMs, which can access the whole input sequence at once. We investigate how their use of context affects surprisal and its correlation with intelligibility.
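
    Comparing LMs in this way comes down to correlating, word by word, each model's surprisal for a target word with how intelligible that word was to readers. A minimal sketch, assuming Pearson's r as the statistic and intelligibility expressed as the proportion of correct translations per target word; both are assumptions, and the paper may use a different correlation measure. The helper name compare_lms is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def compare_lms(intelligibility, surprisal_by_model):
    """Correlate per-word intelligibility with each model's surprisal
    for the same target words (hypothetical helper)."""
    return {name: pearson_r(s, intelligibility) for name, s in surprisal_by_model.items()}
```

    Each model's surprisal list must be aligned with the intelligibility list, one entry per target word; the model whose correlation has the largest magnitude tracks reader behaviour most closely on that measure.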

    Reading Polish with Czech Eyes: Distance and Surprisal in Quantitative, Qualitative, and Error Analyses of Intelligibility

    In CHAPTER I, I first introduce the thesis in the context of the project workflow in section 1. I then summarise the methods and findings from the project publications on the languages in focus, and introduce the relevant concepts and terminology that the literature treats as possible predictors of intercomprehension and processing difficulty. CHAPTER II presents a quantitative (section 4) and a qualitative (section 5) analysis of the results of the cooperative translation experiments. The focus of this thesis, the language pair PL-CS, is explained and the hypotheses are introduced in section 6. The experiment website is introduced in section 7, with an overview of the participants, the experiments conducted, and the sections in which they are discussed. CHAPTER IV discusses free translation experiments in which two different sets of individual word stimuli were presented to Czech readers: (i) cognates that can be transformed using regular PL-CS correspondences (section 12) and (ii) the 100 most frequent PL nouns (section 13). CHAPTER V presents the findings of experiments in which PL NPs in two different linearisation conditions were presented to Czech readers (sections 14.1-14.6). A short digression turns to experiments in which PL internationalisms were presented to German readers (section 14.7). CHAPTER VI discusses the methods and results of cloze translation experiments with highly predictable target words in sentential context (section 15) and with random context using sentences from the cooperative translation experiments (section 16). A final synthesis of the findings, together with an outlook, is provided in CHAPTER VII.

    Intercomprehension in the Teaching of Polish as a Foreign Language to East Slavic Speakers

    The article is devoted to the learning and teaching of Polish as a foreign language based on the lexical and grammatical similarities between Polish and the East Slavic languages. After an introduction to the theoretical scope of the research, the article presents the results of an empirical project that aims to describe the acquisition sequence of selected Polish morphosyntactic structures and to identify the effect of the learners' L1 (Slavic vs. non-Slavic). The discussion pays particular attention to how the acquisitional insights produced by the research can be applied in language teaching. In particular, questions related to exploiting cross-linguistic transfer to enhance the acquisition process are discussed in relation to the need to facilitate the linguistic integration of Ukrainian refugees into Polish society in the context of the ongoing humanitarian crisis.

    The topic of the article is the teaching of Polish as a foreign language based on the lexical and grammatical similarities between Polish and the East Slavic languages, with the aim of facilitating and accelerating the acquisition process. After an introduction to the theoretical scope of the research, the author's own concept of psycholinguistic empirical research is presented, together with the results of a research project describing the acquisition sequence of selected elements of Polish morphosyntax. Particular attention is paid to the language-teaching implications of the results presented. The concept of curricula and materials for the rapid teaching of Polish as a foreign language to East Slavs is discussed in detail, with the aim of facilitating linguistic integration into Polish society in the context of the current humanitarian crisis in Ukraine.

    Lexical comprehension within and across sign languages of Belgium, China and the Netherlands

    There are hundreds of known sign languages around the world today, distinct languages each with its own historical and cultural context. Nevertheless, it is well known among signers who move through international spaces and across signing communities that a certain degree of mutual intelligibility is achievable during so-called cross-signing, even between historically unrelated sign languages. This has been explained by shared experiences, translanguaging competence and a higher degree of iconicity in the lexicons of sign languages. In this paper, I investigate one aspect of mutual intelligibility between four different sign languages: Sign Language of the Netherlands (NGT), Flemish Sign Language (VGT), French-Belgian Sign Language (LSFB) and Chinese Sign Language (CSL). Through a comprehension task with NGT signs, I analyze how accurately signers of the four sign languages identify NGT signs in an experimental sign-to-picture matching task, matching one target sign to one of four meaning choices: one target meaning and three distractors based on either form-similarity or plausible iconicity-mapping to the target sign. The results show that signers of VGT and LSFB perform better than CSL signers on this task, which may be attributed to lexical overlap, shared iconic mappings and experiences, as well as language contact due to geographic proximity. It is found that misidentification of target meanings is mostly caused by distractors with iconically plausible mappings between form and meaning. Across the four languages, signers’ self-evaluations of their performance on the lexical comprehension task correlate with test scores, demonstrating that they generally judge their level of comprehension accurately.

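    La intercomprensión oral entre el español y el portugués en actos de habla informales. Creencias lingüísticas del estudiantado de la Universidad de Extremadura y la Universidade de Évora
    (Oral intercomprehension between Spanish and Portuguese in informal speech acts: Linguistic beliefs of students at the Universidad de Extremadura and the Universidade de Évora)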

    Best bachelor's thesis (TFG) of the 2019-2020 academic year (joint award). Supervisor: María del Carmen Méndez Santos. Area: General Linguistics. Defence date: 25 June 2020.

    Speakers of Spanish and Portuguese, like speakers of other related languages, can develop the receptive skills needed to understand each other without learning the other language, each using their own. This communicative strategy, called "intercomprehension", can be efficient between languages with a common origin, and it is also equitable, since neither language dominates the other. This study investigates whether oral intercomprehension could in fact facilitate communication between Spanish and Portuguese speakers, specifically in informal speech acts. To this end, the linguistic beliefs of 150 students at the Universidad de Extremadura and the Universidade de Évora, both located near the Spanish-Portuguese border, were collected. The responses show favourable opinions on the efficiency of intercomprehension and also point out the positive and negative aspects of this mode of bilingual communication.

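    Zur Rolle der Orthographie in der slavischen Interkomprehension mit besonderem Fokus auf die kyrillische Schrift
    (On the role of orthography in Slavic intercomprehension, with a special focus on the Cyrillic script)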

    The Slavonic languages represent an important branch of the Indo-European languages. This raises the question of the extent to which speakers of different Slavonic languages can communicate intercomprehensively. Intercomprehension is the ability of speakers of related languages to communicate with each other, with each speaker using their own language. The present work examines the orthographic intelligibility of Slavonic languages written in the Cyrillic alphabet in intercomprehensive reading. Six East and South Slavonic languages (Bulgarian, Macedonian, Russian, Serbian, Ukrainian and Belarusian) are compared and statistically analysed with respect to their orthographic similarities and differences. The empirical study focuses on the recognition of individual cognates with diachronically motivated orthographic correspondences in East and South Slavonic languages, taking Russian as the starting point. The methods presented and the results obtained in this work represent an empirical contribution to research on, and the didactics of, Slavonic intercomprehension.
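
    One common way to quantify orthographic similarity between a Russian word and its cognates is the Levenshtein distance normalised by the length of the longer word. This is offered as an illustrative assumption; the abstract does not say which measure the thesis uses. The sketch compares Russian молоко ('milk') with its Bulgarian and Serbian cognates мляко and млеко.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length
    (0 = identical spelling, 1 = maximally different)."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


russian = "молоко"
cognates = {"bg": "мляко", "sr": "млеко"}
for lang, word in cognates.items():
    print(lang, word, round(normalized_distance(russian, word), 3))
```

    Both cognates need two edits to reach the Russian form (one deletion and one substitution), giving a normalised distance of 2/6. A plain edit distance treats every substitution as equally costly, so it does not by itself capture whether a correspondence is regular and therefore learnable.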

    Comparing the production of a formula with the development of L2 competence

    This pilot study compares the production of a formula with the development of L2 competence across the proficiency levels of a spoken learner corpus. The results show that in beginner production data the formula is likely recalled holistically from learners' phonological memory rather than generated online; this is identifiable from its fluent production in the absence of any other surface-structure evidence of the formula's syntactic properties. As learners' L2 competence increases, the formula becomes subject to modifications that show structural conformity at each proficiency level. The transparency between the formula's modifications and learners' corresponding L2 surface-structure realisations suggests that it is the independent development of L2 competence that integrates the formula into compositional language and ultimately drives the SLA process forward.

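    Medidas de distância entre línguas baseadas em corpus. Aplicação à linguística histórica do galego, português, espanhol e inglês
    (Corpus-based measures of distance between languages: Application to the historical linguistics of Galician, Portuguese, Spanish and English)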

    xiv, 225 p. The aim of this thesis is to propose and verify a corpus-based methodology for automatically quantifying the distance between languages and between language varieties. To this end, we started from techniques used and tested in language identification, looking for those that are most robust and can quantify how close a text is to a language model. As a secondary objective, we also investigated the role that orthography plays as a factor of divergence and convergence between languages. The chosen method is unsupervised and can be applied to computing the distance between languages, between historical periods of a language, or between language varieties.
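
    A standard way to measure how close a text is to a language model is perplexity: a text that a model trained on language A finds easy to predict is, in this sense, close to A. The sketch below is one possible unsupervised instance under explicit assumptions: character trigram models, add-one smoothing, and the mean of the two cross-perplexities as a symmetric distance. These choices are illustrative and not necessarily the ones made in the thesis; CharNgramModel and lm_distance are hypothetical names.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Character n-grams of text, left-padded with spaces so every character is predicted once."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Character n-gram language model with add-one smoothing (illustrative only)."""

    def __init__(self, text, n=3):
        self.n = n
        grams = char_ngrams(text, n)
        self.counts = Counter(grams)
        self.context_counts = Counter(g[:-1] for g in grams)
        # Training alphabet plus one extra slot reserved for unseen characters.
        self.vocab_size = len(set(text) | {" "}) + 1

    def log2_prob(self, gram):
        """log2 P(last char | preceding n-1 chars) with add-one smoothing."""
        num = self.counts[gram] + 1
        den = self.context_counts[gram[:-1]] + self.vocab_size
        return math.log2(num / den)

    def perplexity(self, text):
        """2 ** (average negative log2 probability per character of text)."""
        grams = char_ngrams(text, self.n)
        if not grams:
            raise ValueError("text must be non-empty")
        avg_neg_log = -sum(self.log2_prob(g) for g in grams) / len(grams)
        return 2 ** avg_neg_log


def lm_distance(text_a, text_b, n=3):
    """Symmetric distance: mean perplexity of each text under a model trained on the other."""
    pp_b_given_a = CharNgramModel(text_a, n).perplexity(text_b)
    pp_a_given_b = CharNgramModel(text_b, n).perplexity(text_a)
    return (pp_b_given_a + pp_a_given_b) / 2
```

    In practice each model would be trained on a sizeable corpus of one language, variety or historical period and evaluated on held-out text from another; the toy strings in a quick test only show that texts sharing character sequences come out closer than texts that share none. Working on characters rather than words also makes the measure sensitive to spelling, which is relevant to the thesis's secondary question about orthography.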