150 research outputs found

    Hard Non-Monotonic Attention for Character-Level Transduction

    Character-level string-to-string transduction is an important component of various NLP tasks. The goal is to map an input string to an output string, where the strings may be of different lengths and have characters taken from different alphabets. Recent approaches have used sequence-to-sequence models with an attention mechanism to learn which parts of the input string the model should focus on during the generation of the output string. Both soft attention and hard monotonic attention have been used, but hard non-monotonic attention has only been used in other sequence modeling tasks such as image captioning and has required a stochastic approximation to compute the gradient. In this work, we introduce an exact, polynomial-time algorithm for marginalizing over the exponential number of non-monotonic alignments between two strings, showing that hard attention models can be viewed as neural reparameterizations of the classical IBM Model 1. We compare soft and hard non-monotonic attention experimentally and find that the exact algorithm significantly improves performance over the stochastic approximation and outperforms soft attention. Comment: Published in EMNLP 201
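The polynomial-time marginalization mentioned in this abstract can be illustrated with a minimal sketch. It assumes, as in IBM Model 1, that each output position picks its aligned input position independently of the other alignment choices; under that assumption the sum over all alignments factorizes into a product of per-position sums. The function names, the table layout (`align_probs[j][i]` for the probability that output position `j` attends to input position `i`, `emit_probs[j][i]` for the probability of the observed output character given that choice), and the toy numbers are all hypothetical and are not taken from the paper.

```python
import itertools
import math


def marginal_brute_force(align_probs, emit_probs):
    """Sum the joint probability over every alignment explicitly (n**m terms)."""
    m = len(align_probs)       # number of output positions
    n = len(align_probs[0])    # number of input positions
    total = 0.0
    for alignment in itertools.product(range(n), repeat=m):
        # alignment[j] is the input position chosen for output position j
        total += math.prod(
            align_probs[j][i] * emit_probs[j][i]
            for j, i in enumerate(alignment)
        )
    return total


def marginal_factorized(align_probs, emit_probs):
    """Same quantity in O(n * m): a product over output positions of a sum over input positions."""
    return math.prod(
        sum(a * e for a, e in zip(a_row, e_row))
        for a_row, e_row in zip(align_probs, emit_probs)
    )


# Toy tables (hypothetical): 2 output positions, 2 input positions.
align = [[0.5, 0.5], [0.25, 0.75]]
emit = [[0.2, 0.4], [0.8, 0.4]]
print(marginal_brute_force(align, emit), marginal_factorized(align, emit))
```

Both functions return the same likelihood, but the brute-force version grows exponentially with the output length, while the factorized version stays linear in the product of the two string lengths.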

    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a very few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

    Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

    This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Depending on various conditions, solutions are proposed and developed. The scenarios range from the straightforward one, in which the target language is present in written form on the Internet and the mapping between speech and written language is close, to the difficult one, in which no written form for the target language exists.

    Biliteracy and acquisition of novel written words: the impact of phonological conflict between L1 and L2 scripts

    The acquisition of new orthographic representations is a rapid and accurate process in proficient monolingual readers. The present study used a biliterate and bialphabetic population to address the impact of phonological inconsistencies across the native (L1) and second (L2) alphabets. Naming latencies were collected from 50 Russian-English biliterates through a reading-aloud task with familiar and novel word forms repeated across 10 blocks. There were three Script conditions: (1) native Cyrillic, (2) non-native Roman, and (3) Ambiguous (with graphically identical, but phonologically inconsistent graphemes shared by both alphabets). Our analysis revealed a main effect of Script on both reading and orthographic learning: naming latencies during training were longer for the ambiguous stimuli, particularly for the novel ones. Nonetheless, novel word forms in the ambiguous condition approached the latencies for the familiar words over the course of the exposures, although this effect was faster in the phonologically consistent trials. Post-training tests revealed similarly successful performance patterns for previously familiar and newly trained forms, indicating successful rapid acquisition of the latter. Furthermore, we found the highest free recall rates for the ambiguous stimuli. Overall, our results indicate that phonological inconsistency initially interferes with the efficiency of novel word encoding. Nevertheless, it does not prevent efficient attribution of orthographic representations; instead, the knowledge of two distinct alphabets supports more efficient learning and better memory for ambiguous stimuli by enhancing their encoding and retrieval.

    The effect of orthographic systems on the developing reading system: Typological and computational analyses

    Orthographic systems vary dramatically in the extent to which they encode a language’s phonological and lexico-semantic structure. Studies of the effects of orthographic transparency suggest that such variation is likely to have major implications for how the reading system operates. However, such studies have been unable to examine in isolation the contributory effect of transparency on reading because of covarying linguistic or sociocultural factors. We first investigated the phonological properties of languages using the range of the world’s orthographic systems (alphabetic, alphasyllabic, consonantal, syllabic, and logographic), and found that, once geographical proximity is taken into account, phonological properties do not relate to orthographic system. We then explored the processing implications of orthographic variation by training a connectionist implementation of the triangle model of reading on the range of orthographic systems while controlling for phonological and semantic structure. We show that the triangle model is effective as a universal model of reading, able to replicate key behavioral and neuroscientific results. The model also generates new predictions deriving from an explicit description of the effects of orthographic transparency on how reading is realized and defines the consequences of orthographic systems on reading processes. (PsycInfo Database Record (c) 2020 APA, all rights reserved)

    Single-word naming in a transparent alphabetic orthography.

    The cognitive processes involved in single-word naming of the transparent Turkish orthography were examined in a series of nine naming experiments on adult native readers. In Experiment 1, a significant word frequency effect was observed when matched (i.e. on initial phoneme, letter length and number of syllables) high- and low-frequency words were presented for naming. However, no frequency effect was found in Experiment 2, when an equal number of matched (i.e. on initial phoneme, letter length and number of syllables) nonword fillers were mixed with the target words. A null frequency effect was also found in Experiment 3 when conditions were mixed-blocks, i.e. high- and low-frequency words were presented in separate blocks mixed with an equal number of matched nonword fillers. Experiment 4 served the purpose of creating and validating nonwords (to be used in Experiments 5 and 6) that could be named as fast as high- and low-frequency words by manipulating the letter length of nonwords. A significant word frequency effect emerged with both the mixed-block design (Experiment 5) and mixed design (Experiment 6) when the nonword fillers matched the target words in speed of naming. Experiment 7, however, found no frequency effect when high- and low-frequency words were mixed with word fillers that were slower to be named (longer in length) than the target words. In Experiment 8, frequency was factorially manipulated with imageability (high vs. low) and level of skill (very skilled vs. skilled); this yielded significant main effects for word frequency and level of skill, a significant 2-way interaction of skill by imageability, and a significant 3-way interaction of skill by imageability by frequency. In Experiment 9, however, there was only a main effect for frequency when previously skilled readers performed on the same words used in Experiment 8.
These findings suggest that whilst a lexical route dominates in naming the transparent Turkish orthography, an explanation that the readers shut down the operation of this route in the presence of nonword fillers is not entertained. Instead, the results suggest that both routes operate in naming, with the inclusion of filler stimuli and their “perceived difficulty” having an impact on the time criterion for articulation. Moreover, there are indications that a semantic route is involved in naming Turkish only when level of skill is taken into account. Implications of these findings for models of single-word naming are discussed.

    L1 Impacts on L2 Component Reading Skills, Word Skills, and Overall Reading Achievement

    Learning to read in a second language as an adult is different in many ways from learning to read in a first language. Unlike children, adult second language (L2) learners have limited knowledge of the target language but may already have fluent reading skills in their first language (L1). These initial reading skills develop to be specifically tuned to the characteristics of the L1 writing system, and may not be optimized for literacy in the L2 (e.g., Frost, 2012; Koda, 2004). This dissertation consists of a program of research designed to examine the impacts that these L1 writing system characteristics have on the development of literacy skills in English as a second language (ESL). Study 1 examined performance on two fundamental literacy skills, phonological awareness and orthographic knowledge, as a function of L1 background and task demands. These data were collected abroad from native French, Hebrew, and Mandarin Chinese speakers, as well as native English speakers, and show clear influences of both L1 orthography and phonology on literacy skill performance. The large differences in performance associated with varying task demands have implications for accurately measuring and understanding students’ underlying abilities. Study 2 examined the contributions of phonological awareness and orthographic knowledge to three measures of word identification: lexical decision, word naming, and pseudoword decoding, as well as global reading comprehension. These data reveal differential performance on the word identification tasks across L1s, as well as differential contributions of phonological awareness and orthographic knowledge to word identification. Study 2 again revealed the effects of task demands on the relationships between sub-lexical literacy skills and word identification. 
Finally, Study 3 examined the development of language and literacy in adult ESL classroom learners who received either traditional reading instruction or a set of supplemental lessons providing a phonics-based instructional intervention. The results show influences of L1 background as well as different developmental patterns for phonological and orthographic skills based on the type of curriculum students received. The discussion highlights the contributions of this work to understanding cross-linguistic literacy skills and the importance of considering task demands when choosing language assessment measures