
    Language technologies in speech-enabled second language learning games : from reading to dialogue

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 237-244). Second language learning has become an important societal need over the past decades. Given that the number of language teachers is far below demand, computer-aided language learning software is becoming a promising supplement to traditional classroom learning, as well as potentially enabling new opportunities for self-learning. The use of speech technologies is especially attractive to offer students unlimited chances for speaking exercises. To create helpful and intelligent speaking exercises on a computer, it is necessary for the computer to not only recognize the acoustics, but also to understand the meaning and give appropriate responses. Nevertheless, most existing speech-enabled language learning software focuses only on speech recognition and pronunciation training. Very few have emphasized exercising the student's composition and comprehension abilities and adopting language technologies to enable free-form conversation emulating a real human tutor. This thesis investigates the critical functionalities of a computer-aided language learning system, and presents a generic framework as well as various language- and domain-independent modules to enable building complex speech-based language learning systems. Four games have been designed and implemented using the framework and the modules to demonstrate their usability and flexibility, where dynamic content creation, automatic assessment, and automatic assistance are emphasized. The four games, reading, translation, question-answering and dialogue, offer different activities with gradually increasing difficulty, and involve a wide range of language processing techniques, such as language understanding, language generation, question generation, context resolution, dialogue management and user simulation. 
User studies with real subjects show that the systems were well received and judged to be helpful. By Yushi Xu. Ph.D.

    Re-examining Phonological and Lexical Correlates of Second Language Comprehensibility:The Role of Rater Experience

    Few researchers and teachers would disagree that some linguistic aspects of second language (L2) speech are more crucial than others for successful communication. Underlying this idea is the assumption that communicative success can be broadly defined in terms of speakers’ ability to convey the intended meaning to the interlocutor, which is frequently captured through a listener-based rating of comprehensibility or ease of understanding (e.g. Derwing & Munro, 2009; Levis, 2005). Previous research has shown that communicative success – for example, as defined through comprehensible L2 speech – depends on several linguistic dimensions of L2 output, including its segmental and suprasegmental pronunciation, fluency-based characteristics, lexical and grammatical content, as well as discourse structure (e.g. Field, 2005; Hahn, 2004; Kang et al., 2010; Trofimovich & Isaacs, 2012). Our chief objective in the current study was to explore the L2 comprehensibility construct from a language assessment perspective (e.g. Isaacs & Thomson, 2013), by targeting rater experience as a possible source of variance influencing the degree to which raters use various characteristics of speech in judging L2 comprehensibility. In keeping with this objective, we asked the following question: To what extent do the linguistic aspects of L2 speech contributing to comprehensibility ratings depend on raters’ experience?

    Pragmatics & Language Learning, Volume 12

    Pragmatics & Language Learning Volume 12 examines the organization of second language and multilingual speakers’ talk and pragmatic knowledge across a range of naturalistic and experimental activities. Based on data collected on Danish, English, Hawaiʻi Creole, Indonesian, and Japanese as target languages, the contributions explore the nexus of pragmatic knowledge, interaction, and L2 learning outside and inside of educational settings. Pragmatics & Language Learning (“PLL”), a refereed series sponsored by the National Foreign Language Resource Center at the University of Hawaiʻi, publishes selected papers from the biennial Conference on International Pragmatics & Language Learning under the editorship of the conference hosts and the series editor, Gabriele Kasper

    Pragmatic Comprehension of English Refusals by Spanish-English Bilinguals

    This study investigated the pragmatic comprehension and production of the speech act of refusals in English by a group of Spanish-English bilinguals (SEB) in comparison with native English speakers (NES), taking into account variables such as length of residency in the L2 environment, type of refusals and level of politeness. Other variables explored included speed of lexical access and working memory. SEB who learned English as adults were divided into two groups (short, long) according to their length of residency in an English language environment. All participants performed a Pragmatic Listening Task (PLT) and an oral production task both involving four types of refusals at three levels of politeness, as well as tasks related to working memory and speed of lexical access, and completed a language contact survey. Results showed that across groups the easiest types of refusal to comprehend were direct refusals, and indirect refusals with upgraders, followed by indirect refusals with downgraders, in turn followed by implicatures. SEB of both lengths of residency did not differ from NES in the comprehension of direct refusals and indirect refusals with upgraders, but SEB with short residencies had poorer comprehension than the NES on indirect refusals with downgraders and implicatures. Politeness systems affected comprehension of indirect refusals with downgraders and implicatures. NES were faster than both SEB groups across all refusal types, and direct refusals were comprehended faster than indirect refusals with downgraders, which in turn were comprehended faster than implicatures; indirect refusals with upgraders were also comprehended faster than implicatures. Production showed all groups mostly produced direct refusals. In addition, the SEB used indirect refusals with downgraders more frequently than the NES. In terms of cognition, NES were faster in lexical access and had better working memory than both SEB groups. 
For SEB with shorter residency, the faster their lexical access speed, the better their comprehension of indirect refusals with upgraders and implicatures; and the higher their working memory, the better their comprehension of indirect refusals with upgraders. Thus, L2 learners are eventually able to master the pragmatics of refusals, but initially struggle with the more difficult types

    The effects of English proficiency on the processing of Bulgarian-accented English by Bulgarian-English bilinguals

    This dissertation explores the potential benefit of listening to and with one’s first-language accent, as suggested by the Interlanguage Speech Intelligibility Benefit (ISIB) hypothesis. Previous studies have not consistently supported this hypothesis. According to major second language learning theories, the listener’s second language proficiency determines the extent to which the listener relies on their first language phonetics. Hence, this thesis provides a novel approach by focusing on the role of English proficiency in the understanding of Bulgarian-accented English for Bulgarian-English bilinguals. The first experiment investigated whether evoking the listeners’ L1 Bulgarian phonetics would improve the speed of processing Bulgarian-accented English words, compared to Standard British English (SBE) words, and vice versa. Listeners with lower English proficiency processed Bulgarian-accented English faster than SBE, while high proficiency listeners tended to have an advantage with SBE over Bulgarian accent. The second experiment measured the accuracy and reaction times (RT) in a lexical decision task with single-word stimuli produced by two L1 English speakers and two Bulgarian-English bilinguals. Listeners with high proficiency in English responded more slowly and less accurately to Bulgarian-accented speech compared to L1 English speech and compared to lower proficiency listeners. These accent preferences were also supported by the listener’s RT adaptation across the first experimental block. A follow-up investigation compared the results of L1 UK English listeners to the bilingual listeners with the highest proficiency in English. The L1 English listeners and the bilinguals processed both accents with similar speed, accuracy and adaptation patterns, showing no advantage or disadvantage for the bilinguals. These studies support existing models of second language phonetics. Higher proficiency in L2 is associated with lesser reliance on L1 phonetics during speech processing. 
In addition, the listeners with the highest English proficiency had no advantage when understanding Bulgarian-accented English compared to L1 English listeners, contrary to ISIB. Keywords: Bulgarian-English bilinguals, bilingual speech processing, L2 phonetic development, lexical decision, proficienc

    Loan Phonology

    For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by a great interest among phonologists in the issue of how the nativization of loanwords occurs. The general feeling is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system as well as for studying L1 phonological processes in action and thus to the true synchronic phonology of L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language’s sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena

    Automatic Activation of Phonological Templates for Native but Not Nonnative Phonemes: An Investigation of the Temporal Dynamics of Mu Activation

    Models of speech perception suggest a dorsal stream connecting the temporal and inferior parietal lobe with the inferior frontal gyrus. This stream is thought to involve an auditory-motor loop that translates acoustic information into motor/articulatory commands and is further influenced by decision making processes that involve maintenance of working memory or attention. Parsing out the dorsal stream’s speech specific mechanisms from memory related ones in speech perception poses a complex problem. Here I argue that these processes may be disentangled from the viewpoint of the temporal dynamics of sensorimotor neural activation around a speech perception related event. Methods: Alpha (~10Hz) and beta (~20Hz) spectral components of the mu (μ) rhythm, localized to sensorimotor regions, have been shown to index somatosensory and motor activity, respectively. In the present work, event related spectral perturbations (ERSP) of the EEG μ-rhythm were analyzed, while manipulating two factors: active/passive listening, and perception of native/nonnative phonemes. Active and passive speech perception tasks were used as indexes of memory load employed, while native and nonnative perception were used as indexes of automatic top-down coding for sensory analysis. Results: Statistically significant differences were found in the oscillatory patterns of μ components between active and passive speech perception conditions with greater alpha and beta event related desynchronization (ERD) after stimuli offset in active speech perception. When compared to listening to noise, passive speech perception presented significantly (pFDR Conclusion: These findings suggest that neural processes within the dorsal auditory stream are functionally and automatically involved in speech perception mechanisms. 
While its early activity (shortly after stimuli onset) seems to be importantly involved with the instantiation of predictive motor/articulatory internal models that help constrain speech discrimination, its later activity (post-stimulus offset) seems essential in the maintenance of working memory processes

    Modeling DNN as human learner

    In previous experiments, human listeners demonstrated that they had the ability to adapt to unheard, ambiguous phonemes after some initial, relatively short exposures. At the same time, previous work in the speech community has shown that pre-trained deep neural network-based (DNN) ASR systems, like humans, also have the ability to adapt to unseen, ambiguous phonemes after retuning their parameters on a relatively small set. In the first part of this thesis, the time-course of phoneme category adaptation in a DNN is investigated in more detail. By retuning the DNNs with more and more tokens with ambiguous sounds and comparing classification accuracy of the ambiguous phonemes in a held-out test across the time-course, we found out that DNNs, like human listeners, also demonstrated fast adaptation: the accuracy curves were step-like in almost all cases, showing very little adaptation after seeing only one (out of ten) training bins. However, unlike our experimental setup mentioned above, in a typical lexically guided perceptual learning experiment, listeners are trained with individual words instead of individual phones, and thus to truly model such a scenario, we would require a model that could take the context of a whole utterance into account. Traditional speech recognition systems accomplish this through the use of hidden Markov models (HMM) and WFST decoding. In recent years, bidirectional long short-term memory (Bi-LSTM) trained under connectionist temporal classification (CTC) criterion has also attracted much attention. In the second part of this thesis, previous experiments on ambiguous phoneme recognition were carried out again on a new Bi-LSTM model, and phonetic transcriptions of words ending with ambiguous phonemes were used as training targets, instead of individual sounds that consisted of a single phoneme. 
We found that despite the vastly different architecture, the new model showed highly similar behavior in terms of classification rate over the time course of incremental retuning. This indicated that ambiguous phonemes in a continuous context could also be quickly adapted by neural network-based models. In the last part of this thesis, our pre-trained Dutch Bi-LSTM from the previous part was treated as a Dutch second language learner and was asked to transcribe English utterances in a self-adaptation scheme. In other words, we used the Dutch model to generate phonetic transcriptions directly and retune the model on the transcriptions it generated, although ground truth transcriptions were used to choose a subset of all self-labeled transcriptions. Self-adaptation is of interest as a model of human second language learning, but also has great practical engineering value, e.g., it could be used to adapt speech recognition to a low-resource language. We investigated two ways to improve the adaptation scheme, with the first being multi-task learning with articulatory feature detection while training the model on Dutch and during self-labeled adaptation, and the second being first letting the model adapt to isolated short words before feeding it with longer utterances.

    Is cue-based memory retrieval 'good-enough'?: Agreement, comprehension, and implicit prosody in native and bilingual speakers of English

    This dissertation focuses on structural and prosodic effects during reading, examining their influence on agreement processing and comprehension in native English (L1) and Spanish-English bilingual (L2) speakers. I consolidate research from three distinct areas of inquiry (cognitive processing models, development of reading fluency, and L1/L2 processing strategies) and outline a cohesive and comprehensive processing model that can be applied to speakers regardless of language profile. This model is characterized by three critical components: a cognitive model of memory retrieval, a processing paradigm that outlines how resources may be deployed online, and the role of factors such as prosody in parsing decisions. The general framework of this integrated 'Good-enough Cue' (GC) model assumes the 'Good-Enough' Hypothesis and cue-based memory retrieval as central aspects. The 'Good-Enough' Hypothesis states that all speakers have access to two processing routes: a complete syntactic route, and a 'good enough' heuristic route (Ferreira, Bailey, & Ferraro, 2002; Ferreira, 2003). In the interest of conserving resources, speakers tend to rely more on heuristics and templates whenever the task allows, and may be required to rely on this fallback route when task demand is high. In the proposed GC model, cue-based memory retrieval (CBMR) is the instantiation of the complete syntactic route for agreement and long-distance dependencies in particular (Lewis & Vasishth, 2005; Wagers, Lau, & Phillips, 2009; Wagers, 2008). When retrieval fails using CBMR (due to cue overlap, memory trace decay, or some other factor), comprehenders may compensate by applying a 'good-enough' processing heuristic, which prioritizes general comprehension over detailed syntactic computation. 
Prosody (or implicit prosody) may reduce processing load by either facilitating syntactic processing or otherwise assisting memory retrieval, thus reducing reliance on the good-enough fallback route. This investigation explores how text presentation format interacts with these algorithmic versus heuristic processing strategies. More specifically, it measures whether the presentation format of text affects readers' comprehension and ability to detect subject-verb agreement errors in simple and complex relative clause constructions. The experimental design manipulated text presentation to influence implicit prosody, using sentences designed to induce subject-verb agreement attraction errors. Materials included simple and embedded relative clauses with head nouns and verbs that were either matched or mismatched for number. Participants read items in one of three presentation formats: a) whole sentence, b) word-by-word, or c) phrase-by-phrase, and rated each item for grammaticality and responded to a comprehension probe. Results indicate that while overall comprehension is typically prioritized over grammatical processing (following the 'Good-Enough' Hypothesis), the effects of presentation format are differentially influential based on group differences and processing measure. For the L1 participants, facilitating the projection of phrasal prosody (phrase-by-phrase presentation) onto text enhances performance in syntactic and grammatical processing, while disrupting it via a word-by-word presentation decreases comprehension accuracy. For the L2 participants, however, phrase-by-phrase presentation is not significantly beneficial for grammatical processing, and even results in a decrease in comprehension accuracy. 
These differences provide insight into the interaction of cognitive taskload, processing strategy selection, and the role of implicit prosody in reading fluency, building toward a comprehensive processing model for speakers of varying language profiles and proficiencies