Language technologies in speech-enabled second language learning games : from reading to dialogue
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 237-244). Second language learning has become an important societal need over the past decades. Given that the number of language teachers is far below demand, computer-aided language learning software is becoming a promising supplement to traditional classroom learning, as well as potentially enabling new opportunities for self-learning. The use of speech technologies is especially attractive to offer students unlimited chances for speaking exercises. To create helpful and intelligent speaking exercises on a computer, it is necessary for the computer to not only recognize the acoustics, but also to understand the meaning and give appropriate responses. Nevertheless, most existing speech-enabled language learning software focuses only on speech recognition and pronunciation training. Very few have emphasized exercising the student's composition and comprehension abilities and adopting language technologies to enable free-form conversation emulating a real human tutor. This thesis investigates the critical functionalities of a computer-aided language learning system, and presents a generic framework as well as various language- and domain-independent modules to enable building complex speech-based language learning systems. Four games have been designed and implemented using the framework and the modules to demonstrate their usability and flexibility, where dynamic content creation, automatic assessment, and automatic assistance are emphasized. The four games, reading, translation, question-answering and dialogue, offer different activities with gradually increasing difficulty, and involve a wide range of language processing techniques, such as language understanding, language generation, question generation, context resolution, dialogue management and user simulation.
User studies with real subjects show that the systems were well received and judged to be helpful. by Yushi Xu. Ph.D.
Re-examining Phonological and Lexical Correlates of Second Language Comprehensibility: The Role of Rater Experience
Few researchers and teachers would disagree that some linguistic aspects
of second language (L2) speech are more crucial than others for successful
communication. Underlying this idea is the assumption that communicative
success can be broadly defined in terms of speakers' ability to convey the
intended meaning to the interlocutor, which is frequently captured through
a listener-based rating of comprehensibility or ease of understanding (e.g.
Derwing & Munro, 2009; Levis, 2005). Previous research has shown that
communicative success, for example as defined through comprehensible L2
speech, depends on several linguistic dimensions of L2 output, including its
segmental and suprasegmental pronunciation, fluency-based characteristics,
lexical and grammatical content, as well as discourse structure (e.g. Field,
2005; Hahn, 2004; Kang et al., 2010; Trofimovich & Isaacs, 2012). Our chief
objective in the current study was to explore the L2 comprehensibility construct from a language assessment perspective (e.g. Isaacs & Thomson, 2013),
by targeting rater experience as a possible source of variance influencing the
degree to which raters use various characteristics of speech in judging L2
comprehensibility. In keeping with this objective, we asked the following
question: What is the extent to which linguistic aspects of L2 speech contributing to comprehensibility ratings depend on raters' experience?
Pragmatics & Language Learning, Volume 12
Pragmatics & Language Learning Volume 12 examines the organization of second language and multilingual speakers' talk and pragmatic knowledge across a range of naturalistic and experimental activities. Based on data collected on Danish, English, Hawaiʻi Creole, Indonesian, and Japanese as target languages, the contributions explore the nexus of pragmatic knowledge, interaction, and L2 learning outside and inside of educational settings. Pragmatics & Language Learning ("PLL"), a refereed series sponsored by the National Foreign Language Resource Center at the University of Hawaiʻi, publishes selected papers from the biennial Conference on International Pragmatics & Language Learning under the editorship of the conference hosts and the series editor, Gabriele Kasper
Pragmatic Comprehension of English Refusals by Spanish-English Bilinguals
This study investigated the pragmatic comprehension and production of the speech act of refusals in English by a group of Spanish-English bilinguals (SEB) in comparison with native English speakers (NES), taking into account variables such as length of residency in the L2 environment, type of refusals and level of politeness. Other variables explored included speed of lexical access and working memory. SEB who learned English as adults were divided into two groups (short, long) according to their length of residency in an English language environment. All participants performed a Pragmatic Listening Task (PLT) and an oral production task both involving four types of refusals at three levels of politeness, as well as tasks related to working memory and speed of lexical access, and completed a language contact survey. Results showed that across groups the easiest types of refusal to comprehend were direct refusals, and indirect refusals with upgraders, followed by indirect refusals with downgraders, in turn followed by implicatures. SEB of both lengths of residency did not differ from NES in the comprehension of direct refusals and indirect refusals with upgraders, but SEB with short residencies had poorer comprehension than the NES on indirect refusals with downgraders and implicatures. Politeness systems affected comprehension of indirect refusals with downgraders and implicatures. NES were faster than both SEB groups across all refusal types, and direct refusals were comprehended faster than indirect refusals with downgraders, which in turn were comprehended faster than implicatures; indirect refusals with upgraders were also comprehended faster than implicatures. Production showed all groups mostly produced direct refusals. In addition, the SEB used indirect refusals with downgraders more frequently than the NES. In terms of cognition, NES were faster in lexical access and had better working memory than both SEB groups. 
For SEB with shorter residency, the faster their lexical access speed, the better their comprehension of indirect refusals with upgraders and implicatures and the higher their working memory, the better their comprehension of indirect refusals with upgraders. Thus, L2 learners are eventually able to master the pragmatics of refusals, but initially struggle with the more difficult types
The effects of English proficiency on the processing of Bulgarian-accented English by Bulgarian-English bilinguals
This dissertation explores the potential benefit of listening to and with one's first-language accent, as suggested by the Interspeech Intelligibility Benefit Hypothesis (ISIB). Previous studies have not consistently supported this hypothesis. According to major second language learning theories, the listener's second language proficiency determines the extent to which the listener relies on their first language phonetics. Hence, this thesis provides a novel approach by focusing on the role of English proficiency in the understanding of Bulgarian-accented English for Bulgarian-English bilinguals.
The first experiment investigated whether evoking the listeners' L1 Bulgarian phonetics would improve the speed of processing Bulgarian-accented English words, compared to Standard British English words, and vice versa. Listeners with lower English proficiency processed Bulgarian-accented English faster than SBE, while high proficiency listeners tended to have an advantage with SBE over Bulgarian accent.
The second experiment measured the accuracy and reaction times (RT) in a lexical decision task with single-word stimuli produced by two L1 English speakers and two Bulgarian-English bilinguals. Listeners with high proficiency in English responded slower and less accurately to Bulgarian-accented speech compared to L1 English speech and compared to lower proficiency listeners. These accent preferences were also supported by the listeners' RT adaptation across the first experimental block.
A follow-up investigation compared the results of L1 UK English listeners to the bilingual listeners with the highest proficiency in English. The L1 English listeners and the bilinguals processed both accents with similar speed, accuracy and adaptation patterns, showing no advantage or disadvantage for the bilinguals.
These studies support existing models of second language phonetics. Higher proficiency in L2 is associated with lesser reliance on L1 phonetics during speech processing. In addition, the listeners with the highest English proficiency had no advantage when understanding Bulgarian-accented English compared to L1 English listeners, contrary to ISIB.
Keywords:
Bulgarian-English bilinguals, bilingual speech processing, L2 phonetic development, lexical decision, proficiency
Loan Phonology
For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by a great interest among phonologists in the issue of how the nativization of loanwords occurs. The general feeling is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system as well as for studying L1 phonological processes in action and thus to the true synchronic phonology of L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language's sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena
Automatic Activation of Phonological Templates for Native but Not Nonnative Phonemes: An Investigation of the Temporal Dynamics of Mu Activation
Models of speech perception suggest a dorsal stream connecting the temporal and inferior parietal lobe with the inferior frontal gyrus. This stream is thought to involve an auditory-motor loop that translates acoustic information into motor/articulatory commands and is further influenced by decision making processes that involve maintenance of working memory or attention. Parsing out the dorsal stream's speech-specific mechanisms from memory-related ones in speech perception poses a complex problem. Here I argue that these processes may be disentangled from the viewpoint of the temporal dynamics of sensorimotor neural activation around a speech perception related event.
Methods: Alpha (~10Hz) and beta (~20Hz) spectral components of the mu (μ) rhythm, localized to sensorimotor regions, have been shown to index somatosensory and motor activity, respectively. In the present work, event related spectral perturbations (ERSP) of the EEG μ-rhythm were analyzed, while manipulating two factors: active/passive listening, and perception of native/nonnative phonemes. Active and passive speech perception tasks were used as indexes of memory load employed, while native and nonnative perception were used as indexes of automatic top-down coding for sensory analysis.
Results: Statistically significant differences were found in the oscillatory patterns of μ components between active and passive speech perception conditions with greater alpha and beta event related desynchronization (ERD) after stimuli offset in active speech perception. When compared to listening to noise, passive speech perception presented significantly (pFDR
Conclusion: These findings suggest that neural processes within the dorsal auditory stream are functionally and automatically involved in speech perception mechanisms. While its early activity (shortly after stimuli onset) seems to be importantly involved with the instantiation of predictive motor/articulatory internal models that help constrain speech discrimination, its later activity (post-stimulus offset) seems essential in the maintenance of working memory processes
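As a rough illustration of the ERSP and ERD measures named in the Methods and Results above, the following sketch expresses band power in an event window as a change in decibels relative to a pre-stimulus baseline, which is one common convention for ERSP. The abstract does not state the exact normalization, window lengths, or toolbox used, so the function names and the dB convention here are assumptions and not the author's pipeline. Negative values correspond to desynchronization (ERD).

```python
import math


def ersp_db(baseline_power, event_power):
    """Power change in dB relative to baseline (assumed convention).

    Negative values indicate event related desynchronization (ERD),
    positive values indicate synchronization.
    """
    if baseline_power <= 0 or event_power <= 0:
        raise ValueError("power values must be positive")
    return 10.0 * math.log10(event_power / baseline_power)


def mean_band_ersp(baseline_powers, event_powers):
    """Average power over trials first, then convert to dB.

    Both arguments are hypothetical per-trial band power values
    (e.g. alpha or beta) for one channel or component.
    """
    mean_baseline = sum(baseline_powers) / len(baseline_powers)
    mean_event = sum(event_powers) / len(event_powers)
    return ersp_db(mean_baseline, mean_event)
```

With this convention, a halving of band power relative to baseline comes out at roughly -3 dB.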
Modeling DNN as human learner
In previous experiments, human listeners demonstrated that they had the ability to adapt to
unheard, ambiguous phonemes after some initial, relatively short exposures. At the same time,
previous work in the speech community has shown that pre-trained deep neural network-based
(DNN) ASR systems, like humans, also have the ability to adapt to unseen, ambiguous phonemes
after retuning their parameters on a relatively small set. In the first part of this thesis, the time-course
of phoneme category adaptation in a DNN is investigated in more detail. By retuning the
DNNs with more and more tokens with ambiguous sounds and comparing classification accuracy
of the ambiguous phonemes in a held-out test across the time-course, we found out that DNNs, like
human listeners, also demonstrated fast adaptation: the accuracy curves were step-like in almost
all cases, showing very little further adaptation after the first (of ten) training bins. However,
unlike our experimental setup mentioned above, in a typical
lexically guided perceptual learning
experiment, listeners are trained with individual words instead of individual phones, and thus to truly
model such a scenario, we would require a model that could take the context of a whole utterance
into account. Traditional speech recognition systems accomplish this through the use of hidden
Markov models (HMM) and WFST decoding. In recent years, bidirectional long short-term memory (Bi-LSTM) trained under connectionist temporal classification (CTC) criterion has also attracted
much attention. In the second part of this thesis, previous experiments on ambiguous phoneme
recognition were carried out again on a new Bi-LSTM model, and phonetic transcriptions of words
ending with ambiguous phonemes were used as training targets, instead of individual sounds that
consisted of a single phoneme. We found out that despite the vastly different architecture, the
new model showed highly similar behavior in terms of classification rate over the time course of
incremental retuning. This indicated that ambiguous phonemes in a continuous context could also
be quickly adapted by neural network-based models. In the last part of this thesis, our pre-trained
Dutch Bi-LSTM from the previous part was treated as a Dutch second language learner and was
asked to transcribe English utterances in a self-adaptation scheme. In other words, we used the
Dutch model to generate phonetic transcriptions directly and retune the model on the transcriptions
it generated, although ground truth transcriptions were used to choose a subset of all self-labeled
transcriptions. Self-adaptation is of interest as a model of human second language learning, but also
has great practical engineering value, e.g., it could be used to adapt speech recognition to a low-resource
language. We investigated two ways to improve the adaptation scheme, with the first being
multi-task learning with articulatory feature detection during training the model on Dutch and self-labeled
adaptation, and the second being first letting the model adapt to isolated short words before
feeding it with longer utterances.
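The selection step in the self-adaptation scheme described above can be sketched as follows. The abstract says only that ground truth transcriptions were used to choose a subset of the self-labeled transcriptions; it does not say how. This sketch assumes, purely for illustration, that a self-labeled transcription is kept when its phone error rate against the ground truth is at or below a threshold. The function names and the `max_per` parameter are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def select_self_labeled(hypotheses, references, max_per=0.3):
    """Keep self-labeled transcriptions close enough to the ground truth.

    hypotheses and references map utterance ids to phone lists.
    The thresholding rule is an assumption, not the thesis's criterion.
    """
    kept = {}
    for utt_id, hyp in hypotheses.items():
        if phone_error_rate(references[utt_id], hyp) <= max_per:
            kept[utt_id] = hyp
    return kept
```

The retained transcriptions would then serve as retuning targets for the next adaptation round; the model's own output, not the ground truth, remains the training label.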
Is cue-based memory retrieval 'good-enough'?: Agreement, comprehension, and implicit prosody in native and bilingual speakers of English
This dissertation focuses on structural and prosodic effects during reading, examining their influence on agreement processing and comprehension in native English (L1) and Spanish-English bilingual (L2) speakers. I consolidate research from three distinct areas of inquiry (cognitive processing models, development of reading fluency, and L1/L2 processing strategies) and outline a cohesive and comprehensive processing model that can be applied to speakers regardless of language profile. This model is characterized by three critical components: a cognitive model of memory retrieval, a processing paradigm that outlines how resources may be deployed online, and the role of factors such as prosody in parsing decisions.
The general framework of this integrated 'Good-enough Cue' (GC) model assumes the 'Good-Enough' Hypothesis and cue-based memory retrieval as central aspects. The 'Good-Enough' Hypothesis states that all speakers have access to two processing routes: a complete syntactic route, and a 'good enough' heuristic route (Ferreira, Bailey, & Ferraro, 2002; Ferreira, 2003). In the interest of conserving resources, speakers tend to rely more on heuristics and templates whenever the task allows, and may be required to rely on this fallback route when task demand is high. In the proposed GC model, cue-based memory retrieval (CBMR) is the instantiation of the complete syntactic route for agreement and long-distance dependencies in particular (Lewis & Vasishth, 2005; Wagers, Lau, & Phillips, 2009; Wagers, 2008). When retrieval fails using CBMR (due to cue overlap, memory trace decay, or some other factor), comprehenders may compensate by applying a 'good-enough' processing heuristic, which prioritizes general comprehension over detailed syntactic computation. Prosody (or implicit prosody) may reduce processing load by either facilitating syntactic processing or otherwise assisting memory retrieval, thus reducing reliance on the good-enough fallback route. This investigation explores how text presentation format interacts with these algorithmic versus heuristic processing strategies. Most specifically, it measures whether the presentation format of text affects readers' comprehension and ability to detect subject-verb agreement errors in simple and complex relative clause constructions.
The experimental design manipulated text presentation to influence implicit prosody, using sentences designed to induce subject-verb agreement attraction errors. Materials included simple and embedded relative clauses with head nouns and verbs that were either matched or mismatched for number. Participants read items in one of three presentation formats: a) whole sentence, b) word-by-word, or c) phrase-by-phrase, and rated each item for grammaticality and responded to a comprehension probe.
Results indicate that while overall comprehension is typically prioritized over grammatical processing (following the 'Good-Enough' Hypothesis), the effects of presentation format are differentially influential based on group differences and processing measure. For the L1 participants, facilitating the projection of phrasal prosody (phrase-by-phrase presentation) onto text enhances performance in syntactic and grammatical processing, while disrupting it via a word-by-word presentation decreases comprehension accuracy. For the L2 participants, however, phrase-by-phrase presentation is not significantly beneficial for grammatical processing, even resulting in a decrease in comprehension accuracy. These differences provide insight into the interaction of cognitive taskload, processing strategy selection, and the role of implicit prosody in reading fluency, building toward a comprehensive processing model for speakers of varying language profiles and proficiencies