1,113 research outputs found

    Dementia Assessment Using Mandarin Speech with an Attention-based Speech Recognition Encoder

    Dementia diagnosis requires a series of different testing methods, which is complex and time-consuming. Early detection of dementia is crucial as it can prevent further deterioration of the condition. This paper utilizes a speech recognition model to construct a dementia assessment system tailored for Mandarin speakers during the picture description task. By training an attention-based speech recognition model on voice data closely resembling real-world scenarios, we have significantly enhanced the model's recognition capabilities. Subsequently, we extracted the encoder from the speech recognition model and added a linear layer for dementia assessment. We collected Mandarin speech data from 99 subjects and acquired their clinical assessments from a local hospital. We achieved an accuracy of 92.04% in Alzheimer's disease detection and a mean absolute error of 9% in clinical dementia rating score prediction. Comment: submitted to IEEE ICASSP 202

    Issues in cross-cultural studies of advertising audiovisual material

    This article presents an approach to cross-cultural studies of advertising audiovisual material that departs from the typical rigid marketing models. It favours a more qualitative inductive approach to corpuses, in which audiovisual texts are not approached or compared through the use of standardised American tools. After reviewing the usual marketing tools, the article focuses on the steps researchers can usefully take, from the gathering of audiovisual texts from two different environments to their classification, two important steps that are critical in such studies

    Reasons and Motivation of Islamic Scholar for Using Code-switching as Strategy in Delivering a Speech (Da'wah)

    Code-switching is a challenging phenomenon for sociolinguists. It refers to the use of two or more languages in the same utterance or conversation in a bilingual or multilingual setting. In giving Islamic speech (Da'wah), m

    Predicting ESL learners’ oral proficiency by measuring the collocations in their spontaneous speech

    Collocation, known as words that commonly co-occur, is a major category of formulaic language. There is now general consensus among language researchers that collocation is essential to effective language use in real-world communication situations (Ellis, 2008; Nesselhauf, 2005; Schmitt, 2010; Wray, 2002). Although a number of contemporary speech-processing theories assume the importance of formulaic language to spontaneous speaking (Bygate, 1987; de Bot, 1992; Kormos, 2006; Levelt, 1999), none of them gives an adequate explanation of the role that collocation plays in speech communication. In the practices of L2 speaking assessment, a test taker’s collocational performance is usually not separately scored mainly because human raters can only focus on a limited range of speech characteristics (Luoma, 2004). This paper argues for the centrality of collocation evaluation to communication-oriented L2 oral assessment. Based on a logical analysis of the conceptual connections among collocation, speech-processing theories, and rubrics for oral language assessment, the author formulated a new construct called Spoken Collocational Competence (SCC). In light of Skehan’s (1998, 2009) trade-off hypothesis, he developed a series of measures for SCC, namely Operational Collocational Performance Measures (OCPMs), to cover three dimensions of learner collocation performance in spontaneous speaking: collocation accuracy, collocation complexity, and collocation fluency. He then investigated the empirical performance of these measures with 2344 lexical collocations extracted from sixty adult English as a second language (ESL) learners’ oral assessment data collected in two distinctive contexts of language use: conversing with an interlocutor on daily-life topics (or the SPEAK exam) and giving an academic lecture (or the TEACH exam). 
Multiple regression and logistic regression were performed on criterion measures of these learners’ oral proficiency (i.e., human holistic scores and oral proficiency certification decisions) as a function of the OCPMs. The study found that the participants generally achieved higher collocation accuracy and complexity in the TEACH exam than in the SPEAK exam. In addition, the OCPMs as a whole predicted the participants’ oral proficiency certification status (certified or uncertified) with high accuracy (Nagelkerke R² = .968). However, the predictive power of OCPMs for human holistic scores seemed to be higher in the SPEAK exam (adjusted R² = .678) than in the TEACH exam (adjusted R² = .573). These findings suggest that L2 learners’ collocational performance in free speech deserves examiners’ closer attention and that SCC may contribute to the construct of oral proficiency somewhat differently across speaking contexts. Implications for L2 speaking theory, automated speech evaluation, and teaching and learning of oral communication skills are discussed.

    What effect does short term Study Abroad (SA) have on learners’ vocabulary knowledge?

    This thesis describes a study which tracks longitudinal changes in vocabulary knowledge during a short-term Study Abroad (SA) experience. A test of productive vocabulary knowledge, Lex30 (Meara & Fitzpatrick, 2000), requiring the production of word association responses, is used to elicit vocabulary from 38 Japanese L1 learners of English at four test times at equal intervals before and after an SA experience. The study starts by investigating whether there are changes in both the total number of words and in the number of less frequently occurring words produced by SA participants. Three additional ways of measuring the development of lexical knowledge over time are then proposed. The first examines changes in the ability of participants of different proficiency levels in producing collocates in response to Lex30 cue words. The second tracks changes in spelling accuracy to measure if improvements take place over time. The third analysis uses an online measuring instrument (Wmatrix; Rayson, 2009) to explore if there are any changes in the mastery of specific semantic domains. The results show that there is significant growth in the productive use of less frequent vocabulary knowledge during the SA period. There is also an increase in collocation production with lower proficiency participants and evidence of some improvement in the way certain vocabulary items are spelled. The tendency for SA learners to produce more words from semantic groups related to SA experiences is also demonstrated. Post-SA tests show that while some knowledge attrition occurs it does not decline to pre-SA levels. The study shows how short-term SA programmes can be evaluated using a word association test, contributing to a better understanding of how vocabulary develops during intensive language learning experiences. It also demonstrates the gradual shift of productive vocabulary knowledge from partial word knowledge to a more complete state of productive mastery.

    Designing, implementing, and evaluating an automated writing evaluation tool for improving EFL graduate students’ abstract writing: a case in Taiwan

    Writing English research article (RA) abstracts is a difficult but mandatory task for Taiwanese engineering graduate students (Feng, 2013). Understanding the current situation and needs of Taiwanese engineering graduate students, this dissertation aimed to develop and evaluate an automated writing evaluation (AWE) tool to assist their research article (RA) abstract writing in English by following a Design-Based Research (DBR) approach as the methodological framework. DBR was chosen because it strives to solve real-world problems through multiple iterations of development and building on results from each iteration to advance the project. Six design iterations were undertaken to develop and to evaluate the AWE tool in this dissertation, including (1) corpus compilation of engineering RAs, (2) genre analysis of engineering abstracts, (3) machine learning of move classification in abstracts, (4) analysis of lexical bundles used to express moves, (5) analysis of the choice of verb categories associated with moves, and finally, (6) AWE tool development based on previous findings, classroom implementation, and evaluation of the AWE tool following Chapelle’s (2001) computer-assisted language learning (CALL) framework. To begin with, I collected a corpus of 480 engineering RAs (Corpus-480) to extract appropriate linguistic properties as pedagogical materials to be implemented in the AWE tool. A sub-corpus (Corpus-72) was compiled with 72 RAs randomly chosen from Corpus-480 for manual and automated analyses. Next, to seek the best descriptive framework for the structure of engineering RA abstracts, two move schemata were compared: (1) IMRD (Introduction, Methodology, Results, and Discussion) and (2) CARS (Create-A-Research-Space, Swales, 1990). Abstracts in Corpus-72 were annotated and these two schemas were evaluated according to three quantitative metrics devised specifically for this comparison. 
Applying a statistical natural language processing (StatNLP) approach, a Support Vector Machine (SVM) was trained for automated move classification in abstracts. Formulaic language in engineering RA sections was used as linguistic features to automatically classify moves in abstracts. Additionally, four-word lexical bundles and verb categories were identified from Corpus-480 and Corpus-72, respectively. Four-word lexical bundles associated with moves in abstracts were extracted automatically. Additionally, verb categories (i.e., tense, aspect, and voice) in moves of abstracts were identified using CyWrite::Analyzer, a hybrid (statistical and rule-based) NLP software. Finally, the AWE tool was developed, based on the findings from the previous iterations, and implemented in an English-as-a-foreign-language (EFL) classroom setting. Through analyzing students’ drafts before and after using the tool, and responses to a questionnaire and a semi-structured interview, the AWE tool was evaluated based on Chapelle’s (2001) CALL evaluation framework. The findings showed that students attempted to improve their abstracts by adding, deleting, or changing the sequences of their sentences, lexical bundles, and verb categories in their abstracts. Their attitudes toward the effectiveness and appropriateness of the tool were quite positive. Overall, the AWE tool drew students’ attention to the use of lexical bundles and verb categories to achieve the communicative purposes of each move in their abstracts. In conclusion, this dissertation started from Taiwanese engineering students’ needs to improve their English abstract writing, and attempted to develop and evaluate an AWE tool for assisting them. Following DBR, the findings from this dissertation are discussed to improve the next generation of the AWE tools. Having these iterations in place, future studies can focus on developing pedagogical materials from genre-based analysis in different disciplines to fulfill learners’ needs

    Objective speech quality measurement for Chinese speech.

    In the search for the optimisation of transmission speed and storage, speech information is often coded, or transmitted with a reduced bandwidth. As a result, quality and/or intelligibility are sometimes degraded. Speech quality is normally defined as the degree of goodness in the perception of speech while speech intelligibility is how well or clearly one can understand what is being said. In order to assess the level of acceptability of degraded speech, various subjective methods have been developed to test codecs or sound processing systems. Although good results have been demonstrated with these, they are time consuming and expensive due to the necessary involvement of teams of professional or naive subjects [56]. To reduce cost, computerised objective systems were created with the hope of replacing human subjects [90][43]. While reasonable standards have been reported by several of these systems, they have not yet reached the accuracy of well constructed subjective tests [92][84]. Therefore, their evaluation and improvement are constantly being researched for further breakthroughs. To date, objective speech quality measurement systems (OSQMs) have been developed mostly in Europe or the United States, and their effectiveness has only been tested for English and several European and Asian languages, but not Chinese (Mandarin) [38][70][32].

    Integration of Phonotactic Features for Language Identification on Code-Switched Speech

    Abstract: In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With the one-pass recognition system, the spoken sounds are converted into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among our target languages, we reported two methods of phoneme mapping. Statistical phoneme-based bigram language models (LM) are integrated into speech decoding to eliminate possible phone mismatches. The supervised support vector machine (SVM) is used to learn to recognize the phonetic information of mixed-language speech based on recognized phone sequences. As the back-end decision is taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The speech corpus was tested on Sepedi and English languages that are often mixed. Our system is evaluated by measuring both the ASR performance and the LID performance separately. The systems have obtained a promising ASR accuracy with data-driven phone merging approach modelled using 16 Gaussian mixtures per state. In code-switched speech and monolingual speech segments respectively, the proposed systems achieved an acceptable ASR and LID accuracy

    EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

    Speech generation and enhancement based on articulatory movements facilitate communication when verbal communication is not possible, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that utilizes EPG and speech signals for speech generation and enhancement. Different fusion strategies based on multiple combinations of EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, the addition of noisy speech signals is observed to improve quality and intelligibility. Additionally, EPG2S is observed to achieve high-quality speech enhancement based solely on audio signals, with the addition of EPG signals further improving the performance. The late fusion strategy is deemed to be the most effective approach for simultaneous speech generation and enhancement. Comment: Accepted by IEEE Signal Processing Lette