2,417 research outputs found
An Efficient Probabilistic Deep Learning Model for the Oral Proficiency Assessment of Student Speech Recognition and Classification
Natural Language Processing is a branch of artificial intelligence (AI) that focuses on the interaction between computers and human language. Speech recognition systems use machine learning algorithms and statistical models to analyze acoustic features of speech, such as pitch, duration, and frequency, to convert spoken words into written text. The Student English Oral Proficiency Assessment and Feedback System provides students with a comprehensive evaluation of their spoken English skills and offers tailored feedback to help them improve. It can be used in language learning institutions, universities, or online platforms to support language education and enhance oral communication abilities. In this paper, a framework termed Latent Dirichlet Integrated Deep Learning (LDiDL) is constructed for the assessment of student English proficiency. The system begins by collecting a comprehensive dataset of spoken English samples spanning various proficiency levels. Relevant features are extracted from the samples, including acoustic characteristics and linguistic attributes. Leveraging Latent Dirichlet Allocation (LDA), the system uncovers latent topics within the data, enabling a deeper understanding of the underlying themes present in the spoken English. To further enhance the analysis, a deep learning model is developed that integrates the LDA topics with the extracted features. This model is trained using appropriate techniques and evaluated using performance metrics. Using the model's predictions, the system generates personalized feedback for each student, focusing on areas of improvement such as vocabulary, grammar, fluency, and pronunciation. The simulation uses native English speech audio for LDiDL training and classification. The experimental analysis shows that the proposed LDiDL model achieves an accuracy of 99% in the assessment of English proficiency.
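The pipeline the abstract describes, deriving LDA topic features from transcripts and combining them with acoustic features before classification, can be pictured with a minimal sketch. This is an illustration on synthetic data, not the paper's implementation: the transcripts, labels, "acoustic" feature values, and the choice of logistic regression are all assumptions.

```python
# Sketch: LDA topic features + acoustic features -> proficiency classifier.
# All data is synthetic and purely illustrative.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

transcripts = [
    "the weather today is sunny and warm",
    "I enjoy reading books about history",
    "my favourite food is rice with vegetables",
    "yesterday I walked to the park with friends",
]
labels = [0, 1, 0, 1]  # synthetic proficiency classes

# Bag-of-words counts feed the LDA topic model.
vec = CountVectorizer()
counts = vec.fit_transform(transcripts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_feats = lda.fit_transform(counts)  # document-topic proportions, shape (4, 2)

# Illustrative "acoustic" features (e.g. mean pitch in Hz, duration in s).
acoustic_feats = np.array([[120.0, 3.1], [98.0, 4.0], [110.0, 2.8], [105.0, 3.6]])

# Concatenate topic and acoustic features, then train a simple classifier.
X = np.hstack([topic_feats, acoustic_feats])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(X.shape)  # (4, 4)
```

In practice the topic proportions would be learned on a large training corpus and the classifier replaced by the paper's deep network; the concatenation step is the part this sketch is meant to show.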
Mapping the Relationships among the Cognitive Complexity of Independent Writing Tasks, L2 Writing Quality, and Complexity, Accuracy and Fluency of L2 Writing
Drawing upon the writing literature and the task-based language teaching literature, the study examined two cognitive complexity dimensions of L2 writing tasks: rhetorical task varying in reasoning demand and topic familiarity varying in the amount of direct knowledge of topics. Four rhetorical tasks were studied: narrative, expository, expo-argumentative, and argumentative tasks. Three topic familiarity tasks were investigated: personal-familiar, impersonal-familiar, and impersonal-less familiar tasks. Specifically, the study looked into the effects of these two cognitive complexity dimensions on L2 writing quality scores, their effects on complexity, accuracy, and fluency (CAF) of L2 production, and the predictive power of the CAF features on L2 writing scores for each task. Three hundred and seventy-five Chinese university EFL students participated in the study, and each student wrote on one of the six writing tasks used to study the cognitive complexity dimensions. The essays were rated by trained raters using a holistic scale. Thirteen CAF measures were used, and the measures were all automated through computer tools. One-way ANOVA tests revealed that neither rhetorical task nor topic familiarity had an effect on the L2 writing scores. One-way MANOVA tests showed that neither rhetorical task nor topic familiarity had an effect on accuracy and fluency of the L2 writing, but that the argumentative essays were significantly more complex in global syntactic complexity features than the essays on the other rhetorical tasks, and the essays on the less familiar topic were significantly less complex in lexical features than the essays on the more familiar topics. All-possible-subsets regression analyses revealed that the CAF features explained approximately half of the variance in the writing scores across the tasks and that writing fluency was the most important CAF predictor for five tasks.
However, lexical sophistication was the most important CAF predictor for the argumentative task. The regression analyses further showed that the best regression models for the narrative task were distinct from those for the expository and argumentative types of tasks, and the best models for the personal-familiar task were distinct from those for the impersonal tasks.
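Automated CAF measurement of the kind the study performed with computer tools can be illustrated with a short sketch. The two indices below, mean sentence length and type-token ratio, are generic complexity and lexical proxies chosen for illustration; they are not the study's thirteen measures.

```python
# Two toy CAF-style indices computed directly from raw essay text.
import re

def mean_sentence_length(text):
    """Average words per sentence, a crude syntactic complexity proxy."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)

def type_token_ratio(text):
    """Unique words / total words, a crude lexical diversity proxy."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens)

essay = ("The city has grown quickly. Many people argue that growth brings "
         "both opportunity and risk. Careful planning of the city matters.")
print(mean_sentence_length(essay))
print(round(type_token_ratio(essay), 2))
```

Real CAF toolkits parse the text to count T-units, clauses, and subordination, and use frequency-band word lists for sophistication; the point here is only that such indices are fully automatable functions of the text.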
Validation of Score Meaning for the Next Generation of Assessments
Despite developments in research and practice on using examinee response process data in assessment design, the use of such data in test validation is rare. Validation of Score Meaning in the Next Generation of Assessments Using Response Processes highlights the importance of validity evidence based on response processes and provides guidance to measurement researchers and practitioners in creating and using such evidence as a regular part of the assessment validation process. Response processes refer to the approaches and behaviors of examinees as they interpret assessment situations and formulate and generate solutions, as revealed through verbalizations, eye movements, response times, or computer clicks. Such response process data can provide information about the extent to which items and tasks engage examinees in the intended ways. With contributions from top researchers in the field of assessment, this volume includes chapters that focus on methodological issues and on applications across multiple contexts of assessment interpretation and use. In Part I of this book, contributors discuss the framing of validity as an evidence-based argument for the interpretation of the meaning of test scores, the specifics of different methods of response process data collection and analysis, and the use of response process data relative to issues of validation as highlighted in the joint standards on testing. In Part II, chapter authors offer examples that illustrate the use of response process data in assessment validation. These cases are provided specifically to address issues related to the analysis and interpretation of performance on assessments of complex cognition, assessments designed to inform classroom learning and instruction, and assessments intended for students with varying cultural and linguistic backgrounds.
The role of feedback in the processes and outcomes of academic writing in English as a foreign language at intermediate and advanced levels
Providing feedback on students' texts is one of the essential components of teaching second language writing. However, whether and to what extent students benefit from feedback has been an issue of considerable debate in the literature. While many researchers have stressed its importance, others have expressed doubts about its effectiveness. Regardless of these continuing and well-established debates, instructors consider feedback a worthwhile pedagogical practice for second language learning. Based on this premise, I conducted three experimental studies to investigate the role of written feedback in Myanmar and Hungarian tertiary EFL classrooms. Additionally, I studied syntactic features and language-related error patterns in Hungarian and Myanmar students' writing, in order to understand how students with different writing proficiency acted upon teacher and automated feedback.
The first study examined the efficacy of feedback on Myanmar students' writing over a 13-week semester and how automated feedback provided by Grammarly could be integrated into writing instruction as an assistance tool for writing teachers. Results from pre- and post-tests demonstrated that students' writing performance improved along the lines of four assessment criteria: task achievement, coherence and cohesion, grammatical range and accuracy, and lexical range and accuracy. Further results from a written feedback analysis revealed that the free version of Grammarly provided feedback on lower-level writing issues such as articles and prepositions, whereas teacher feedback covered both lower- and higher-level writing concerns. These findings suggested a potential for integrating automated feedback into writing instruction.
As limited attention had been given to how feedback influences aspects of writing development beyond accuracy, the second study examined how feedback influences the syntactic complexity of Myanmar students' essays. Results from paired-sample t-tests revealed no significant differences in the syntactic complexity of students' writing, whether the comparison was made between initial and revised texts or between pre- and post-tests. These findings suggested that although feedback did not produce syntactic complexity gains, neither did it lead students to write less structurally complex texts. The syntactic complexity of students' revised texts varied among high-, mid-, and low-achieving students. These variations could be attributed to proficiency levels, writing prompts, genre differences, and feedback sources.
The rationale for conducting the third study was based on the theoretical orientation that learners' differential success in gaining from feedback largely depends on their engagement with the feedback rather than on the feedback itself. Along these lines of research, I examined Hungarian students' behavioural engagement (i.e., students' uptake or revisions prompted by written feedback) with teacher and automated feedback in an EFL writing course. In addition to the engagement with form-focused feedback examined in the first study, I considered meaning-focused feedback, as feedback in a writing course typically covers both linguistic and rhetorical aspects of writing. The results showed differences in feedback focus (the teacher provided both form- and meaning-focused feedback) with unexpected outcomes: students' uptake of feedback reflected moderate to low levels of engagement. Participants incorporated more form-focused feedback than meaning-focused feedback into their revisions. These findings contribute to our understanding of students' engagement with writing tasks, levels of trust, and the possible impact of students' language proficiency on their engagement with feedback.
Following the finding that Myanmar and Hungarian students responded differently to feedback on their writing, I designed a follow-up study to compare syntactic features of their writing as indices of their English writing proficiency. In addition, I examined language-related errors in their texts to capture the differences in the error patterns of the two groups. Results from paired-sample t-tests showed that most syntactic complexity indices distinguished the essays produced by the two groups: length of production units, sentence complexity, and subordination indices. Similarly, statistically significant differences were found in language-related error patterns in their texts: errors were more prevalent in Myanmar students' essays. The implications for research and pedagogical practices in EFL writing classes are discussed with reference to the rationale for each study.
Using Ontology-Based Approaches to Representing Speech Transcripts for Automated Speech Scoring
Text representation is the process of transforming text into formats that computer systems can use for subsequent information-related tasks such as text classification. Representing text faces two main challenges: the meaningfulness of the representation and unknown terms. Research has shown evidence that these challenges can be resolved by using the rich semantics in ontologies. This study aims to address these challenges by using ontology-based representation and unknown-term reasoning approaches in the context of content scoring of speech, which is a less explored area compared with common ones such as categorizing text corpora (e.g., 20 Newsgroups and Reuters).
From the perspective of language assessment, the increasing number of language learners taking second language tests makes automatic scoring an attractive alternative to human scoring for delivering rapid and objective scores of written and spoken test responses. This study focuses on the speaking section of second language tests and investigates ontology-based approaches to speech scoring. Most previous automated speech scoring systems for spontaneous responses of test takers assess speech primarily through acoustic features such as fluency and pronunciation, while text features are less involved and exploited. As content is an integral part of speech, the study is motivated by the lack of rich text features in speech scoring and is designed to examine the effects of different text features on scoring performance.
A central question of the study is how speech transcript content can be represented in a form appropriate for speech scoring. Previously used approaches from essay and speech scoring systems include bag-of-words and latent semantic analysis representations, which are adopted as baselines in this study; the experimental approaches are ontology-based, which can help improve the meaningfulness of representation units and estimate the importance of unknown terms. Two general-domain ontologies, WordNet and Wikipedia, are used respectively for ontology-based representations. In addition to comparing representation approaches, the author analyzes which parameter option leads to the best performance within a particular representation.
The experimental results show that, on average, ontology-based representations slightly enhance speech scoring performance on all measurements when combined with the bag-of-words representation; reasoning over unknown terms can increase performance on one measurement (cos.w4) but decreases it on others. Due to the small data size, the significance test (t-test) shows that the enhancement from ontology-based representations is inconclusive.
The contributions of the study include: 1) it examines the effects of different representation approaches on speech scoring tasks; 2) it enhances the understanding of the mechanisms of representation approaches and their parameter options via in-depth analysis; 3) the representation methodology and framework can be applied to other tasks such as automatic essay scoring.
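One way to picture the unknown-term reasoning the study describes is backing off from an out-of-vocabulary word to an in-vocabulary synonym before building a bag-of-words vector. The miniature synonym map below stands in for an ontology such as WordNet; both it and the toy vocabulary are purely illustrative assumptions.

```python
# Sketch: ontology-style back-off for unknown terms in a bag-of-words model.
from collections import Counter

# Stand-ins for an ontology's synonym relations and a scorer's training vocab.
SYNONYMS = {"automobile": "car", "purchase": "buy", "residence": "home"}
VOCAB = {"car", "buy", "home", "new"}

def represent(tokens):
    """Map tokens to vocabulary counts, backing off via synonyms."""
    mapped = []
    for t in tokens:
        if t in VOCAB:
            mapped.append(t)
        elif SYNONYMS.get(t) in VOCAB:  # unknown-term reasoning step
            mapped.append(SYNONYMS[t])
        # tokens with no vocabulary match or synonym are dropped
    return Counter(mapped)

print(represent(["purchase", "a", "new", "automobile"]))
```

A plain bag-of-words model would discard "purchase" and "automobile" entirely; the back-off preserves their content by mapping them to known terms, which is the intuition behind the study's ontology-based representations.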
Rater Cognition in L2 Speaking Assessment: A Review of the Literature
This literature review surveys representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are examined: 1) studies that examine how raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms of their focus and feature attention, their approaches to scoring, and their treatment of the scoring criteria and of non-criteria-relevant aspects and features of the speaking performance; 2) studies that explore why raters differ, through the analysis of the interactions between several rater background factors (i.e., rater language background, rater experience, and rater training) and their rating behaviors and decision-making processes. The two types of studies have improved our understanding of the nature and the causes of rater variability in the perception and evaluation of L2 speech. However, very few of those studies have drawn on existing theories of human information processing and research on strategy use, which can explain at a cognitive-processing level (Purpura, 2014) what goes on in raters' minds during assessment. It is argued, as a final conclusion, that only on the basis of established frameworks of human information processing and research on (meta)cognitive strategy use can rater cognition be explored with more depth and breadth.
Automated Scoring of Speaking and Writing: Starting to Hit its Stride
This article reviews recent literature (2011–present) on the automated scoring (AS) of writing and speaking. Its purpose is first to survey current research on the automated scoring of language and then to highlight how automated scoring affects the present and future of assessment, teaching, and learning. The article begins by outlining the general background of AS issues in language assessment and testing. It then positions AS research with respect to technological advancements. Section two details the literature review search process and criteria for article inclusion. In section three, the three main themes emerging from the review are presented: automated scoring design considerations, the role of humans and artificial intelligence, and the accuracy of automated scoring with different groups. Two tables show how specific articles contributed to each of the themes. Following this, each of the three themes is presented in further detail, with a sequential focus on writing, then speaking, followed by a short summary. Section four addresses AS implementation with respect to current assessment, teaching, and learning issues. Section five considers future research possibilities related to both the research and current uses of AS, with implications for the Canadian context in terms of the next steps for automated scoring.
Assessing text readability and quality with language models
Automatic readability assessment is considered a challenging task in NLP due to its high degree of subjectivity. The majority of prior work on assessing readability has focused on identifying the level of education necessary for comprehension, without considering text quality, i.e., how naturally the text flows from the perspective of a native speaker. Therefore, in this thesis, we aim to use language models trained on well-written prose to measure not only text readability in terms of comprehension but also text quality.
We developed two word-level metrics, based on the concordance of article text with predictions made by language models, to assess text readability and quality. We evaluate both metrics on a set of corpora used for readability assessment or automated essay scoring (AES) by measuring the correlation between the scores assigned by our metrics and those of human raters. According to the experimental results, our metrics are strongly correlated with text quality, achieving correlations of 0.4–0.6 on 7 out of 9 datasets. We demonstrate that GPT-2 surpasses other language models, including a bigram model, an LSTM, and a bidirectional LSTM, on the task of estimating text quality in a zero-shot setting, and that a GPT-2 perplexity-based measure is a reasonable indicator for text quality evaluation.
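The perplexity idea behind these baselines can be sketched in a few lines: train counts on well-written text, then score new text by smoothed perplexity, where lower perplexity suggests more natural phrasing. The toy corpus, add-one smoothing, and the two test sentences below are illustrative assumptions, not the thesis's setup.

```python
# Sketch: bigram-model perplexity as a text-quality signal.
import math
from collections import Counter

# Tiny stand-in for a corpus of well-written prose.
train = "the cat sat on the mat . the dog sat on the rug .".split()
bigrams = Counter(zip(train, train[1:]))
unigrams = Counter(train)
vocab = len(set(train))

def perplexity(tokens):
    """Per-bigram perplexity with add-one smoothing; lower = more natural."""
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        logp += math.log(p)
    return math.exp(-logp / (len(tokens) - 1))

fluent = "the cat sat on the rug .".split()
odd = "rug the on sat cat the .".split()
print(perplexity(fluent) < perplexity(odd))  # fluent word order scores lower
```

A neural model such as GPT-2 replaces the smoothed count table with learned conditional probabilities over much longer contexts, but the scoring rule, exponentiated average negative log-likelihood, is the same.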