Topic and background knowledge effects on performance in speaking assessment
This study explores the extent to which topic and background knowledge of a topic affect spoken performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10 topics across three task types. Background knowledge and general language proficiency were measured using self-report questionnaires and C-tests, respectively. Score data were analysed using many-facet Rasch measurement and multiple regression. Findings showed that for two of the three task types, the topics used in the study generally exhibited difficulty measures that were statistically distinct. However, the differences in topic difficulty were too small to have a large practical effect on scores. Participants’ different levels of background knowledge were shown to have a systematic effect on performance. However, these statistically significant differences also failed to translate into practical significance. The findings hold implications for speaking performance assessment.
Rhythm in the speech of a person with right hemisphere damage: Applying the pairwise variability index
Although several aspects of prosody have been studied in speakers with right hemisphere damage (RHD), rhythm remains largely uninvestigated. This study compares the rhythm of an Australian English speaker with right hemisphere damage (due to a stroke, but with no concomitant dysarthria) to that of a neurologically unimpaired individual. The speakers' rhythm is compared using the pairwise variability index (PVI), which allows for an acoustic characterization of rhythm by comparing the duration of successive vocalic and intervocalic intervals. A sample of speech from a structured interview between a speech and language therapist and each participant was analysed. Previous research has shown that speakers with RHD may have difficulties with intonation production, and it was therefore hypothesized that there may also be rhythmic disturbance. Results show that the neurologically unimpaired control uses a rhythm similar to that reported for British English (no previous studies are available for Australian English), whilst the speaker with RHD produces speech with a less strongly stress-timed rhythm. This finding was statistically significant for the intervocalic intervals measured (t(8) = 4.7, p < .01), and suggests that some aspects of prosody may be right-lateralized for this speaker. The findings are discussed in relation to previous findings of dysprosody in RHD populations, and in relation to the syllable-timed speech of people with other neurological conditions.
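The PVI computation described above can be sketched as follows. The raw PVI is the mean absolute difference between successive interval durations; the normalized variant (nPVI) divides each difference by the pair mean to compensate for speech rate. The durations below are invented illustration values, not data from the study.

```python
def pvi(durations):
    """Raw PVI: mean absolute difference between successive interval durations (ms)."""
    pairs = zip(durations, durations[1:])
    return sum(abs(a - b) for a, b in pairs) / (len(durations) - 1)

def npvi(durations):
    """Normalized PVI: each pairwise difference is divided by the pair mean
    and scaled by 100, which compensates for differences in speech rate."""
    pairs = list(zip(durations, durations[1:]))
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

# Hypothetical successive vocalic interval durations in milliseconds:
vocalic = [110, 60, 140, 70, 95]
print(pvi(vocalic))             # raw PVI
print(round(npvi(vocalic), 1))  # normalized PVI
```

Higher values indicate greater durational contrast between neighbouring intervals, i.e. more strongly stress-timed rhythm.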
Integration of a web-based rating system with an oral proficiency interview test: argument-based approach to validation
This dissertation focuses on the validation of the Oral Proficiency Interview (OPI), a component of the Oral English Certification Test for international teaching assistants. The rating of oral responses was implemented through an innovative computer technology—a web-based rating system called Rater-Platform (R-Plat). The main purpose of the dissertation was to investigate the validity of interpretations and uses of the OPI scores derived from raters’ assessment of examinees’ performance during the web-based rating process. Following the argument-based validation approach (Kane, 2006), an interpretive argument for the OPI was constructed. The interpretive argument specifies a series of inferences, warrants for each inference, as well as underlying assumptions and specific types of backing necessary to support the assumptions. Of seven inferences—domain description, evaluation, generalization, extrapolation, explanation, utilization, and impact—this study focuses on two. Specifically, it aims to obtain validity evidence for three assumptions underlying the evaluation inference and for three assumptions underlying the generalization inference. The research questions addressed: (1) raters’ perceptions towards R-Plat in terms of clarity, effectiveness, satisfaction, and comfort level; (2) quality of raters’ diagnostic descriptor markings; (3) quality of raters’ comments; (4) quality of OPI scores; (5) quality of individual raters’ OPI ratings; (6) prompt difficulty; and (7) raters’ rating practices.
A mixed-methods design was employed to collect and analyze qualitative and quantitative data. Qualitative data consisted of: (a) 14 raters’ responses to open-ended questions about their perceptions towards R-Plat, (b) 5 recordings of individual/focus group interviews eliciting raters’ perceptions, and (c) 1,900 evaluative units extracted from raters’ comments about examinees’ speaking performance. Quantitative data included: (a) 14 raters’ responses to six-point scale statements about their perceptions, (b) 2,524 diagnostic descriptor markings of examinees’ speaking ability, (c) OPI scores for 279 examinees, (d) 803 individual raters’ ratings, (e) individual prompt ratings divided by each intended prompt level, given by each rater, and (f) individual raters’ ratings on the given prompts, grouped by test administration.
The results showed that the assumptions for the evaluation inference were supported. Raters’ responses to the questionnaire and individual/focus group interviews revealed positive attitudes towards R-Plat. Diagnostic descriptors and raters’ comments, analyzed by chi-square tests, distinguished different speaking ability levels. OPI scores were distributed across different proficiency levels throughout different test administrations. For the generalization inference, both positive and negative evidence was obtained. MFRM analyses showed that OPI scores reliably separated examinees into different speaking ability levels. Observed prompt difficulty matched intended prompt levels, although several problematic prompts were identified. Finally, while raters used the rating scales adequately and consistently within the same test administration, they were not consistent in their severity. Overall, the foundational parts of the validity argument were successfully established.
The findings of this study support moving forward with the investigation of the subsequent inferences in order to construct a complete OPI validity argument. They also suggest important implications for argument-based validation research, for the study of rater and task variability, and for future applications of web-based rating systems in speaking assessment.
A comparison of holistic, analytic, and part marking models in speaking assessment
This mixed-methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and impact on candidate CEFR classifications in a semi-direct online speaking test. Speaking performances of 240 candidates were first marked holistically and by part (phase 1). On the basis of phase 1 findings, which suggested stronger measurement properties for the part MM, phase 2 focused on a comparison of part and analytic MMs. Speaking performances of 400 candidates were rated analytically and by part during that phase. Raters provided open comments on their marking experiences. Results suggested a significant impact of MM; approximately 30% and 50% of candidates in phases 1 and 2, respectively, were awarded different (adjacent) CEFR levels depending on the MM used to assign scores. There was a trend of higher CEFR levels with the holistic MM and lower CEFR levels with the part MM. While strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM displayed superior measurement qualities, particularly in allowing raters to make finer distinctions between speaking ability levels. These findings have implications for the scoring validity of speaking tests.
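The classification comparison reported above can be illustrated with a small sketch: map each candidate's score under two marking models onto CEFR levels and count how often the two models disagree. The cut-off scores and candidate data below are invented for illustration, not taken from the study.

```python
# Hypothetical CEFR bands and cut-offs on a 0-100 score scale.
CEFR = ["A2", "B1", "B2", "C1"]

def cefr_level(score, cutoffs=(30, 50, 70)):
    """Map a 0-100 score to a CEFR level using hypothetical cut-offs."""
    return CEFR[sum(score >= c for c in cutoffs)]

def disagreement_rate(scores_mm1, scores_mm2):
    """Proportion of candidates classified into different CEFR levels
    by the two marking models."""
    pairs = zip(scores_mm1, scores_mm2)
    diff = sum(cefr_level(a) != cefr_level(b) for a, b in pairs)
    return diff / len(scores_mm1)

# Invented scores for five candidates under a holistic and a part MM:
holistic = [52, 68, 71, 45, 33]
part = [48, 66, 69, 41, 29]
print(disagreement_rate(holistic, part))
```

Because part-MM scores trend lower, candidates near a cut-off can drop to the adjacent level, which is the pattern the study reports.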
Unstressed Vowels in German Learner English: An Instrumental Study
This study investigates the production of vowels in unstressed syllables by advanced German learners of English in comparison with native speakers of Standard Southern British English. Two acoustic properties were measured: duration and formant structure. The results indicate that the duration of unstressed vowels is similar in the two groups, though there is some variation depending on the phonetic context. In terms of formant structure, learners produce slightly higher F1 and considerably lower F2, the difference in F2 being statistically significant for each learner. Formant values varied as a function of context and the orthographic representation of the vowel.
Children at risk: their phonemic awareness development in holistic instruction
Includes bibliographical references (p. 17-19).
Speaking English performance assessment with the facet Rasch measurement model
This study aims to assess students' English-speaking ability through peer assessment. It is a quantitative study involving 10 students. Data were collected using tests and a student speaking assessment rubric with scores from 1 to 5. The speaking assessment criteria were pronunciation, grammar, vocabulary, fluency, and understanding. Data were analyzed using Many-Facet Rasch Measurement (MFRM), which models the interaction between respondents, raters, and items simultaneously. The results show indices of 6.39 for criteria, 0.51 for speakers, and 5.32 for raters; together with the standard deviation values, these indicate a good spread of item difficulty. Reliability is 0.98 for criteria, 0.21 for speakers, and 0.97 for raters.
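A minimal sketch of the kind of rating-scale model MFRM estimates, assuming the common formulation log(P_k / P_{k-1}) = ability - criterion difficulty - rater severity - threshold_k. All parameter values below (in logits) are hypothetical illustration values, not estimates from the study.

```python
import math

def category_probs(ability, criterion, rater, thresholds):
    """Return the probability of each score category (1..K) for one
    speaker x criterion x rater combination under a rating-scale MFRM."""
    # Cumulative sums of the adjacent-category log-odds give each
    # category's log-probability relative to the lowest category.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (ability - criterion - rater - tau))
    total = sum(math.exp(x) for x in logits)
    return [math.exp(x) / total for x in logits]

# A speaker of ability 1.0 logits, rated on a criterion of difficulty 0.5
# by a rater of severity 0.2, on a 1-5 scale with four thresholds:
probs = category_probs(1.0, 0.5, 0.2, [-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])  # five probabilities summing to 1
```

Software such as Facets estimates the ability, severity, and threshold parameters jointly from the observed ratings; the sketch only shows how those parameters combine into category probabilities.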
The development of automatic speech evaluation system for learners of English
Degree system: new; report number: Kō 3183; degree type: Doctor of Education; date conferred: 2010/11/30; Waseda University degree record number: Shin 547