10 research outputs found

    Interpreting learning progress using assessment scores: What is there to gain?

    Using assessment scores to quantify gains and growth trajectories for individuals and groups can provide a valuable lens on learning progress for all students. This paper summarises some commonly observed patterns of progress and illustrates these using data from ACER’s Progressive Achievement Test (PAT) assessments. While growth trajectory measurement requires scores for the same individuals over at least three, but preferably more, occasions, scores from only two occasions are naturally more readily available. The difference between two successive scores is usually referred to as gain. Some common approaches and pitfalls when interpreting individual student gain data are illustrated. It is concluded that pairs of consecutive scores are best considered as part of a longer-term trajectory of learning progress, and that caveated gain information might at best play a peripheral role until additional scores are available for individuals. This review is part of a larger program of research to inform future reporting developments at ACER.
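
    As an illustration of the distinction drawn above between gain and a growth trajectory, the following is a minimal Python sketch (not ACER’s reporting method; the occasion spacing and scale scores are hypothetical) that computes a two-occasion gain and, once three or more scores are available, fits a simple linear trajectory.

        import numpy as np

        # Hypothetical PAT-style scale scores for one student across successive occasions.
        occasions = np.array([0.0, 1.0, 2.0, 3.0])        # years since the first assessment
        scores = np.array([112.4, 118.1, 117.3, 124.0])   # illustrative scale scores

        # Gain: the difference between two successive scores (here, the most recent pair).
        gain = scores[-1] - scores[-2]
        print(f"Most recent gain: {gain:.1f} scale-score points")

        # Trajectory: with three or more occasions, a simple linear fit gives an average
        # rate of progress that is less sensitive to a single noisy pair of scores.
        slope, intercept = np.polyfit(occasions, scores, deg=1)
        print(f"Fitted trajectory: {slope:.1f} points per year (intercept {intercept:.1f})")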

    Building capacity for Quality Teaching Rounds – Victoria. Final report

    The Australian Council for Educational Research (ACER) was commissioned by the Teachers and Teaching Research Centre (TTRC) at the University of Newcastle to conduct an independent randomised controlled trial (RCT) examining the effects of Quality Teaching Rounds (QTR) on student outcomes and teachers’ practice in Victorian high schools. A total of 19 schools participated in Quality Teaching Rounds in 2022, with 20 schools in the wait-list control. Data were gathered throughout the evaluation via: Progressive Achievement Tests in Mathematics (PAT-M) and Reading (PAT-R) at baseline and follow-up; student self-efficacy and aspiration surveys at baseline and follow-up; teacher surveys, with one questionnaire administered every term; implementation fidelity check surveys completed by teachers for each QT Round; and implementation fidelity checks involving onsite visits from ACER staff to 33% of the treatment schools. Key findings include: the mixed model analysis showed that treatment was not a significant predictor of PAT-R or PAT-M outcomes; differences in student responses to the self-efficacy and aspiration surveys were identified, with the control group showing a significant increase in the level of education they aspired to complete (p = 0.037); and teachers in the control group showed statistically significant growth in teacher efficacy, while those in the treatment group showed statistically significantly lower teacher–student support. Within the QTR process, the most time was spent on discussing the coding and on the individual coding process. Key observations from analysis of the fidelity check data were: teacher stress due to high rates of absenteeism, varied use of the Classroom Practice Guide, and analytical conversations about some elements and terms.
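
    A minimal sketch of the kind of mixed model analysis described above, using statsmodels: treatment as a fixed effect on follow-up PAT scores, adjusting for baseline, with a random intercept for school. The data are simulated, and the column names (school, treatment, baseline, followup) are stand-ins rather than the study’s actual variables.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Simulated data standing in for the trial structure: students nested in schools,
        # half the schools treated, with baseline and follow-up scores.
        rng = np.random.default_rng(0)
        rows = []
        for school in range(12):
            treated = int(school < 6)
            school_effect = rng.normal(0, 3)
            for _ in range(25):
                baseline = rng.normal(100, 10)
                followup = baseline + 5 + school_effect + rng.normal(0, 8)  # no simulated treatment effect
                rows.append({"school": school, "treatment": treated,
                             "baseline": baseline, "followup": followup})
        df = pd.DataFrame(rows)

        # Mixed model: treatment as a fixed effect, school as a random intercept.
        result = smf.mixedlm("followup ~ baseline + treatment", data=df, groups=df["school"]).fit()
        print(result.summary())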

    Rasch scaling procedures for informing development of a valid Fetal Surveillance Education Program multiple-choice assessment

    Background: It is widely recognised that deficiencies in fetal surveillance practice continue to contribute significantly to the burden of adverse outcomes. This has prompted the development of evidence-based clinical practice guidelines by the Royal Australian and New Zealand College of Obstetricians and Gynaecologists and an associated Fetal Surveillance Education Program to deliver the associated learning. This article describes initial steps in the validation of a corresponding multiple-choice assessment of the relevant educational outcomes through a combination of item response modelling and expert judgement. Methods: The Rasch item response model was employed for item and test analysis and to empirically derive the substantive interpretation of the assessment variable. This interpretation was then compared to the hierarchy of competencies specified a priori by a team of eight subject-matter experts. Classical Test Theory analyses were also conducted. Results: A high level of agreement between the hypothesised and derived variable provided evidence of construct validity. Item and test indices from Rasch analysis and Classical Test Theory analysis suggested that the current test form was of moderate quality. However, the analyses made clear the required steps for establishing a valid assessment of sufficient psychometric quality. These steps included: increasing the number of items from 40 to 50 in the first instance, reviewing ineffective items, targeting new items to specific content and difficulty gaps, and formalising the assessment blueprint in light of empirical information relating item structure to item difficulty. Conclusion: The application of the Rasch model for criterion-referenced assessment validation with an expert stakeholder group is herein described. Recommendations for subsequent item and test construction are also outlined in this article.
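
    To make the Rasch item response model referred to above concrete, here is a minimal Python sketch of the dichotomous Rasch model and a simple maximum-likelihood ability estimate for one candidate. The item difficulties and response pattern are invented for illustration and are not drawn from the Fetal Surveillance Education Program assessment.

        import numpy as np

        def rasch_prob(theta, b):
            """P(correct) under the dichotomous Rasch model: exp(theta - b) / (1 + exp(theta - b))."""
            return 1.0 / (1.0 + np.exp(-(theta - b)))

        # Hypothetical item difficulties (logits) and one candidate's scored responses (1 = correct).
        difficulties = np.array([-1.5, -0.5, 0.0, 0.8, 1.6])
        responses = np.array([1, 1, 1, 0, 0])

        # Newton-Raphson maximum-likelihood estimate of the candidate's ability.
        theta = 0.0
        for _ in range(25):
            p = rasch_prob(theta, difficulties)
            gradient = np.sum(responses - p)      # score function
            information = np.sum(p * (1 - p))     # Fisher information
            theta += gradient / information
        print(f"Estimated ability: {theta:.2f} logits")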

    Applications of item response theory to identify and correct for suspect rater data

    No full text
    Thesis (M.A.E.) -- University of Melbourne, Faculty of Education, 2006. This thesis describes a plausible values imputation approach for deriving population estimates on several language proficiency domains. The approach harnessed a multi-dimensional item response analysis combining student responses, rater judgements and student background variables. The target student population was lower-grade primary school students enrolled in the Hong Kong schooling system. The raters consisted of local teachers of English employed within the sampled target schools. The primary objective of this research was to impute plausible values where no data were provided or where rater data were deemed suspect. By necessity, a secondary objective of this study was to establish rules for justly excluding particular data on the basis of questionable validity. Surveys such as TIMSS, PISA and NAEP have used such "plausible value" methodologies to account for incomplete test designs and person non-response (Beaton & Johnson, 1990; Yamamoto & Kulick, 2000; Adams & Wu, 2002). The point of difference between this study and other similar studies was the use of item response theory (in particular, plausible values imputation) to identify and correct for invalid rater judgements in a large-scale educational survey. An additional research outcome included a derived index of rater data quality based upon imputation scores.
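
    A minimal Python sketch of the plausible values idea described above: a posterior over a single latent proficiency is formed by combining a Rasch-style likelihood for observed item responses with a prior conditioned on background variables, and several random draws (plausible values) are taken from it. The item difficulties, regression weights and responses are hypothetical, and the thesis's actual model was multi-dimensional and also incorporated rater judgements.

        import numpy as np

        rng = np.random.default_rng(1)

        # Hypothetical item difficulties and one student's observed responses.
        difficulties = np.array([-1.0, -0.3, 0.2, 0.9])
        responses = np.array([1, 1, 0, 0])

        # Latent regression: the prior mean for proficiency depends on background variables.
        background = np.array([1.0, 0.5])    # intercept plus one standardised covariate
        beta = np.array([0.1, 0.4])          # illustrative regression weights
        prior_mean, prior_sd = background @ beta, 1.0

        # Posterior over proficiency on a grid: prior times Rasch likelihood.
        grid = np.linspace(-4, 4, 401)
        p_correct = 1.0 / (1.0 + np.exp(-(grid[:, None] - difficulties)))
        likelihood = np.prod(p_correct**responses * (1 - p_correct)**(1 - responses), axis=1)
        prior = np.exp(-0.5 * ((grid - prior_mean) / prior_sd) ** 2)
        posterior = likelihood * prior
        posterior /= posterior.sum()

        # Draw five plausible values for this student.
        plausible_values = rng.choice(grid, size=5, p=posterior)
        print("Plausible values:", np.round(plausible_values, 2))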

    Assessment of student problem-solving processes with interactive computer-based tasks

    No full text
    © 2009 Dr. Nathan Paul Zoanetti. Problem solving is recognised as an important intellectual activity in schooling and beyond. In particular, generic problem-solving skills which transfer across learning areas are valued educational outcomes. The objective of this study was the design and evaluation of an online assessment system that provided diagnostic information on students' development of problem-solving competencies at upper primary and lower secondary school level. This resulted in the development of a methodology for collecting and interpreting problem-solving process data to assess important procedural aspects of problem solving. In this research study, existing assessment design and analysis methodologies were extended and applied to produce descriptions of problem-solving behaviour useful for both students and educators. The assessment system utilised recent advances in technology, assessment design and analysis, and problem-solving theory to guide the development of interactive computer-based tasks and to facilitate the interpretation of complex process data from student solution processes. Rules for interpreting computer-captured process data were empirically validated using qualitative verbal protocol analysis techniques. This study introduced a novel contribution to assessment design methodology called a temporal evidence map. This data transcription tool was designed for displaying and analysing concurrent sources of process data collected throughout task piloting exercises. Use of this tool culminated in the refinement of tasks and scoring rules, and informed development of additional tasks for the main data collection phase of the study. Following large-scale online data collection, the data were probabilistically modelled using Bayesian Inference Networks. A range of model evaluations were carried out to gauge aspects of assessment validity and reliability. Finally, the inferences generated via Bayesian modelling were used to produce diagnostic student profile reports suitable for informing instruction. Educators have much to gain from technology-based assessment systems underpinned by cognitively diagnostic models of cognition. In particular, supporting assessment inferences about procedural quality is well aligned with 21st century skills in information-rich educational and vocational settings. This study provides diagnostic information to educators about how, and not just if, students solve problems.
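
    As a toy illustration of the Bayesian Inference Network approach mentioned above, the following Python sketch links a two-state latent problem-solving proficiency to two binary process indicators and computes the posterior by direct enumeration. The network structure, indicator names and probabilities are all hypothetical and far simpler than the networks used in the study.

        import numpy as np

        # Latent proficiency with two states and a hypothetical prior.
        states = ["low", "high"]
        prior = np.array([0.5, 0.5])

        # P(indicator observed | proficiency state); values are illustrative only.
        p_systematic_search = np.array([0.30, 0.85])   # evidence of systematic exploration
        p_goal_monitoring = np.array([0.40, 0.75])     # evidence of checking work against the goal

        def posterior(search_observed, monitoring_observed):
            """Posterior over proficiency given two binary evidence observations."""
            like_search = p_systematic_search if search_observed else 1 - p_systematic_search
            like_monitor = p_goal_monitoring if monitoring_observed else 1 - p_goal_monitoring
            joint = prior * like_search * like_monitor
            return joint / joint.sum()

        post = posterior(search_observed=True, monitoring_observed=False)
        for state, prob in zip(states, post):
            print(f"P(proficiency = {state} | evidence) = {prob:.2f}")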

    Supporting judgements with statistical modelling

    No full text
    Medical colleges around the world are embracing holistic approaches to assessment that focus progression decisions on overall performance against different domains or proficiencies, rather than performance on high-stakes examinations alone.

    Immediate and longer-term impacts of fetal surveillance education on workforce knowledge and cognitive skills [version 1; peer review: 2 approved]

    No full text
    Background: Following the development of the Royal Australian and New Zealand College of Obstetricians and Gynaecologists Intrapartum Fetal Surveillance Guideline in 2003, an education program was developed to support guideline implementation and clinical practice. It was intended that improved clinician knowledge, particularly of cardiotocography, would reduce rates of intrapartum fetal morbidity and mortality. The program contains a multiple-choice assessment, designed to assess fetal surveillance knowledge and the application of that knowledge. We used the results of this assessment over time to evaluate the impact of the education program on clinicians’ fetal surveillance knowledge and interpretive skills, in the immediate and longer term. Methods: We undertook a retrospective analysis of the assessment results for all participants in the Fetal Surveillance Education Program between 2004 and 2018. Classical Test Theory and Rasch Item Response Theory analyses were used to evaluate the statistical reliability and quality of the assessment, and the measurement invariance or stability of the assessments over time. Clinicians’ assessment scores were then reviewed by craft group and by previous exposure to the program. Results: The results from 64,430 broadly similar assessments showed that participation in the education program was associated with an immediate improvement in clinician performance in the assessment. Performance improvement was sustained for up to 18 months following participation in the program, and recurrent participation was associated with progressive improvements. These trends were observed for all craft groups (consultant obstetricians, doctors in training, general practitioners, midwives, student midwives). Conclusions: These findings suggest that the Fetal Surveillance Education Program has improved clinician knowledge and the associated cognitive skills over time. The stable difficulty of the assessment tool means any improvement in clinicians’ results, with ongoing exposure to the program, can be reliably assessed and demonstrated. Importantly, this holds true for all craft groups involved in intrapartum care and the interpretation of cardiotocography.
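
    One of the Classical Test Theory indices commonly used in reliability analyses like those described above is internal consistency. The Python sketch below computes Cronbach's alpha on a simulated scored-response matrix; the matrix is generated for illustration and is not the program's assessment data.

        import numpy as np

        def cronbach_alpha(item_scores):
            """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total scores)."""
            k = item_scores.shape[1]
            item_variances = item_scores.var(axis=0, ddof=1)
            total_variance = item_scores.sum(axis=1).var(ddof=1)
            return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

        # Simulated scored responses: rows are candidates, columns are multiple-choice items (1 = correct).
        rng = np.random.default_rng(7)
        ability = rng.normal(0, 1, size=200)
        difficulty = rng.normal(0, 1, size=40)
        p_correct = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty)))
        scores = (rng.random((200, 40)) < p_correct).astype(int)

        print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")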