
    Standard setting: Comparison of two methods

    BACKGROUND: The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods, and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. METHODS: The norm-reference method of standard setting (mean minus 1 SD) was applied to the 'raw' scores of 78 fourth-year medical students on a multiple-choice question (MCQ) examination. Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm-reference and Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. RESULTS: The pass rate with the norm-reference method was 85% (66/78) and that with the Angoff method was 100% (78/78). The percentage agreement between the Angoff and norm-reference methods was 78% (95% CI 69%–87%). The modified Angoff method had an inter-rater reliability of 0.81–0.82 and a test-retest reliability of 0.59–0.74. CONCLUSION: There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
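    The norm-reference rule described above amounts to a simple calculation: the cut score is the cohort mean minus one standard deviation, and candidates at or above it pass. The sketch below illustrates that rule only; the function names, the use of the sample standard deviation, and the example scores are assumptions made for illustration, not the study's data or code.

    ```python
    import statistics

    def norm_reference_cut_score(raw_scores):
        """Cut score under the 'mean minus 1 SD' norm-reference rule."""
        return statistics.mean(raw_scores) - statistics.stdev(raw_scores)

    def pass_rate(raw_scores, cut_score):
        """Proportion of candidates scoring at or above the cut score."""
        return sum(score >= cut_score for score in raw_scores) / len(raw_scores)

    # Illustrative use with made-up percentage scores (not the study's data)
    scores = [62, 70, 75, 58, 81, 66, 73, 69, 77, 64]
    cut = norm_reference_cut_score(scores)
    print(f"cut score = {cut:.1f}, pass rate = {pass_rate(scores, cut):.0%}")
    ```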

    Changes in standard of candidates taking the MRCP(UK) Part 1 examination, 1985 to 2002: Analysis of marker questions

    The maintenance of standards is a problem for postgraduate medical examinations, particularly if they use norm-referencing as the sole method of standard setting. In each of its diets, the MRCP(UK) Part 1 Examination includes a number of marker questions, which are unchanged from their use in a previous diet. This paper describes two complementary studies of marker questions for 52 diets of the MRCP(UK) Part 1 Examination over the years 1985 to 2001, to assess whether standards have changed.

    The reporting of statistics in medical educational studies: an observational study

    BACKGROUND: There is confusion in the medical literature as to whether statistics should be reported in survey studies that query an entire population, as is often done in educational studies. Our objective was to determine how often statistical tests have been reported in such articles in two prominent journals that publish these types of studies. METHODS: For this observational study, we used electronic searching to identify all survey studies published in Academic Medicine and the Journal of General Internal Medicine in which an entire population was studied. We tallied whether inferential statistics were used and whether p-values were reported. RESULTS: Eighty-four articles were found: 62 in Academic Medicine and 22 in the Journal of General Internal Medicine. Overall, 38 (45%) of the articles reported or stated that they calculated statistics: 35% in Academic Medicine and 73% in the Journal of General Internal Medicine. CONCLUSION: Educational enumeration surveys frequently report statistical tests. Until a better case can be made for doing so, a simple rule can be proffered to researchers. When studying an entire population (e.g., all program directors, all deans, and all medical schools) for factual information, do not perform statistical tests. Reporting percentages is sufficient and proper.

    In-training assessment using direct observation of single-patient encounters: a literature review

    We reviewed the literature on instruments for work-based assessment in single clinical encounters, such as the mini-clinical evaluation exercise (mini-CEX), and examined differences between these instruments in characteristics, feasibility, reliability, validity and educational effect. A PubMed search of the literature published before 8 January 2009 yielded 39 articles dealing with 18 different assessment instruments. One researcher extracted data on the characteristics of the instruments and two researchers extracted data on feasibility, reliability, validity and educational effect. Instruments are predominantly formative. Feasibility is generally deemed good; assessor training is rarely provided but is considered crucial for successful implementation. Acceptable reliability can be achieved with 10 encounters. The validity of many instruments has not been investigated, but the validity of the mini-CEX and the ‘clinical evaluation exercise’ is supported by strong and significant correlations with other valid assessment instruments. The evidence from the few studies on educational effects is not very convincing. The reports on clinical assessment instruments for single work-based encounters are generally positive, but supporting evidence is sparse. Feasibility of the instruments seems to be good and reliability requires a minimum of 10 encounters, but no clear conclusions emerge on other aspects. Studies on assessor and learner training, and studies examining effects beyond ‘happiness data’, are badly needed.

    The reliability of in-training assessment when performance improvement is taken into account

    During in-training assessment, students are frequently assessed over a longer period of time, and it can therefore be expected that their performance will improve. We studied whether there is a measurable performance improvement when students are assessed over an extended period of time, and how this improvement affects the reliability of the overall judgement. In-training assessment results were obtained from 104 students on rotation at our university hospital or at one of the six affiliated hospitals. Generalisability theory was used in combination with multilevel analysis to obtain reliability coefficients and to estimate the number of assessments needed for a reliable overall judgement, both including and excluding performance improvement. Students’ clinical performance ratings improved significantly, from a mean of 7.6 at the start to a mean of 7.8 at the end of their clerkship. When performance improvement was taken into account, reliability coefficients were higher, and the number of assessments needed to achieve a reliability of 0.80 or higher decreased from 17 to 11. Therefore, performance improvement should be considered when studying the reliability of in-training assessment.
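    To make the reliability target concrete: in a single-facet generalisability (decision-study) design, the reliability of a mean over n assessments is σ²_person / (σ²_person + σ²_residual / n), so the number of assessments needed for a target reliability follows directly from the variance components. The sketch below shows that standard calculation with hypothetical variance components; the study's own analysis combined generalisability theory with multilevel modelling and additionally modelled performance improvement, so this is only an illustration of the underlying idea, not the paper's method.

    ```python
    import math

    def assessments_needed(var_person, var_residual, target=0.80):
        """Smallest n with var_person / (var_person + var_residual / n) >= target."""
        # Rearranging the single-facet decision-study formula for n
        n = (target * var_residual) / (var_person * (1 - target))
        return math.ceil(n)

    # Hypothetical variance components, not taken from the study
    print(assessments_needed(var_person=0.10, var_residual=0.45))  # -> 18
    ```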

    Competency-based evaluation tools for integrative medicine training in family medicine residency: a pilot study

    BACKGROUND: As more integrative medicine content is incorporated into conventional family medicine teaching, the need for effective evaluation strategies grows. Through the Integrative Family Medicine (IFM) program, a six-site pilot program of a four-year residency training model combining integrative medicine and family medicine training, we developed and tested a set of competency-based evaluation tools to assess residents' skills in integrative medicine history-taking and treatment planning. This paper presents the results from the implementation of direct observation and treatment plan evaluation tools, as well as the results of two Objective Structured Clinical Examinations (OSCEs) developed for the program. METHODS: The direct observation (DO) and treatment plan (TP) evaluation tools developed for the IFM program were implemented by faculty at each of the six sites during the PGY-4 year (n = 11 for DO and n = 8 for TP). OSCE I was first implemented in 2005 (n = 6), revised, and then implemented with a second class of IFM participants in 2006 (n = 7). OSCE II was implemented in fall 2005 with only one class of IFM participants (n = 6). Data from the initial implementation of these tools are described using descriptive statistics. RESULTS: Results from the implementation of these tools at the IFM sites suggest that we need more emphasis in our curriculum on incorporating spirituality into history-taking and treatment planning, and more training for IFM residents on effective assessment of readiness for change and on strategies for delivering integrative medicine treatment recommendations. Focusing our OSCE assessment more narrowly on integrative medicine history-taking skills was much more effective in delineating strengths and weaknesses in our residents' performance than using the OSCE for both integrative and more basic communication competencies. CONCLUSION: As these tools are refined further, they will be of value both in improving our teaching in the IFM program and as competency-based evaluation resources for the expanding number of family medicine residency programs incorporating integrative medicine into their curricula. The next stages of work on these instruments will involve establishing inter-rater reliability and defining more clearly the specific behaviors that we believe establish competency in the integrative medicine skills defined for the program.

    Relationship Between Peer Assessment During Medical School, Dean’s Letter Rankings, and Ratings by Internship Directors

    BACKGROUND: It is not known to what extent the dean’s letter (medical student performance evaluation [MSPE]) reflects peer-assessed work habits (WH) skills and/or interpersonal attributes (IA) of students. OBJECTIVE: To compare peer ratings of WH and IA of second- and third-year medical students with later MSPE rankings and with ratings by internship program directors. DESIGN AND PARTICIPANTS: Participants were 281 medical students from the classes of 2004, 2005, and 2006 at a private medical school in the northeastern United States who had participated in peer assessment exercises in the second and third years of medical school. For students from the class of 2004, we also compared peer assessment data against later evaluations obtained from internship program directors. RESULTS: Peer-assessed WH were predictive of later MSPE groups in both the second (F = 44.90, P < .001) and third (F = 29.54, P < .001) years of medical school. Interpersonal attributes were not related to MSPE rankings in either year. MSPE rankings for a majority of students were predictable from peer-assessed WH scores. Internship directors’ ratings were significantly related to second- and third-year peer-assessed WH scores (r = .32 [P = .15] and r = .43 [P = .004], respectively), but not to peer-assessed IA. CONCLUSIONS: Peer assessment of WH, as early as the second year of medical school, can predict later MSPE rankings and internship performance. Although peer-assessed IA can be measured reliably, they are unrelated to either outcome.

    Instruments to measure the ability to self-reflect: A systematic review of evidence from workplace and educational settings including health care

    Introduction: Self-reflection has become recognised as a core skill in dental education, and the ability to self-reflect is also valued and measured within several other professions. This review appraises the evidence for instruments available to measure the self-reflective ability of adults studying or working within any setting, not just health care. Materials and Methods: A systematic review was conducted of 20 electronic databases (including Medline, ERIC, CINAHL and Business Source Complete) from 1975 to 2017, supplemented by citation searches. Data were extracted from each study, and the studies were graded against quality indicators by at least two independent reviewers using a coding sheet. Reviewers completed a utility analysis of the assessment instruments described within the included studies, appraising their reported reliability, validity, educational impact, acceptability and cost. Results: A total of 131 studies met the inclusion criteria. Eighteen were judged to provide higher-quality evidence for the review, and three broad types of instrument were identified, namely: rubrics (or scoring guides), self-reported scales and observed behaviour. Conclusions: Three types of instrument were identified to assess the ability to self-reflect. It was not possible to recommend a single most effective instrument because the criteria necessary for a full utility analysis were under-reported for each. The use of more than one instrument may therefore be appropriate, depending on acceptability to the faculty, assessor and student, and on cost. Future research should report on the utility of assessment instruments and provide guidance on what constitutes thresholds of acceptable or unacceptable ability to self-reflect, and how this should be managed.