52 research outputs found

    Changes in standard of candidates taking the MRCP(UK) Part 1 examination, 1985 to 2002: Analysis of marker questions

    The maintenance of standards is a problem for postgraduate medical examinations, particularly if they use norm-referencing as the sole method of standard setting. In each of its diets, the MRCP(UK) Part 1 Examination includes a number of marker questions, which are unchanged from their use in a previous diet. This paper describes two complementary studies of marker questions for 52 diets of the MRCP(UK) Part 1 Examination over the years 1985 to 2001 to assess whether standards have changed

    A collaborative comparison of Objective Structured Clinical Examination (OSCE) standard setting methods at Australian medical schools

    Background: A key issue underpinning the usefulness of the OSCE assessment to medical education is standard-setting, but the majority of standard-setting methods remain challenging for performance assessment because they produce varying passing marks. Several studies have compared standard-setting methods; however, most of these studies are limited by their experimental scope, or use data on examinee performance at a single OSCE station or from a single medical school. This collaborative study between ten Australian medical schools investigated the effect of standard-setting methods on OSCE cut scores and failure rates. Methods: This research used 5,256 examinee scores from seven shared OSCE stations to calculate cut scores and failure rates using two different compromise standard-setting methods, namely the Borderline Regression and Cohen's methods. Results: The results of this study indicate that Cohen's method yields outcomes similar to those of the Borderline Regression method, particularly for large examinee cohort sizes. However, with lower examinee numbers on a station, the Borderline Regression method resulted in higher cut scores and larger difference margins in the failure rates. Conclusion: Cohen's method yields outcomes similar to those of the Borderline Regression method, and its application for benchmarking purposes and in resource-limited settings is justifiable, particularly with large examinee numbers
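
The two methods named in this abstract can be sketched as follows. This is an illustration under assumptions the abstract does not confirm: borderline regression is taken to be an ordinary least-squares fit of station checklist scores on examiner global ratings, with the cut score read off at the borderline rating, and Cohen's method is taken in its commonly described form, a fixed fraction (60%) of the score at the 95th percentile. Function names, defaults and data are hypothetical.

```python
def borderline_regression_cut(global_ratings, checklist_scores, borderline_rating):
    """Predicted checklist score at the borderline global rating (OLS fit)."""
    n = len(global_ratings)
    mean_x = sum(global_ratings) / n
    mean_y = sum(checklist_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in global_ratings)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(global_ratings, checklist_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * borderline_rating


def cohen_cut(scores, percentile=95, fraction=0.60):
    """Fraction of the score at the given percentile (nearest-rank rule).

    The 95th percentile and the 60% fraction are assumed defaults.
    """
    ordered = sorted(scores)
    # Integer ceiling of percentile * n / 100, at least rank 1.
    rank = max(1, -(-percentile * len(ordered) // 100))
    return fraction * ordered[rank - 1]


# Illustrative data only, not taken from the study.
ratings = [1, 2, 3, 4, 5]          # examiner global ratings
station = [10, 12, 14, 16, 18]     # checklist scores for the same examinees
brm_cut = borderline_regression_cut(ratings, station, borderline_rating=2)
cohen = cohen_cut(list(range(1, 21)))
```

Because the regression cut score depends on a fitted line, it is more sensitive to small samples than a percentile-based reference point, which is consistent with the divergence the abstract reports for stations with few examinees.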

    Modifying Hofstee standard setting for assessments that vary in difficulty, and to determine boundaries for different levels of achievement.

    BACKGROUND: Fixed mark grade boundaries for non-linear assessment scales fail to account for variations in assessment difficulty. Where assessment difficulty varies more than ability of successive cohorts or the quality of the teaching, anchoring grade boundaries to median cohort performance should provide an effective method for setting standards. METHODS: This study investigated the use of a modified Hofstee (MH) method for setting unsatisfactory/satisfactory and satisfactory/excellent grade boundaries for multiple choice question-style assessments, adjusted using the cohort median to obviate the effect of subjective judgements and provision of grade quotas. RESULTS: Outcomes for the MH method were compared with formula scoring/correction for guessing (FS/CFG) for 11 assessments, indicating that there were no significant differences between MH and FS/CFG in either the effective unsatisfactory/satisfactory grade boundary or the proportion of unsatisfactory graded candidates (p > 0.05). However, the boundary for excellent performance was significantly higher for MH (p < 0.01), and the proportion of candidates returned as excellent was significantly lower (p < 0.01). MH also generated performance profiles and pass marks that were not significantly different from those given by the Ebel method of criterion-referenced standard setting. CONCLUSIONS: This supports MH as an objective model for calculating variable grade boundaries, adjusted for test difficulty. Furthermore, it easily creates boundaries for unsatisfactory/satisfactory and satisfactory/excellent performance that are protected against grade inflation. It could be implemented as a stand-alone method of standard setting, or as part of the post-examination analysis of results for assessments for which pre-examination criterion-referenced standard setting is employed
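
The comparison method, formula scoring with correction for guessing, can be sketched as below. This assumes the textbook form of the correction (right answers minus wrong answers divided by the number of options less one, with omitted items unpenalised); the abstract does not state which variant the study used, and the function name and figures are hypothetical. The modified Hofstee calculation itself is not sketched because the abstract does not give its details.

```python
def formula_score(right, wrong, options):
    """Corrected score: right - wrong / (options - 1).

    Omitted items are neither rewarded nor penalised.
    """
    if options < 2:
        raise ValueError("each item needs at least two options")
    return right - wrong / (options - 1)


# Illustrative only: 100 four-option items, 60 right, 30 wrong, 10 omitted.
corrected = formula_score(right=60, wrong=30, options=4)

# A candidate guessing blindly on 100 five-option items expects about
# 20 right and 80 wrong, which the correction maps to zero.
guesser = formula_score(right=20, wrong=80, options=5)
```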

    Limitations of methodological experiments

    The extent to which empirical results can lead to methodological conclusions is investigated. No specific limitations are found to be involved in drawing conclusions on the plausibility of an artifact (Campbell). Two other types of ‘meta-research’ do appear to be problematic: (a) empirically based inferences on the kind of roles adopted by subjects with respect to the experiment (Weber and Cook) are tenuous, since the role-playing may well enter into the meta-research itself; (b) a similar intricacy arises with research on experimenter expectancy effects (Rosenthal). Possible ways of correcting for artifacts in meta-research, and the potential threat to scientific discourse that is associated with these corrections, are discussed

    The relation between category breadth and social desirability: A contest between two explanations

    Hampson, Goldberg and John (1987) reported a positive correlation between category breadth and social desirability of trait descriptive adjectives. Two possible explanations for this finding are as follows. (a) Undesirable traits represent denials of desirable traits, and are thus more difficult to process cognitively; therefore, fewer instances of negative traits can be imagined. (b) Undesirable behaviours are less frequent; therefore, fewer instances spring to mind. With respect to root/negation pairs of traits in which the negation is socially desirable (e.g. Unenvious/Envious), Hypothesis (a) predicts a lower category breadth for the negation, whereas Hypothesis (b) predicts the reverse. Using the relevant trait pairs in Table 1 from Hampson et al. (1987), Hypothesis (b) appeared to be victorious in 10 of the 12 cases (p < 0.05)
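
The reported result of 10 out of 12 cases is consistent with a simple sign test. The sketch below assumes an exact binomial test with a chance probability of 0.5 per pair; the abstract does not say which test was actually used, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(successes, n):
    """One-sided exact binomial p-value: P(X >= successes) with p = 0.5."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


one_sided = sign_test_p(10, 12)          # 79 / 4096
two_sided = min(1.0, 2 * one_sided)      # doubling is valid by symmetry
```

Under this assumption the one-sided value is 79/4096 (about 0.019) and the two-sided value about 0.039, so both fall below the 0.05 threshold quoted in the abstract.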

    Who should own the definition of personality?

    The averaged judgment of knowledgeable others provides the best available point of reference both for the definition of personality structure in general and for assessing someone's personality in particular. Self-judgments, as in personality questionnaires, are intrinsically deficient because judgment errors cannot be averaged out. The recommended procedure for assessing someone's personality is to give a personality questionnaire, phrased in the third person singular, to those who know the target best. This set may or may not include the target person as a judge

    Methodological decision rules as research policies: A betting reconstruction of empirical research

    A betting model of empirical research is described. The model requires that opposing parties reach agreement on an operationalization and specify their predictions in terms of a probability distribution over possible research outcomes. Proper decision rules are used to decide on the amounts of reputation that are gained and lost upon observing the data. It is argued that adoption of the model would lead to less trivial research, less selective publication, and a more liberal attitude towards experimental design. The betting model is contrasted with several other methodologies. In these comparisons, methodological models are viewed as policies, i.e., sets of rules which lead to certain predictable consequences if rational individuals exploit these rules to their own advantage. All comparisons reveal that the other models have relatively undesirable properties from this point of view. In discussing criticisms of the betting model, it appears that the model is also not completely water-proof from a policy point of view when participants do not wish to maximize their subjectively expected reputation. Other criticisms are discussed and are found wanting. Bets enter quite naturally into scientific discussions. Some years ago, an American colleague and I were discussing handwriting analysis. We were in general agreement that the usefulness of handwritings for making predictions about persons is very limited at best. As a devil’s advocate, however, I asserted that I could “predict” a person’s nationality from his or her handwriting. My colleague, who probably suspected me of entertaining ideas about national character, disagreed strongly with my assertion, and we decided to bet upon it. Those who have corresponded in handwriting with foreigners will not be surprised that I won this bet
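
The idea of settling a bet between two probability distributions can be sketched as follows. This is an illustration only: it assumes a logarithmic scoring rule, one well-known proper scoring rule, and a zero-sum transfer equal to the difference in log scores. The abstract does not specify which rule the model uses, and the names and numbers are hypothetical.

```python
from math import isclose, log


def reputation_transfer(pred_a, pred_b, outcome):
    """Reputation moved from party B to party A once the outcome is known.

    Each prediction is a dict mapping outcomes to probabilities. A positive
    value means A gains what B loses; a negative value means the reverse.
    """
    for pred in (pred_a, pred_b):
        if not isclose(sum(pred.values()), 1.0):
            raise ValueError("predictions must sum to 1")
    return log(pred_a[outcome]) - log(pred_b[outcome])


# Illustrative predictions over an agreed operationalization.
party_a = {"effect": 0.8, "no_effect": 0.2}
party_b = {"effect": 0.5, "no_effect": 0.5}

gain_if_effect = reputation_transfer(party_a, party_b, "effect")        # log(1.6)
gain_if_none = reputation_transfer(party_a, party_b, "no_effect")       # log(0.4)
```

With a proper rule of this kind, a party maximises its expected score only by stating its true beliefs, so a bolder prediction wins more when right and loses more when wrong.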

    The use of everyday personality language for scientific purposes

    The major question of the article is whether the natural language of personality provides an adequate point of departure for the construction of a scientific system of personological categories. Five obstacles to this endeavour are: (1) the domain is difficult to delineate, both with respect to its categories and in the choosing of items within categories; (2) the extent to which terms can be translated from one language to another appears to be limited; (3) the overwhelming role of evaluative aspects is embarrassing from a scientific point of view; (4) instead of obeying simple and clear taxonomic principles, the domain appears to be unruly in this respect; and (5) many terms and expressions are paradoxical when used in the first person. Tentative and partial solutions to these problems are proposed