1,442 research outputs found
Using item response theory to explore the psychometric properties of extended matching questions examination in undergraduate medical education
BACKGROUND:
As assessment has been shown to direct learning, it is critical that the examinations developed to test clinical competence in medical undergraduates are valid and reliable. The use of extended matching questions (EMQ) has been advocated to overcome some of the criticisms of using multiple-choice questions to test factual and applied knowledge.
METHODS:
We analysed the results from the Extended Matching Questions Examination taken by 4th year undergraduate medical students in the academic year 2001 to 2002. Rasch analysis was used to examine whether the set of questions used in the examination mapped on to a unidimensional scale, the degree of difficulty of questions within and between the various medical and surgical specialties and the pattern of responses within individual questions to assess the impact of the distractor options.
RESULTS:
Analysis of a subset of items and of the full examination demonstrated internal construct validity and the absence of bias on the majority of questions. Three main patterns of response selection were identified.
CONCLUSION:
Modern psychometric methods based upon the work of Rasch provide a useful approach to the calibration and analysis of EMQ undergraduate medical assessments. The approach allows for a formal test of the unidimensionality of the questions and thus the validity of the summed score. Given the metric calibration which follows fit to the model, it also allows for the establishment of item banks to facilitate continuity and equity in exam standards.
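For readers unfamiliar with the model named above, the simplest (dichotomous) form of the Rasch model is sketched below. The abstract does not state which variant was fitted, and the symbols are notation chosen here for illustration: \theta_n is the ability of student n, b_i is the difficulty of item i, and X_{ni} = 1 denotes a correct response.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Because ability and difficulty enter only through their difference, items and persons are placed on a single common scale, which is what makes a test of unidimensionality and the calibration of item banks possible.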
An assessment of validity and responsiveness of generic measures of health-related quality of life in hearing impairment
This article is made available through the Brunel Open Access Publishing Fund. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Purpose: This review examines psychometric performance of three widely used generic preference-based measures, that is, EuroQol 5 dimensions (EQ-5D), Health Utility Index 3 (HUI3) and Short-form 6 dimensions (SF-6D) in patients with hearing impairments.
Methods: A systematic search was undertaken to identify studies of patients with hearing impairments where health state utility values were measured and reported. Data were extracted and analysed to assess the reliability, validity (known group differences and convergent validity) and responsiveness of the measures across hearing impairments.
Results: Fourteen studies (18 papers) were included in the review. HUI3 was the most commonly used utility measure in hearing impairment. In all six studies, the HUI3 detected differences between groups defined by the severity of impairment, and four out of five studies detected statistically significant changes as a result of intervention. The only study available suggested that EQ-5D had only a weak ability to discriminate between severity groups, and in four out of five studies, EQ-5D failed to detect changes. Only one study involved the SF-6D; thus, the information is too limited to draw conclusions on its performance. No evidence on the reliability of these measures was found.
Conclusion: Overall, the validity and responsiveness of the HUI3 in hearing impairment were good. The responsiveness of EQ-5D was relatively poor and weak validity was suggested by limited evidence. The evidence on SF-6D was too limited to make any judgment. More head-to-head comparisons of these and other preference measures of health are required.
Medical Research Council
Understanding pregnancy planning in a low-income country setting: validation of the London measure of unplanned pregnancy in Malawi
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: The London Measure of Unplanned Pregnancy (LMUP) is a new and psychometrically valid measure of pregnancy intention that was developed in the United Kingdom. An improved understanding of pregnancy intention in low-income countries, where unintended pregnancies are common and maternal and neonatal deaths are high, is necessary to inform policies to address the unmet need for family planning. To this end this research aimed to validate the LMUP for use in the Chichewa language in Malawi.
Methods: Three Chichewa speakers translated the LMUP and one translation was agreed which was back-translated and pre-tested on five pregnant women using cognitive interviews. The measure was field tested with pregnant women who were recruited at antenatal clinics and data were analysed using classical test theory and hypothesis testing.
Results: 125 women aged 15-43 (median 23), with parities of 1-8 (median 2) completed the Chichewa LMUP. There were no missing data. The full range of LMUP scores was captured. In terms of reliability, the scale was internally consistent (Cronbach's alpha = 0.78) and test-retest data from 70 women showed good stability (weighted Kappa 0.80). In terms of validity, hypothesis testing confirmed that unmarried women (p = 0.003), women who had four or more children alive (p = 0.0051) and women who were below 20 or over 29 (p = 0.0115) were all more likely to have unintended pregnancies. Principal component analysis showed that five of the six items loaded onto one factor, with a further item borderline. A sensitivity analysis to assess the effect of the removal of the weakest item of the scale showed slightly improved performance but as the LMUP was not significantly adversely affected by its inclusion we recommend retaining the six-item score.
Conclusion: The Chichewa LMUP is a valid and reliable measure of pregnancy intention in Malawi and can now be used in research and/or surveillance. This is the first validation of this tool in a low-income country, helping to demonstrate that the concept of pregnancy planning is applicable in such a setting. Use of the Chichewa LMUP can enhance our understanding of pregnancy intention in Malawi, giving insight into the family planning services that are required to better meet women's needs and save lives. © 2013 Hall et al.; licensee BioMed Central Ltd.
Dr Hall’s Wellcome Trust Research Training Fellowship, grant number 097268/Z/11/Z
Testing and comparing two self-care-related instruments among older Chinese adults
Objectives The study aimed to test and compare the reliability and validity, including sensitivity and specificity, of two self-care-related instruments, the Self-care Ability Scale for the Elderly (SASE) and the Appraisal of Self-care Agency Scale-Revised (ASAS-R), among older adults in the Chinese context. Methods A cross-sectional design was used to conduct this study. The sample consisted of 1152 older adults. Data were collected by a questionnaire including the Chinese version of SASE (SASE-CHI), the Chinese version of ASAS-R (ASAS-R-CHI) and the Exercise of Self-Care Agency scale (ESCA). Homogeneity and stability, content, construct and concurrent validity, and sensitivity and specificity were assessed. Results The Cronbach's alpha (α) of SASE-CHI was 0.89, the item-to-total correlations ranged from r = 0.15 to r = 0.81, and the test-retest correlation coefficient (intra-class correlation coefficient, ICC) was 0.99 (95% CI, 0.99–1.00; P<0.001). The Cronbach's α of ASAS-R-CHI was 0.78, the item-to-total correlations ranged from r = 0.20 to r = 0.65, and the test-retest ICC was 0.95 (95% CI, 0.92–0.96; P<0.001). The content validity index (CVI) of SASE-CHI and ASAS-R-CHI was 0.96 and 0.97, respectively. The findings of exploratory and confirmatory factor analyses (EFA and CFA) confirmed a good construct validity of SASE-CHI and ASAS-R-CHI. The Pearson's rank correlation coefficients, as a measure of concurrent validity, between the total scores of SASE-CHI and ESCA and of ASAS-R-CHI and ESCA were 0.65 (P<0.001) and 0.62 (P<0.001), respectively. Regarding ESCA as the criterion, the areas under the receiver operating characteristic (ROC) curve for the cut-points of SASE-CHI and ASAS-R-CHI were 0.93 (95% CI, 0.91–0.94) and 0.83 (95% CI, 0.80–0.86), respectively. Conclusion There is no significant difference between the two instruments. Each has its own characteristics, but SASE-CHI is more suitable for older adults. The key point is that users can choose the most appropriate scale according to the specific situation.
Do self-reported intentions predict clinicians' behaviour: a systematic review.
Background: Implementation research is the scientific study of methods to promote the systematic uptake of
clinical research findings into routine clinical practice. Several interventions have been shown to be effective in
changing health care professionals' behaviour, but heterogeneity within interventions, targeted behaviours, and
study settings make generalisation difficult. Therefore, it is necessary to identify the 'active ingredients' in
professional behaviour change strategies. Theories of human behaviour that feature an individual's "intention" to
do something as the most immediate predictor of their behaviour have proved to be useful in non-clinical
populations. As clinical practice is a form of human behaviour such theories may offer a basis for developing a
scientific rationale for the choice of intervention to use in the implementation of new practice. The aim of this
review was to explore the relationship between intention and behaviour in clinicians and how this compares to
the intention-behaviour relationship in studies of non-clinicians.
Methods: We searched: PsycINFO, MEDLINE, EMBASE, CINAHL, Cochrane Central Register of Controlled
Trials, Science/Social science citation index, Current contents (social & behavioural med/clinical med), ISI
conference proceedings, and Index to Theses. The reference lists of all included papers were checked manually.
Studies were eligible for inclusion if they had: examined a clinical behaviour within a clinical context, included
measures of both intention and behaviour, measured behaviour after intention, and explored this relationship
quantitatively. All titles and abstracts retrieved by electronic searching were screened independently by two
reviewers, with disagreements resolved by discussion.
Discussion: Ten studies were found that examined the relationship between intention and clinical behaviours in
1623 health professionals. The proportion of variance in behaviour explained by intention was of a similar
magnitude to that found in the literature relating to non-health professionals. This was more consistently the case
for studies in which intention-behaviour correspondence was good and behaviour was self-reported. Though firm
conclusions are limited by a smaller literature, our findings are consistent with that of the non-health professional
literature. This review, viewed in the context of the larger populations of studies, provides encouragement for
the contention that there is a predictable relationship between the intentions of a health professional and their
subsequent behaviour. However, there remain significant methodological challenges.
The reliability of two visual motor integration tests used with children
Occupational therapists often assess the visual motor integration (VMI) skills of children and young people. It is important that therapists use tools with strong psychometric properties. This study aims to examine the reliability of 2 VMI tests. Ninety-two children between the ages of 5 and 17 years (response rate of 31%) completed 2 VMI tests: the Developmental Test of Visual Motor Integration (DTVMI) and the Full Range Test of Visual Motor Integration (FRTVMI). Cronbach's alpha coefficient was used to examine the internal consistency of the 2 VMI tests whereas Spearman's rho correlation was used to evaluate the test–retest reliability, intrarater reliability, and interrater reliability of the 2 VMI tests. The Cronbach's alpha coefficient for the DTVMI was .82 and .72 for the FRTVMI. The test–retest reliability coefficient was .73 (p = .000) for the DTVMI and .49 (p = .05) for the FRTVMI. The interrater correlation was significant for both the DTVMI at .94 (p = .000) and FRTVMI at .68 (p = .001). The DTVMI intrarater reliability correlation result was .90 (p = .000) and the FRTVMI at .85 (p = .000). Overall, the DTVMI exhibited a higher level of reliability than the FRTVMI. Both VMI tests appear to exhibit reasonable levels of reliability and are recommended for use with children and young people.
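Cronbach's alpha, the internal-consistency statistic reported in this and several other abstracts in this list, can be illustrated with a minimal sketch. The function name and the demonstration scores below are hypothetical and are not taken from any of the studies; the sketch assumes complete data (no missing responses) and uses sample variances.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; assumes k >= 2, at least two
    respondents, no missing values, and non-zero variance of total scores.
    """
    k = len(scores[0])
    # Transpose so that each tuple holds all respondents' scores on one item.
    items = list(zip(*scores))
    item_variances = [variance(item) for item in items]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores: four respondents answering three items.
demo_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
]
print(round(cronbach_alpha(demo_scores), 2))
```

Alpha approaches 1 when items rise and fall together across respondents, which is why it is read as an index of how consistently the items measure the same thing.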
Psychometric properties of discourse measures in aphasia: acceptability, reliability, and validity
BACKGROUND: Discourse in adults with aphasia is increasingly the focus of assessment and therapy research. A broad range of measures is available to describe discourse, but very limited information is available on their psychometric properties. As a result, the quality of these measures is unknown, and there is very little evidence to motivate the choice of one measure over another. AIMS: To explore the quality of a range of discourse measures, targeting sentence structure, coherence, story structure and cohesion. Quality was evaluated in terms of the psychometric properties of acceptability (data completeness and skewness), reliability (inter- and intra-rater), and validity (content, convergent, discriminant and known groups). METHODS & PROCEDURES: Participants with chronic mild-to-moderate aphasia were recruited from community groups. They produced a range of discourses which were grouped into Cinderella and everyday discourses. Discourses were then transcribed orthographically and analyzed using macro- and microlinguistic measures (Story Grammar, Topic Coherence, Local Coherence, Reference Chains and Predicate Argument Structure-PAS). Data were evaluated against standard predetermined criteria to ascertain the psychometric quality of the measures. OUTCOMES & RESULTS: A total of 17 participants took part in the study. All measures had high levels of acceptability, inter- and intra-rater reliability, and had good content validity, as they could be related to a level of the theoretical model of discourse production. For convergent validity, as expected, 8/10 measures correlated with the Western Aphasia Battery-Revised (WAB-R) spontaneous speech scores, and 7/10 measures correlated with the Kissing and Dancing Test (KDT) scores (r ≥ 0.3), giving an overall positive rating for construct validity. 
For discriminant validity, as predicted, all measures had low correlations with Raven's Coloured Progressive Matrices (RCPM) and WAB-R Auditory Verbal Comprehension scores (r < 0.3), giving an overall positive rating for construct validity. Finally, for known groups validity, all measures indicated a difference between speakers with mild and moderate aphasia except for the Local Coherence measures. Overall, Story Grammar, Topic Coherence, Reference Chains and PAS emerged as the strongest measures in the current study because they achieved the predetermined thresholds for quality in terms of each of the psychometric parameters profiled. CONCLUSIONS & IMPLICATIONS: The current study is the first to psychometrically profile measures of discourse in aphasia. It contributes to the field by identifying Story Grammar, Topic Coherence, Reference Chains and PAS as the most psychometrically robust discourse measures yet profiled with speakers with aphasia. Until further data are available indicating the strength of other discourse measures, caution should be applied when using them.
Young women's use of a microbicide surrogate: The complex influence of relationship characteristics and perceived male partners' evaluations
This is the post-print version of the article. The official published version can be found at the link below.
Currently in clinical trials, vaginal microbicides are proposed as a female-initiated method of sexually transmitted infection prevention. Much of microbicide acceptability research has been conducted outside of the United States and frequently without consideration of the social interaction between sex partners, ignoring the complex gender and power structures often inherent in young women’s (heterosexual) relationships. Accordingly, the purpose of this study was to build on existing microbicide research by exploring the role of male partners and relationship characteristics on young women’s use of a microbicide surrogate, an inert vaginal moisturizer (VM), in a large city in the United States. Individual semi-structured interviews were conducted with 40 young women (18–23 years old; 85% African American; 47.5% mothers) following use of the VM during coital events for a 4 week period. Overall, the results indicated that relationship dynamics and perceptions of male partners influenced VM evaluation. These two factors suggest that relationship context will need to be considered in the promotion of vaginal microbicides. The findings offer insights into how future acceptability and use of microbicides will be influenced by gendered power dynamics. The results also underscore the importance of incorporating men into microbicide promotion efforts while encouraging a dialogue that focuses attention on power inequities that can exist in heterosexual relationships. Detailed understanding of these issues is essential for successful microbicide acceptability, social marketing, education, and use.
This study was funded by a grant from National Institutes of Health (NIHU19AI 31494) as well as research awards to the first author: Friends of the Kinsey Institute Research Grant Award, Indiana University’s School of HPER Graduate Student Grant-in-Aid of Research Award, William L. Yarber Sexual Health Fellowship, and the Indiana University Graduate and Professional Student Organization Research Grant
Construct-level predictive validity of educational attainment and intellectual aptitude tests in medical student selection: meta-regression of six UK longitudinal studies
Background: Measures used for medical student selection should predict future performance during training. A problem for any selection study is that predictor-outcome correlations are known only in those who have been selected, whereas selectors need to know how measures would predict in the entire pool of applicants. That problem of interpretation can be solved by calculating construct-level predictive validity, an estimate of true predictor-outcome correlation across the range of applicant abilities.
Methods: Construct-level predictive validities were calculated in six cohort studies of medical student selection and training (student entry, 1972 to 2009) for a range of predictors, including A-levels, General Certificates of Secondary Education (GCSEs)/O-levels, and aptitude tests (AH5 and UK Clinical Aptitude Test (UKCAT)). Outcomes included undergraduate basic medical science and finals assessments, as well as postgraduate measures of Membership of the Royal Colleges of Physicians of the United Kingdom (MRCP(UK)) performance and entry in the Specialist Register. Construct-level predictive validity was calculated with the method of Hunter, Schmidt and Le (2006), adapted to correct for right-censorship of examination results due to grade inflation.
Results: Meta-regression analyzed 57 separate predictor-outcome correlations (POCs) and construct-level predictive validities (CLPVs). Mean CLPVs are substantially higher (.450) than mean POCs (.171). Mean CLPVs for first-year examinations were high for A-levels (.809; CI: .501 to .935), and lower for GCSEs/O-levels (.332; CI: .024 to .583) and UKCAT (mean = .245; CI: .207 to .276). A-levels had higher CLPVs for all undergraduate and postgraduate assessments than did GCSEs/O-levels and intellectual aptitude tests. CLPVs of educational attainment measures decline somewhat during training, but continue to predict postgraduate performance. Intellectual aptitude tests have lower CLPVs than A-levels or GCSEs/O-levels.
Conclusions: Educational attainment has strong CLPVs for undergraduate and postgraduate performance, accounting for perhaps 65% of true variance in first year performance. Such CLPVs justify the use of educational attainment measures in selection, but also raise a key theoretical question concerning the remaining 35% of variance (and measurement error, range restriction and right-censorship have been taken into account). Just as in astrophysics, ‘dark matter’ and ‘dark energy’ are posited to balance various theoretical equations, so medical student selection must also have its ‘dark variance’, whose nature is not yet properly characterized, but explains a third of the variation in performance during training. Some variance probably relates to factors which are unpredictable at selection, such as illness or other life events, but some is probably also associated with factors such as personality, motivation or study skills.
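The 65% figure appears to follow from squaring the A-level CLPV for first-year examinations reported in the Results (.809), since the proportion of variance explained by a predictor is the square of its correlation with the outcome. The abstract does not state this derivation explicitly, so the arithmetic below is an inferred reading rather than the authors' stated calculation.

```latex
r^2 = 0.809^2 \approx 0.654 \approx 65\%, \qquad 1 - r^2 \approx 0.346 \approx 35\%
```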
A prosodically controlled word and nonword repetition task for 2- to 4-year-olds: Evidence from typically developing children
An association has been found between nonword repetition and language skills in school-aged children with both typical and atypical language development (Dollaghan & Campbell, 1998; Ellis Weismer et al., 2000; Gathercole & Baddeley, 1990; Montgomery, 2002). This raises the possibility that younger children’s repetition performance may be predictive of later language deficits. In order to investigate this possibility, it is important to establish that elicited repetition with very young children is both feasible and informative. To this end, a repetition task was designed and carried out with 66 children aged 2-4. The task consisted of 18 words and 18 matched nonwords that were systematically manipulated for length and prosodic structure. In addition, an assessment of receptive vocabulary was administered.
The repetition task elicited high levels of response. Total scores as well as word and nonword scores were sensitive to age. Lexical status and item length affected performance regardless of age: words were repeated more accurately than nonwords, and one-syllable items were repeated more accurately than two-syllable items, which were in turn repeated more accurately than three-syllable items. The effect of prosodic structure was also significant. Whole syllable errors were almost exclusive to unstressed syllables, with those preceding stress being most vulnerable. Performance on the repetition task was significantly correlated with performance on the receptive vocabulary test. Since this repetition task was effective in eliciting responses from most of the 2- to 4-year-old participants, tapped developmental change in their repetition skills, and revealed patterns in their performance, it has the potential to identify deficits in very early repetition skills that may be indicative of wider language difficulties.
