Factors influencing the performance of the Mantel-Haenszel procedure in identifying differential item functioning.
The Mantel-Haenszel (MH) procedure has emerged as one of the methods of choice for identifying test items that exhibit differential item functioning (DIF). Although there has been considerable research examining its performance in this context, important gaps remain in the knowledge base for effectively applying the procedure. This investigation attempts to fill those gaps with the results of five simulation studies. The first study examines the utility of the two-step procedure recommended by Holland and Thayer, in which the matching criterion used in the second step is refined by removing items identified in the first step. The results showed that the two-step procedure is associated with a reduction in the Type II error rate. The second study examined the capability of the MH procedure to identify uniform DIF. The statistic was used to identify simulated DIF in items with varying levels of difficulty and discrimination and with differing between-group differences in difficulty. The results indicated that, when the difference in difficulty was held constant, poorly discriminating items and very difficult items were less likely to be identified by the procedure. The third study considered the effects of sample size. Although the MH procedure has been repeatedly recommended for use with small samples, the results suggest that samples below 200 per group may be inadequate; performance with larger samples was satisfactory and improved as sample size increased. The fourth study examines the effects of score group width on the statistic. Holland and Thayer recommended that n + 1 score groups be used for matching (where n is the number of items); various authors have since suggested that there may be utility in using fewer (wider) score groups. It was shown that this variation on the MH procedure could dramatically inflate Type I error rates. The final study examined a simple variation on the MH statistic that may allow it to identify non-uniform DIF. The MH statistic's inability to identify certain types of non-uniform DIF items has been noted as a major shortcoming. Use of the variation resulted in identification of many of the simulated non-uniform DIF items with little or no increase in the Type I error rate.
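For orientation, the MH statistic compares the odds of a correct response for reference- and focal-group examinees matched on total score. In a standard formulation (the paper's own notation may differ): with examinees sorted into K matched score groups, where A_k and B_k are the numbers of reference-group examinees answering the item correctly and incorrectly in group k, C_k and D_k are the corresponding focal-group counts, and T_k is the group total, the common odds ratio is estimated as

    \hat{\alpha}_{\mathrm{MH}} = \frac{\sum_{k=1}^{K} A_k D_k / T_k}{\sum_{k=1}^{K} B_k C_k / T_k}

and is often reported on the ETS delta scale as MH D-DIF = -2.35 \ln(\hat{\alpha}_{\mathrm{MH}}), with values near zero indicating negligible DIF.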
Physician Experiences and Understanding of Genomic Sequencing in Oncology
The amount of information produced by genomic sequencing is vast, technically complicated, and can be difficult to interpret. Appropriately tailoring genomic information for non-geneticists is an essential next step in the clinical use of genomic sequencing. To initiate development of a framework for genomic results communication, we conducted eighteen qualitative interviews with oncologists who had referred adult cancer patients to a matched tumor-normal tissue genomic sequencing study. In our qualitative analysis, we found varied levels of clinician knowledge relating to sequencing technology, the scope of the tumor genomic sequencing study, and incidental germline findings. Clinicians expressed a perceived need for more genetics education. Additionally, they had a variety of suggestions for improving results reports and possible resources to aid in results interpretation. Most clinicians felt genetic counselors were needed when incidental germline findings were identified. Our research suggests that more consistent genetics education is imperative in ensuring the proper utilization of genomic sequencing in cancer care. Clinician suggestions for results interpretation resources and results report modifications could be used to improve communication. Clinicians' perceived need to involve genetic counselors when incidental germline findings were found suggests genetic specialists could play a critical role in ensuring patients receive appropriate follow-up.
Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/147187/1/jgc40187.pd
The Impact of Examinee Performance Information on Judges' Cut Scores in Modified Angoff Standard-Setting Exercises
Educational Measurement: Issues and Practice, Vol. 33, No. 1, pp. 15–22.
This research evaluated the impact of a common modification to Angoff standard-setting exercises: the provision of examinee performance data. Data from 18 independent standard-setting panels across three different medical licensing examinations were examined to investigate whether and how the provision of performance information impacted judgments and the resulting cut scores. Results varied by panel but in general indicated that both the variability among the panelists and the resulting cut scores were affected by the data. After the review of performance data, panelist variability generally decreased. In addition, for all panels and examinations, pre- and post-data cut scores were significantly different. Investigation of the practical significance of the findings indicated that nontrivial fail-rate changes were associated with the cut score changes for a majority of standard-setting exercises. This study is the first to provide a large-scale, systematic evaluation of the impact of a common standard-setting practice, and the results can provide practitioners with insight into how the practice influences panelist variability and resulting cut scores.
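For context, a standard way a modified Angoff cut score is computed (not necessarily these panels' exact procedure): each of J panelists estimates, for every item i of n, the probability p_{ij} that a minimally competent examinee would answer it correctly, and the cut score is the panelist average of the summed estimates,

    c = \frac{1}{J} \sum_{j=1}^{J} \sum_{i=1}^{n} p_{ij}.

Performance data give panelists empirical item difficulties against which to recalibrate their p_{ij} judgments, which is why both panelist variability and c itself can shift after the data review.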
Evaluation of missing data in an assessment of professional behaviors
BACKGROUND: The National Board of Medical Examiners is currently developing the Assessment of Professional Behaviors, a multisource feedback (MSF) tool intended for formative use with medical students and residents. This study investigated whether missing responses on this tool can be considered random; evidence that missing values are not random would suggest response bias, a significant threat to score validity.
METHOD: Correlational analyses of pilot data (N = 2,149) investigated whether missing values were systematically related to global evaluations of observees.
RESULTS: The percentage of missing items was correlated with global evaluations of observees; observers answered more items for preferred observees compared with nonpreferred observees.
CONCLUSIONS: Missing responses on this MSF tool seem to be nonrandom and are instead systematically related to global perceptions of observees. Further research is needed to determine whether modifications to the items, the instructions, or other components of the assessment process can reduce this effect.
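A minimal sketch of the type of correlational analysis described, using simulated data in place of the pilot sample (the rating scale, number of items, and missingness mechanism below are illustrative assumptions, not the study's):

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    n_pairs = 500  # observer-observee pairs (the pilot had N = 2,149)

    # Hypothetical global evaluation of each observee on a 1-5 scale.
    global_eval = rng.integers(1, 6, n_pairs)

    # Simulate nonrandom missingness: less-preferred observees get more blanks.
    items = rng.integers(1, 6, (n_pairs, 15)).astype(float)
    p_missing = 0.35 - 0.05 * (global_eval - 1)  # higher rating -> fewer blanks
    items[rng.random((n_pairs, 15)) < p_missing[:, None]] = np.nan

    # Percentage of missing items per pair, correlated with the global rating.
    pct_missing = np.isnan(items).mean(axis=1) * 100
    r, p = pearsonr(pct_missing, global_eval)
    print(f"r = {r:.2f}, p = {p:.3g}")  # a negative r mirrors the reported pattern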
Using natural language processing to predict item response times and improve test construction
In this article, it is shown how item text can be represented by (a) 113 features quantifying the text's linguistic characteristics, (b) 16 measures of the extent to which an information-retrieval-based automatic question-answering system finds an item challenging, and (c) dense word representations (word embeddings). Using a random forests algorithm, these data then are used to train a prediction model for item response times, and predicted response times then are used to assemble test forms. Using empirical data from the United States Medical Licensing Examination, we show that timing demands are more consistent across these specially assembled forms than across forms comprising randomly selected items. Because an exam's timing conditions affect examinee performance, this result has implications for exam fairness whenever examinees are compared with each other or against a common standard.
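A schematic sketch of the modeling pipeline described above, heavily simplified: the actual work uses 113 linguistic features, 16 information-retrieval-based measures, and word embeddings, whereas this toy version substitutes three surface features, and the items and timings are made up for illustration:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def featurize(item_text: str) -> list[float]:
        """Toy stand-in for the paper's feature extraction."""
        words = item_text.split()
        return [
            float(len(words)),                                 # length in words
            sum(len(w) for w in words) / max(len(words), 1),   # mean word length
            float(item_text.count(",")),                       # clause-complexity proxy
        ]

    # Made-up items and mean response times (seconds), for illustration only.
    items = [
        "A 45-year-old man presents with acute chest pain radiating to the jaw.",
        "Which enzyme catalyzes the rate-limiting step of glycolysis?",
        "A 3-week-old infant has projectile vomiting, and labs show alkalosis.",
    ]
    observed_seconds = np.array([72.0, 41.0, 68.0])

    X = np.array([featurize(t) for t in items])
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    model.fit(X, observed_seconds)

    # Predicted times for candidate items can then be summed per form so that
    # assembled forms carry comparable total timing demands.
    print(model.predict(X))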
Collecting validity evidence for an assessment of professionalism: findings from think-aloud interviews
BACKGROUND: This study investigated whether participants' subjective reports of how they assigned ratings on a multisource feedback instrument provide evidence to support interpreting the resulting scores as objective, accurate measures of professional behavior.
METHOD: Twenty-six participants completed think-aloud interviews while rating students, residents, or faculty members they had worked with previously. The items rated included 15 behavioral items and one global item.
RESULTS: Participants referred to generalized behaviors and global impressions six times as often as specific behaviors, rated observees in the absence of information necessary to do so, relied on indirect evidence about performance, and varied in how they interpreted items.
CONCLUSIONS: Behavioral change becomes difficult to address if it is unclear what behaviors raters actually considered when providing feedback. These findings highlight the importance of explicitly stating and empirically investigating the assumptions that underlie the use of an observational assessment tool.