Applying the Breslow-Day Test of Trend in Odds Ratio Heterogeneity to the Analysis of Nonuniform DIF
This article applies the Breslow-Day test of trend in odds ratio heterogeneity (BD) to the detection of nonuniform DIF. A simulation study was conducted to assess the power and Type I error rate of BD, as well as of a combined decision rule (CDR) in which the decision that DIF exists is based on a combination of the decisions made using BD and the Mantel-Haenszel chi-square. The results indicated that CDR displayed good Type I error rate and power across a variety of conditions. Comparing these results with those of earlier research indicates that CDR may yield more accurate decisions about DIF than other commonly used DIF detection procedures.
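As a rough illustration of the Mantel-Haenszel component named in this abstract, the Python sketch below computes the Mantel-Haenszel common odds ratio and chi-square from a set of 2x2 tables, one per matching stratum (for example, one per total-score level). The function name, the tuple layout, and the example counts are assumptions made for illustration. The Breslow-Day trend test and the article's combined decision rule are not reproduced here.

```python
def mantel_haenszel(tables, continuity=True):
    """Mantel-Haenszel common odds ratio and chi-square (1 df).

    tables: iterable of (a, b, c, d) counts per stratum, where
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    This layout is an illustrative assumption, not the article's notation.
    """
    num = den = 0.0                      # pieces of the common odds ratio
    sum_a = sum_e = sum_v = 0.0          # observed, expected, variance of a
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue                     # stratum carries no information
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += (a + b) * (a + c) / n
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    odds_ratio = num / den
    dev = abs(sum_a - sum_e)
    if continuity:
        # Continuity correction, clamped at zero so it cannot inflate
        # the statistic when the deviation is smaller than 0.5.
        dev = max(dev - 0.5, 0.0)
    chi2 = dev ** 2 / sum_v
    return odds_ratio, chi2


# Hypothetical counts for two score strata.
or_mh, chi2_mh = mantel_haenszel([(20, 10, 10, 20), (15, 5, 10, 10)])
```

The chi-square would be compared with a chi-square distribution on one degree of freedom. A common odds ratio near 1 with a small chi-square is consistent with no uniform DIF, which is why a separate test such as BD is needed for the nonuniform case.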
Confidence Intervals For An Effect Size When Variances Are Not Equal
Confidence intervals must be robust in the sense that nominal and actual probability coverage are in close agreement. This article examined two ways of computing an effect size in a two-group problem: (a) the classic approach, which divides the mean difference by a single standard deviation, and (b) a variant of a method which replaces least squares values with robust trimmed means and a Winsorized variance. Confidence intervals were determined with theoretical and bootstrap critical values. Only the method that used robust estimators and a bootstrap critical value provided generally accurate probability coverage under conditions of nonnormality and variance heterogeneity in balanced as well as unbalanced designs.
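The robust estimators named in this abstract can be sketched as follows. This is a minimal Python illustration, assuming symmetric 20% trimming and assuming that the standardizer is the Winsorized standard deviation of the second group. The function names, the `scale` parameter, and the choice of standardizer are hypothetical and may differ from the article's variant. The bootstrap critical values are not shown.

```python
import math


def trimmed_mean(x, prop=0.2):
    """Mean after dropping the lowest and highest floor(prop * n) values."""
    xs = sorted(x)
    g = int(math.floor(prop * len(xs)))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorized_variance(x, prop=0.2):
    """Sample variance after pulling each tail in to the nearest kept value."""
    xs = sorted(x)
    n = len(xs)
    g = int(math.floor(prop * n))
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    m = sum(w) / n
    return sum((v - m) ** 2 for v in w) / (n - 1)


def robust_effect_size(x, y, prop=0.2, scale=1.0):
    """Trimmed-mean difference divided by the Winsorized SD of y.

    `scale` is an optional rescaling constant (assumption: left at 1.0
    here); standardizing by group y alone is also an assumption.
    """
    diff = trimmed_mean(x, prop) - trimmed_mean(y, prop)
    return scale * diff / math.sqrt(winsorized_variance(y, prop))


# Hypothetical data: each group contains one extreme value.
group_x = [3, 4, 5, 6, 102]
group_y = [1, 2, 3, 4, 100]
d_robust = robust_effect_size(group_x, group_y)
```

Because the extreme values are trimmed from the means and Winsorized in the variance, they have little influence on the result, whereas the classic mean difference divided by an ordinary standard deviation would be dominated by them.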
Confidence Intervals for the Squared Multiple Semipartial Correlation Coefficient
The squared multiple semipartial correlation coefficient is the increase in the squared multiple correlation coefficient that occurs when two or more predictors are added to a multiple regression model. Coverage probability was investigated for two variations of each of three methods for setting confidence intervals for the population squared multiple semipartial correlation coefficient. Results indicated that the procedure that provides coverage probability in the [.925, .975] interval for a 95% confidence interval depends primarily on the number of added predictors. Guidelines for selecting a procedure are presented.
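The definition in this abstract (the increase in R-squared when a set of predictors is added) can be illustrated with a short Python sketch. It fits ordinary least squares by solving the normal equations in pure Python, which is adequate for a small illustration but not numerically robust. The function names and data are hypothetical, and none of the confidence interval procedures studied in the article are implemented.

```python
def r_squared(y, columns):
    """R^2 from OLS of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[k][p] / A[k][k] for k in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def squared_semipartial(y, base_columns, added_columns):
    """Increase in R^2 when added_columns join base_columns in the model."""
    full = r_squared(y, list(base_columns) + list(added_columns))
    reduced = r_squared(y, list(base_columns))
    return full - reduced


# Hypothetical data: y depends on x1, x2, and x3 exactly.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, 0, 1, 0, 0, 1, 1, 0]
x3 = [2, 1, 0, 1, 2, 0, 1, 3]
y = [2 * a + 3 * b - c for a, b, c in zip(x1, x2, x3)]
delta_r2 = squared_semipartial(y, [x1], [x2, x3])
```

In a sample, the difference is never negative, because adding predictors cannot lower R-squared.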
50 Years of Test (Un)fairness: Lessons for Machine Learning
Quantitative definitions of what is unfair and what is fair have been
introduced in multiple disciplines for well over 50 years, including in
education, hiring, and machine learning. We trace how the notion of fairness
has been defined within the testing communities of education and hiring over
the past half century, exploring the cultural and social context in which
different fairness definitions have emerged. In some cases, earlier definitions
of fairness are similar or identical to definitions of fairness in current
machine learning research, and foreshadow current formal work. In other cases,
insights into what fairness means and how to measure it have largely gone
overlooked. We compare past and current notions of fairness along several
dimensions, including the fairness criteria, the focus of the criteria (e.g., a
test, a model, or its use), the relationship of fairness to individuals,
groups, and subgroups, and the mathematical method for measuring fairness
(e.g., classification, regression). This work points the way towards future
research and measurement of (un)fairness that builds from our modern
understanding of fairness while incorporating insights from the past.
Comment: FAT* '19: Conference on Fairness, Accountability, and Transparency
(FAT* '19), January 29-31, 2019, Atlanta, GA, US
A Comparison of Adjacent Categories and Cumulative Differential Step Functioning Effect Estimators
The study of measurement invariance in polytomous items that targets individual score levels is known as differential step functioning (DSF). The analysis of DSF requires the creation of a set of dichotomizations of the item response variable. There are two primary approaches for creating the set of dichotomizations to conduct a DSF analysis: the adjacent categories approach and the cumulative approach. To date, there is limited research on how these two approaches compare within the context of DSF, particularly as applied to a real data set. This study evaluated the results of a DSF analysis using both dichotomization schemes in order to determine whether the two approaches yield similar results. The results revealed that the two approaches generally led to consistent results, particularly in the case where DSF effects were negligible. However, when significant DSF effects were present, the two approaches occasionally led to differing conclusions.
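The two dichotomization schemes named in this abstract can be sketched in Python as follows, assuming item scores are integers from 0 to a maximum score. Under the adjacent categories scheme, step j contrasts score j with score j - 1 and leaves out respondents with any other score. Under the cumulative scheme, step j contrasts scores of j or higher with scores below j and uses every respondent. The function names and the use of `None` for excluded respondents are illustrative assumptions.

```python
def adjacent_categories(scores, max_score):
    """Step j: 1 if score == j, 0 if score == j - 1, None otherwise."""
    steps = {}
    for j in range(1, max_score + 1):
        steps[j] = [1 if s == j else 0 if s == j - 1 else None
                    for s in scores]
    return steps


def cumulative(scores, max_score):
    """Step j: 1 if score >= j, else 0; no respondent is excluded."""
    return {j: [1 if s >= j else 0 for s in scores]
            for j in range(1, max_score + 1)}


# Hypothetical responses to an item scored 0, 1, or 2.
responses = [0, 1, 2, 2, 1]
adjacent = adjacent_categories(responses, 2)
cumul = cumulative(responses, 2)
```

Each resulting 0/1 variable can then be analyzed for between-group differences as if it were a dichotomous item. Keeping the lists aligned with the respondents makes it straightforward to pair each value with group membership and a matching variable.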
Classroom-Based Cognitive-Behavioral Intervention to Prevent Aggression: Efficacy and Social Validity
Classroom teachers need effective, efficient strategies to prevent and/or ameliorate destructive student behaviors and increase socially appropriate ones. During the past two decades, researchers have found that cognitive strategies can decrease student disruption/aggression and strengthen pro-social behavior. Following preliminary pilot work, we conducted a study to determine whether a classwide, social problem-solving curriculum affected measures of knowledge and behavior for 165 4th and 5th grade students at risk for behavior problems. We found significant positive treatment effects on knowledge of problem-solving concepts and teacher ratings of aggression. Outcomes differed across teachers/classrooms, and there was no evidence that booster lessons affected treatment efficacy. Teacher ratings of social validity were generally positive. We discuss issues about classroom-based prevention research and future research directions.
Using a Taxonomy of Differential Step Functioning to Improve the Interpretation of DIF in Polytomous Items: An Illustration
The assessment of differential item functioning (DIF) in polytomous items addresses between-group differences in measurement properties at the item level, but typically does not indicate which score levels may be involved in the DIF effect. The framework of differential step functioning (DSF) addresses this issue by examining between-group differences in the measurement properties at each step underlying the polytomous response variable. The pattern of the DSF effects across the steps of the polytomous response variable can assume several different forms, and the different forms can have different implications for the sensitivity of DIF detection and the final interpretation of the causes of the DIF effect. In this article we propose a taxonomy of DSF forms, establish guidelines for using the form of DSF to help target and guide item content review and item revision, and provide procedural rules for using the frameworks of DSF and DIF in tandem to yield a comprehensive assessment of between-group measurement equivalence in polytomous items.
Methods for Assessing Item, Step, and Threshold Invariance in Polytomous Items Following the Partial Credit Model
Measurement invariance in the partial credit model (PCM) can be conceptualized in several different but compatible ways. In this article the authors distinguish among three forms of measurement invariance in the PCM: step invariance, item invariance, and threshold invariance. Approaches for modeling these three forms of invariance are proposed, and the mathematical relationship among the three forms is established. Parametric and contingency table approaches for assessing the three forms of invariance are presented, and the application of the parametric and contingency table approaches to a real data set is described. The invariance effect estimates observed for the parametric and contingency table approaches are consistent with the theoretical equivalence of the two approaches.