Student evaluations of teaching are not only unreliable, they are significantly biased against female instructors
A series of studies across countries and disciplines in higher education confirms that student evaluations of teaching (SET) are significantly correlated with instructor gender, with students regularly rating female instructors lower than their male peers. Anne Boring, Kellie Ottoboni and Philip B. Stark argue the findings warrant serious attention in light of increasing pressure on universities to measure teaching effectiveness. Given the unreliability of the metric and the harmful impact these evaluations can have, universities should think carefully about the role of such evaluations in decision-making.
Classical Nonparametric Hypothesis Tests with Applications in Social Good
Hypothesis testing has come under fire in the past decade as misuses have become increasingly visible. It is common to use tests whose assumptions don't reflect how the data were collected, and editorial policies of many journals reward "p-hacking" by setting the arbitrary threshold of 0.05 to determine whether a result merits publication. In fact, properly designed hypothesis tests are an invaluable tool for inference and decision-making. Classical nonparametric tests, once reserved for problems that could be worked out with pencil and paper or approximated asymptotically, can now be applied to complex datasets with the help of modern computing power. This dissertation tailors some nonparametric tests to modern applications for social good.
Permutation tests are a class of hypothesis tests for data that involve random (or plausibly random) assignment. The parametric assumptions for common tests, like the t-test and linear regression, may not hold for randomized experiments; in contrast, the assumptions of permutation tests are implied by the experimental design. But off-the-shelf permutation tests are not a panacea: tests must be tailored to fit the experimental design, and there are subtle numerical issues with implementing the tests in software. We construct permutation tests and software to address particular questions in randomized and natural experiments, including identifying what, if anything, student evaluations of teaching measure, and whether voting machines malfunctioned in Georgia's November 2018 election.
Risk-limiting post-election audits (RLAs) have existed for a decade, but have not been adopted widely, in part due to logistical hurdles. This thesis uses classical nonparametric techniques, including Fisher's combination method and Wald's sequential probability ratio test, to build new RLA methods that accommodate the idiosyncratic logistics of statewide elections. A new, more flexible method for using stratified samples in RLAs makes it easier and more efficient to audit elections conducted on heterogeneous voting equipment. This thesis also develops an RLA method based on Bernoulli sampling, which allows ballots to be audited "in parallel" across precincts on Election Day. The RLA method for stratified samples of ballots was piloted in Michigan to study its performance in the face of real-world constraints.
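The core idea behind the permutation tests described above can be sketched in a few lines. This is a minimal two-sample illustration, not the dissertation's actual software: the difference-in-means statistic, the function name, and the simulated data are all assumptions chosen for clarity.

```python
import numpy as np

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sample permutation test for a difference in means.

    Under random assignment, the null hypothesis of "no treatment
    effect" implies the group labels are exchangeable, so the null
    distribution of the statistic is obtained by recomputing it over
    random relabelings of the pooled data.
    """
    rng = np.random.default_rng(seed)
    observed = np.mean(group_a) - np.mean(group_b)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = np.mean(perm[:n_a]) - np.mean(perm[n_a:])
        if abs(stat) >= abs(observed):
            count += 1
    # +1 in numerator and denominator keeps the p-value valid (never zero)
    return (count + 1) / (n_perm + 1)
```

Note that, as the abstract emphasizes, the validity of this test comes from the experimental design itself: no distributional assumptions about the outcomes are needed, only that assignment to groups was (plausibly) random.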
Student Evaluations of Teaching (Mostly) Do Not Measure Teaching Effectiveness
Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:
- SET are biased against female instructors by an amount that is large and statistically significant
- the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded
- the bias varies by discipline and by student gender, among other things
- it is not possible to adjust for the bias, because it depends on so many factors
- SET are more sensitive to students' gender bias and grade expectations than they are to teaching effectiveness
- gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.
These findings are based on nonparametric statistical tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.
Estimating population average treatment effects from experiments with noncompliance
Randomized controlled trials (RCTs) are the gold standard for estimating causal effects, but often use samples that are non-representative of the actual population of interest. We propose a reweighting method for estimating population average treatment effects in settings with noncompliance. Simulations show the proposed compliance-adjusted population estimator outperforms its unadjusted counterpart when compliance is relatively low and can be predicted by observed covariates. We apply the method to evaluate the effect of Medicaid coverage on health care use for a target population of adults who may benefit from expansions to the Medicaid program. We draw RCT data from the Oregon Health Insurance Experiment, where less than one-third of those randomly selected to receive Medicaid benefits actually enrolled.
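The general shape of a reweighted estimator under noncompliance can be sketched as a weighted Wald (instrumental-variable) ratio. This is only an illustration of the idea, not the authors' proposed estimator: the function name is hypothetical, and the weights `w` stand in for whatever mapping from trial sample to target population is estimated from shared covariates.

```python
import numpy as np

def compliance_adjusted_pate(y, z, d, w):
    """Weighted Wald / IV-style estimator under noncompliance.

    y : observed outcomes
    z : random assignment indicator (1 = assigned to treatment)
    d : treatment actually received (noncompliance means d != z)
    w : weights mapping the trial sample to the target population
        (e.g. from a model of trial participation on covariates)
    """
    y, z, d, w = (np.asarray(a, dtype=float) for a in (y, z, d, w))
    # weighted intent-to-treat effect of assignment on outcomes
    itt_y = (np.average(y[z == 1], weights=w[z == 1])
             - np.average(y[z == 0], weights=w[z == 0]))
    # weighted first stage: effect of assignment on treatment take-up
    itt_d = (np.average(d[z == 1], weights=w[z == 1])
             - np.average(d[z == 0], weights=w[z == 0]))
    # scaling the ITT by the compliance rate recovers the effect
    # among compliers, reweighted toward the target population
    return itt_y / itt_d
```

With uniform weights this reduces to the classic Wald estimator; the abstract's point is that when compliance is low but predictable from covariates, informative weights make the population-level estimate markedly better than the unadjusted one.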