71 research outputs found
Women 1.5 Times More Likely to Leave STEM Pipeline After Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit
The substantial gender gap in the science, technology, engineering, and
mathematics (STEM) workforce can be traced back to the underrepresentation of
women at various milestones in the career pathway. Calculus is a necessary step
in this pathway and has been shown to often dissuade people from pursuing STEM
fields. We examine the characteristics of students who begin college interested
in STEM and either persist or switch out of the calculus sequence after taking
Calculus I, and hence either continue to pursue a STEM major or are dissuaded
from STEM disciplines. The data come from a unique, national survey focused on
mainstream college calculus. Our analyses show that, while controlling for
academic preparedness, career intentions, and instruction, the odds of a woman
being dissuaded from continuing in calculus are 1.5 times greater than those for
a man. Furthermore, women report they do not understand the course material
well enough to continue significantly more often than men. When comparing women
and men with above-average mathematical abilities and preparedness, we find
women start and end the term with significantly lower mathematical confidence
than men. This suggests a lack of mathematical confidence, rather than a lack
of mathematical ability, may be responsible for the high departure rate of
women. While it would be ideal to increase interest and participation of women
in STEM at all stages of their careers, our findings indicate that simply
increasing the retention of women starting in college calculus would almost
double the number of women entering the STEM workforce. Comment: 27 pages, 3 figures, includes Supplemental Information
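An odds ratio of 1.5 is not the same as a 1.5-fold difference in probability. The sketch below converts a baseline probability into the probability implied by a given odds ratio. The baseline values used here are hypothetical and chosen only for illustration; they are not taken from the survey, and the function name `apply_odds_ratio` is an assumption of this example.

```python
def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Return the probability implied by scaling the baseline odds by odds_ratio."""
    # Convert probability to odds, scale the odds, then convert back.
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baselines (NOT from the study): probability that a man switches out.
for p_men in (0.1, 0.2, 0.5):
    p_women = apply_odds_ratio(p_men, 1.5)
    print(f"baseline {p_men:.2f} -> implied {p_women:.3f} "
          f"(probability ratio {p_women / p_men:.2f})")
```

The implied ratio of probabilities is always below 1.5 and shrinks as the baseline probability grows, which is why odds ratios and relative risks should not be read interchangeably.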
Regression of binary network data with exchangeable latent errors
Undirected, binary network data consist of indicators of symmetric relations
between pairs of actors. Regression models of such data allow for the
estimation of effects of exogenous covariates on the network and for prediction
of unobserved data. Ideally, estimators of the regression parameters should
account for the inherent dependencies among relations in the network that
involve the same actor. To account for such dependencies, researchers have
developed a host of latent variable network models. However, estimation of many
latent variable network models is computationally onerous, and it may not be
clear which model is best to base inference upon. We propose the Probit
Exchangeable (PX) Model for undirected binary network data that is based on an
assumption of exchangeability, which is common to many of the latent variable
network models in the literature. The PX model can represent the second moments
of any exchangeable network model, yet specifies no particular parametric
model. We present an algorithm for obtaining the maximum likelihood estimator
of the PX model, as well as a modified version of the algorithm that is
extremely computationally efficient and provides an approximate estimator.
Using simulation studies, we demonstrate the improvement in estimation of
regression coefficients of the proposed model over existing latent variable
network models. In an analysis of purchases of politically-aligned books, we
demonstrate political polarization in purchase behavior and show that the
proposed estimator significantly reduces runtime relative to estimators of
latent variable network models while maintaining predictive performance.
The causal effect of a timeout at stopping an opposing run in the NBA
In the summer of 2017, the National Basketball Association reduced the number
of total timeouts, along with other rule changes, to regulate the flow of the
game. With these rule changes, it becomes increasingly important for coaches to
effectively manage their timeouts. Understanding the utility of a timeout under
various game scenarios, e.g., during an opposing team's run, is of the utmost
importance. There are two schools of thought when the opposition is on a run:
(1) call a timeout and allow your team to rest and regroup, or (2) save a
timeout and hope your team can make corrections during play. This paper
investigates the validity of these tenets using the Rubin causal model
framework to quantify the causal effect of a timeout in the presence of an
opposing team's run. We carefully consider the often-overlooked stable
unit-treatment-value assumption (SUTVA) in this context and use it to
motivate our definition of units. To measure the effect of a timeout, we
introduce a novel, interpretable outcome based on the score difference to
describe broad changes in the scoring dynamics. This outcome is well-suited for
situations where the quantity of interest fluctuates frequently, as is common
in many sports analytics applications. We conclude from our analysis that while
comebacks frequently occur after a run, it is slightly disadvantageous to call
a timeout during a run by the opposing team and further demonstrate that the
magnitude of this effect varies by franchise.
Sampling random graphs with specified degree sequences
The configuration model is a standard tool for uniformly generating random
graphs with a specified degree sequence, and is often used as a null model to
evaluate how much of an observed network's structure can be explained by its
degree structure alone. A Markov chain Monte Carlo (MCMC) algorithm, based on a
degree-preserving double-edge swap, provides an asymptotic solution to sample
from the configuration model. However, accurately and efficiently detecting
this Markov chain's convergence to its stationary distribution remains an
unsolved problem. Here, we provide a solution to detect convergence and sample
from the configuration model. We develop an algorithm, based on the
assortativity of the sampled graphs, for estimating the gap between effectively
independent MCMC states, and a computationally efficient gap-estimation
heuristic derived from analyzing a corpus of 509 empirical networks. We provide
a convergence detection method based on the Dickey-Fuller Generalized Least
Squares test, which we show is more accurate and efficient than three
alternative Markov chain convergence tests. Comment: Same as version v3 but with corrected white spaces between paragraphs
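The degree-preserving double-edge swap mentioned in the abstract can be sketched as follows. This is a minimal illustration for simple undirected graphs (no self-loops, no multi-edges), not the authors' sampler, and it does not include their gap estimation or convergence test. The function names `double_edge_swap` and `degrees` and the toy graph are assumptions of this example.

```python
import random
from collections import Counter


def degrees(edges):
    """Count the degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def double_edge_swap(edges, n_steps, seed=0):
    """Run n_steps of a double-edge swap chain on a simple undirected graph."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(n_steps):
        # Pick two distinct edges (a, b) and (c, d) uniformly at random.
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        # Choose one of the two possible rewirings with equal probability.
        if rng.random() < 0.5:
            c, d = d, c
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        # Reject proposals that would create a self-loop or a multi-edge;
        # a rejected proposal leaves the chain in its current state.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


# Toy graph: a 6-cycle plus two chords (hypothetical example data).
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3), (1, 4)]
sampled = double_edge_swap(original, n_steps=200, seed=1)
print(degrees(sampled) == degrees(original))
```

Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every node keeps its degree. How many steps are needed before a state can be treated as an independent draw is exactly the question the abstract addresses, and this sketch makes no attempt to answer it.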
Restricted Regression in Networks
Network regression with additive node-level random effects can be problematic
when the primary interest is estimating unconditional regression coefficients
and some covariates are exactly or nearly in the vector space of node-level
effects. We introduce the Restricted Network Regression model, which removes the
collinearity between fixed and random effects in network regression by
orthogonalizing the random effects against the covariates. We discuss the
change in the interpretation of the regression coefficients in Restricted
Network Regression and analytically characterize the effect of Restricted
Network Regression on the regression coefficients for continuous response data.
We show through simulation with continuous and binary response data that
Restricted Network Regression mitigates, but does not alleviate, network
confounding, by providing improved estimation of the regression coefficients.
We apply the Restricted Network Regression model in an analysis of 2015
Eurovision Song Contest voting data and show how the choice of regression model
affects inference. Comment: 40 pages, 9 figures, 2 tables, including supplement
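One generic way to orthogonalize a random-effects design against covariates is to project it onto the orthogonal complement of the covariate column space. The sketch below illustrates that idea for additive node-level effects on node pairs; whether the paper uses exactly this construction is an assumption, and the names `Z`, `X`, and `orthogonalize` as well as the simulated covariate are hypothetical.

```python
import itertools

import numpy as np

n_nodes = 5
pairs = list(itertools.combinations(range(n_nodes), 2))

# Design matrix for additive node effects: the row for pair (i, j)
# has ones in columns i and j, so the pair effect is a_i + a_j.
Z = np.zeros((len(pairs), n_nodes))
for row, (i, j) in enumerate(pairs):
    Z[row, i] = 1.0
    Z[row, j] = 1.0

# Covariates: an intercept plus one simulated pair-level covariate.
# The intercept lies exactly in the column space of Z (Z @ 1 = 2 * intercept).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(len(pairs)), rng.normal(size=len(pairs))])


def orthogonalize(Z, X):
    """Return (I - P_X) Z, the part of Z orthogonal to the columns of X."""
    # Regress each column of Z on X and keep the residuals.
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return Z - X @ coef


Z_perp = orthogonalize(Z, X)
print(np.allclose(X.T @ Z_perp, 0.0))
```

After projection, every column of the random-effects design is uncorrelated with every covariate column, which removes the collinearity that the abstract describes at the cost of changing what the regression coefficients mean.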
Assessing the Burden of COVID-19 in Developing Countries: Systematic Review, Meta-Analysis, and Public Policy Implications
Abstract
Introduction: The infection fatality rate (IFR) of COVID-19 has been carefully measured and analysed in high-income countries, whereas there has been no systematic analysis of age-specific seroprevalence or IFR for developing countries.
Methods: We systematically reviewed the literature to identify all COVID-19 serology studies in developing countries that were conducted using representative samples collected by February 2021. For each of the antibody assays used in these serology studies, we identified data on assay characteristics, including the extent of seroreversion over time. We analysed the serology data using a Bayesian model that incorporates conventional sampling uncertainty as well as uncertainties about assay sensitivity and specificity. We then calculated IFRs using individual case reports or aggregated public health updates, including age-specific estimates whenever feasible.
Results: In most locations in developing countries, seroprevalence among older adults was similar to that of younger age cohorts, underscoring the limited capacity that these nations have to protect older age groups. Age-specific IFRs were roughly 2 times higher than in high-income countries. The median value of the population IFR was about 0.5%, similar to that of high-income countries, because disparities in healthcare access were roughly offset by differences in population age structure.
Conclusion: The burden of COVID-19 is far higher in developing countries than in high-income countries, reflecting a combination of elevated transmission to middle-aged and older adults as well as limited access to adequate healthcare. These results underscore the critical need to ensure medical equity to populations in developing countries through provision of vaccine doses and effective medications.