71 research outputs found

    Regression of binary network data with exchangeable latent errors

    Undirected, binary network data consist of indicators of symmetric relations between pairs of actors. Regression models of such data allow for the estimation of effects of exogenous covariates on the network and for prediction of unobserved data. Ideally, estimators of the regression parameters should account for the inherent dependencies among relations in the network that involve the same actor. To account for such dependencies, researchers have developed a host of latent variable network models; however, estimation of many of these models is computationally onerous, and it may not be clear which model is best to base inference upon. We propose the Probit Exchangeable (PX) Model for undirected binary network data that is based on an assumption of exchangeability, which is common to many of the latent variable network models in the literature. The PX model can represent the second moments of any exchangeable network model, yet specifies no particular parametric model. We present an algorithm for obtaining the maximum likelihood estimator of the PX model, as well as a modified version of the algorithm that is extremely computationally efficient and provides an approximate estimator. Using simulation studies, we demonstrate the improvement in estimation of regression coefficients of the proposed model over existing latent variable network models. In an analysis of purchases of politically-aligned books, we demonstrate political polarization in purchase behavior and show that the proposed estimator significantly reduces runtime relative to estimators of latent variable network models while maintaining predictive performance.

    Women 1.5 Times More Likely to Leave STEM Pipeline After Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit

    The substantial gender gap in the science, technology, engineering, and mathematics (STEM) workforce can be traced back to the underrepresentation of women at various milestones in the career pathway. Calculus is a necessary step in this pathway and has been shown to often dissuade people from pursuing STEM fields. We examine the characteristics of students who begin college interested in STEM and either persist or switch out of the calculus sequence after taking Calculus I, and hence either continue to pursue a STEM major or are dissuaded from STEM disciplines. The data come from a unique, national survey focused on mainstream college calculus. Our analyses show that, while controlling for academic preparedness, career intentions, and instruction, the odds of a woman being dissuaded from continuing in calculus are 1.5 times greater than those for a man. Furthermore, women report significantly more often than men that they do not understand the course material well enough to continue. When comparing women and men with above-average mathematical abilities and preparedness, we find women start and end the term with significantly lower mathematical confidence than men. This suggests a lack of mathematical confidence, rather than a lack of mathematical ability, may be responsible for the high departure rate of women. While it would be ideal to increase interest and participation of women in STEM at all stages of their careers, our findings indicate that simply increasing the retention of women starting in college calculus would almost double the number of women entering the STEM workforce. Comment: 27 pages, 3 figures, includes Supplemental Information

    The causal effect of a timeout at stopping an opposing run in the NBA

    In the summer of 2017, the National Basketball Association reduced the number of total timeouts, along with other rule changes, to regulate the flow of the game. With these rule changes, it becomes increasingly important for coaches to effectively manage their timeouts. Understanding the utility of a timeout under various game scenarios, e.g., during an opposing team's run, is of the utmost importance. There are two schools of thought when the opposition is on a run: (1) call a timeout and allow your team to rest and regroup, or (2) save a timeout and hope your team can make corrections during play. This paper investigates the credence of these tenets using the Rubin causal model framework to quantify the causal effect of a timeout in the presence of an opposing team's run. We carefully consider the stable unit-treatment-value assumption (SUTVA), which is too often overlooked, in this context and use SUTVA to motivate our definition of units. To measure the effect of a timeout, we introduce a novel, interpretable outcome based on the score difference to describe broad changes in the scoring dynamics. This outcome is well-suited for situations where the quantity of interest fluctuates frequently, as is common in many sports analytics applications. We conclude from our analysis that while comebacks frequently occur after a run, it is slightly disadvantageous to call a timeout during a run by the opposing team, and we further demonstrate that the magnitude of this effect varies by franchise.

    Sampling random graphs with specified degree sequences

    The configuration model is a standard tool for uniformly generating random graphs with a specified degree sequence, and is often used as a null model to evaluate how much of an observed network's structure can be explained by its degree structure alone. A Markov chain Monte Carlo (MCMC) algorithm, based on a degree-preserving double-edge swap, provides an asymptotic solution to sample from the configuration model. However, accurately and efficiently detecting this Markov chain's convergence on its stationary distribution remains an unsolved problem. Here, we provide a solution to detect convergence and sample from the configuration model. We develop an algorithm, based on the assortativity of the sampled graphs, for estimating the gap between effectively independent MCMC states, and a computationally efficient gap-estimation heuristic derived from analyzing a corpus of 509 empirical networks. We provide a convergence detection method based on the Dickey-Fuller Generalized Least Squares test, which we show is more accurate and efficient than three alternative Markov chain convergence tests. Comment: Same as version v3 but with corrected white spaces between paragraphs
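
The degree-preserving double-edge swap mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple undirected graph stored as an edge list; the function names `double_edge_swap` and `degrees` and the toy graph are hypothetical and are not taken from the paper, and the sketch does not include the paper's convergence detection or gap estimation.

```python
import random

def double_edge_swap(edges, rng):
    """Attempt one degree-preserving double-edge swap on a simple graph.

    `edges` is a list of 2-tuples (u, v) and is modified in place.
    Returns True if the swap was accepted, False if it was rejected.
    A rejected swap leaves the graph unchanged (the chain stays put).
    """
    i, j = rng.sample(range(len(edges)), 2)
    (a, b), (c, d) = edges[i], edges[j]
    # Choose one of the two possible rewirings at random.
    if rng.random() < 0.5:
        c, d = d, c
    new1, new2 = (a, d), (c, b)
    # Reject swaps that would create a self-loop.
    if a == d or c == b:
        return False
    # Reject swaps that would create a multi-edge.
    # (Rebuilding the set each step is O(m); kept for clarity only.)
    existing = {frozenset(e) for e in edges}
    if frozenset(new1) in existing or frozenset(new2) in existing:
        return False
    edges[i], edges[j] = new1, new2
    return True

def degrees(edges):
    """Return a dict mapping each node to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

# Hypothetical toy graph: a 6-cycle plus one chord.
rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
before = degrees(edges)
accepted = sum(double_edge_swap(edges, rng) for _ in range(200))
after = degrees(edges)
```

Every accepted swap removes one edge end from each of the four nodes involved and adds one back, so `before` and `after` are identical. How many swaps are needed before the sampled graphs are effectively independent is exactly the question the abstract addresses, and this sketch does not answer it.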

    Restricted Regression in Networks

    Network regression with additive node-level random effects can be problematic when the primary interest is estimating unconditional regression coefficients and some covariates are exactly or nearly in the vector space of node-level effects. We introduce the Restricted Network Regression model, which removes the collinearity between fixed and random effects in network regression by orthogonalizing the random effects against the covariates. We discuss the change in the interpretation of the regression coefficients in Restricted Network Regression and analytically characterize the effect of Restricted Network Regression on the regression coefficients for continuous response data. We show through simulation with continuous and binary response data that Restricted Network Regression mitigates, but does not alleviate, network confounding, by providing improved estimation of the regression coefficients. We apply the Restricted Network Regression model in an analysis of 2015 Eurovision Song Contest voting data and show how the choice of regression model affects inference. Comment: 40 pages, 9 figures, 2 tables, including supplement
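
The idea of orthogonalizing random effects against the covariates can be shown with a small sketch. It assumes a single covariate vector and uses a plain vector projection; the function name `orthogonalize` and all numbers are hypothetical, and the paper's actual construction (which works with the full design matrix of a network regression) may differ.

```python
def orthogonalize(effects, covariate):
    """Remove from `effects` the component lying along `covariate`.

    Computes effects - covariate * (covariate . effects) / (covariate . covariate),
    i.e. the residual of projecting `effects` onto `covariate`.
    """
    xx = sum(x * x for x in covariate)
    if xx == 0:
        raise ValueError("covariate must not be the zero vector")
    xa = sum(x * a for x, a in zip(covariate, effects))
    return [a - (xa / xx) * x for a, x in zip(effects, covariate)]

# Hypothetical values, chosen so the effects are nearly collinear with the covariate.
covariate = [1.0, 2.0, 3.0, 4.0]
effects = [0.5, 1.1, 1.4, 2.2]
restricted = orthogonalize(effects, covariate)

# The restricted effects have (numerically) zero inner product with the covariate.
inner = sum(x * r for x, r in zip(covariate, restricted))
```

With several covariates, the same step would project onto the orthogonal complement of the column space of the whole design matrix rather than a single vector.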

    Assessing the Burden of COVID-19 in Developing Countries: Systematic Review, Meta-Analysis, and Public Policy Implications

    Introduction: The infection fatality rate (IFR) of COVID-19 has been carefully measured and analysed in high-income countries, whereas there has been no systematic analysis of age-specific seroprevalence or IFR for developing countries. Methods: We systematically reviewed the literature to identify all COVID-19 serology studies in developing countries that were conducted using representative samples collected by February 2021. For each of the antibody assays used in these serology studies, we identified data on assay characteristics, including the extent of seroreversion over time. We analysed the serology data using a Bayesian model that incorporates conventional sampling uncertainty as well as uncertainties about assay sensitivity and specificity. We then calculated IFRs using individual case reports or aggregated public health updates, including age-specific estimates whenever feasible. Results: In most locations in developing countries, seroprevalence among older adults was similar to that of younger age cohorts, underscoring the limited capacity that these nations have to protect older age groups. Age-specific IFRs were roughly 2 times higher than in high-income countries. The median value of the population IFR was about 0.5%, similar to that of high-income countries, because disparities in healthcare access were roughly offset by differences in population age structure. Conclusion: The burden of COVID-19 is far higher in developing countries than in high-income countries, reflecting a combination of elevated transmission to middle-aged and older adults as well as limited access to adequate healthcare. These results underscore the critical need to ensure medical equity to populations in developing countries through provision of vaccine doses and effective medications.
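
To make the role of assay sensitivity and specificity concrete, here is a minimal sketch of a point-estimate correction followed by an IFR calculation. It uses the standard Rogan-Gladen adjustment, a simpler technique than the Bayesian model the abstract describes and not a reproduction of it; the function names and every number are hypothetical and are not data from the review.

```python
def adjust_seroprevalence(raw, sensitivity, specificity):
    """Rogan-Gladen correction of a raw positive rate for assay error.

    adjusted = (raw + specificity - 1) / (sensitivity + specificity - 1),
    clipped to the interval [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("assay must perform better than chance")
    estimate = (raw + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))

def infection_fatality_rate(deaths, population, seroprevalence):
    """IFR = deaths / estimated infections, with infections = population * seroprevalence."""
    infections = population * seroprevalence
    if infections <= 0:
        raise ValueError("estimated infections must be positive")
    return deaths / infections

# Hypothetical inputs for illustration only.
prevalence = adjust_seroprevalence(raw=0.20, sensitivity=0.90, specificity=0.98)
ifr = infection_fatality_rate(deaths=500, population=1_000_000, seroprevalence=prevalence)
```

Unlike this point estimate, a Bayesian treatment would carry the uncertainty in sensitivity, specificity, and sampling through to the IFR rather than fixing them at single values.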