115 research outputs found

    Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R

    Get PDF
    Matching is an R package which provides functions for multivariate and propensity score matching and for finding optimal covariate balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance actually has been obtained are provided. The underlying matching algorithm is written in C++, makes extensive use of system BLAS and scales efficiently with dataset size. The genetic algorithm which finds optimal balance is parallelized and can make use of multiple CPUs or a cluster of computers. A large number of options are provided which control exactly how the matching is conducted and how balance is evaluated.

    Genetic Optimization Using Derivatives: The rgenoud Package for R

    Get PDF
    genoud is an R function that combines evolutionary algorithm methods with a derivative-based (quasi-Newton) method to solve difficult optimization problems. genoud may also be used for optimization problems for which derivatives do not exist. genoud solves problems that are nonlinear or perhaps even discontinuous in the parameters of the function to be optimized. When the function to be optimized (for example, a log-likelihood) is nonlinear in the model's parameters, the function will generally not be globally concave and may have irregularities such as saddlepoints or discontinuities. Optimization methods that rely on derivatives of the objective function may be unable to find any optimum at all. Multiple local optima may exist, so that there is no guarantee that a derivative-based method will converge to the global optimum. On the other hand, algorithms that do not use derivative information (such as pure genetic algorithms) are for many problems needlessly poor at local hill climbing. Most statistical problems are regular in a neighborhood of the solution. Therefore, for some portion of the search space, derivative information is useful. The function supports parallel processing on multiple CPUs on a single machine or a cluster of computers.

    Black Candidates and Black Voters: Assessing the Impact of Candidate Race on Uncounted Vote Rates

    Get PDF
    Numerous studies show that the rate at which African‐Americans cast ballots with missing or invalid votes, i.e., the African‐American residual vote rate, is higher than the corresponding white rate. While existing literature argues that the plethora of African‐American residual votes is caused by administrative problems or socioeconomic factors, we show using precinct‐level data from two recent elections in Cook County, Illinois, that the African‐American residual vote rate in electoral contests with black candidates is less than half the rate in contests without black candidates. African Americans, therefore, are able to reduce their residual vote rate when they wish to do so. We present complementary findings for white voters, whose residual vote rate often substantially increases in contests which feature dominant black candidates

    Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning

    Get PDF
    There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of meta-algorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the Conditional Average Treatment Effect (CATE) function. Meta-algorithms build on base algorithms---such as Random Forests (RF), Bayesian Additive Regression Trees (BART) or neural networks---to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a new meta-algorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other, and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the meta-learners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our new X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods

    A comparison of alternative strategies for choosing control populations in observational studies.

    Get PDF
    Various approaches have been used to select control groups in observational studies: (1) from within the intervention area; (2) from a convenience sample, or randomly chosen areas; (3) from areas matched on area-level characteristics; and (4) nationally. The consequences of the decision are rarely assessed but, as we show, it can have complex impacts on confounding at both the area and individual levels. We began by reanalyzing data collected for an evaluation of a rapid response service on rates of unplanned hospital admission. Balance on observed individual-level variables was better with external than local controls, after matching. Further, when important prognostic variables were omitted from the matching algorithm, imbalances on those variables were also minimized using external controls. Treatment effects varied markedly depending on the choice of control area, but in the case study the variation was minimal after adjusting for the characteristics of areas. We used simulations to assess relative bias and means-squared error, as this could not be done in the case study. A particular feature of the simulations was unexplained variation in the outcome between areas. We found that the likely impact of unexplained variation for hospital admissions dwarfed the benefits of better balance on individual-level variables, leading us to prefer local controls in this instance. In other scenarios, in which there was less unexplained variation in the outcome between areas, bias and mean-squared error were optimized using external controls. We identify some general considerations relevant to the choice of control population in observational studies
    • 

    corecore