Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R

Author: Jasjeet S. Sekhon

Matching is an R package which provides functions for multivariate and propensity score matching and for finding optimal covariate balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance actually has been obtained are provided. The underlying matching algorithm is written in C++, makes extensive use of system BLAS and scales efficiently with dataset size. The genetic algorithm which finds optimal balance is parallelized and can make use of multiple CPUs or a cluster of computers. A large number of options are provided which control exactly how the matching is conducted and how balance is evaluated.
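As a rough illustration of what checking covariate balance means, the sketch below compares a standardized mean difference before and after one-to-one nearest-neighbor matching on a single covariate. It is written in plain Python and does not use the Matching package or its API; the function names, the toy data, and the choice of the treated-group standard deviation as the scale are all assumptions made for this example.

```python
import statistics


def standardized_mean_difference(treated, control):
    """Difference in means scaled by the treated-group standard deviation."""
    diff = statistics.mean(treated) - statistics.mean(control)
    return diff / statistics.stdev(treated)


def nearest_neighbor_match(treated, control):
    """For each treated value, pick the closest control value (with replacement)."""
    return [min(control, key=lambda c: abs(c - t)) for t in treated]


# Hypothetical covariate values, chosen only to make the imbalance visible.
treated_x = [5.0, 6.0, 7.0, 8.0]
control_x = [1.0, 2.0, 2.5, 3.0, 4.0, 5.1, 6.2, 6.9, 8.3, 9.0]

matched_x = nearest_neighbor_match(treated_x, control_x)
smd_before = standardized_mean_difference(treated_x, control_x)
smd_after = standardized_mean_difference(treated_x, matched_x)

print(f"SMD before matching: {smd_before:.3f}")
print(f"SMD after matching:  {smd_after:.3f}")
```

A smaller absolute standardized difference after matching suggests better balance on that covariate. The package itself is described as offering a variety of univariate and multivariate metrics, so this single statistic is only one possible check.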

Research Papers in Economics

Genetic Optimization Using Derivatives: The rgenoud Package for R

Author: Jasjeet S. Sekhon
Walter R. Mebane Jr.

genoud is an R function that combines evolutionary algorithm methods with a derivative-based (quasi-Newton) method to solve difficult optimization problems. genoud may also be used for optimization problems for which derivatives do not exist. genoud solves problems that are nonlinear or perhaps even discontinuous in the parameters of the function to be optimized. When the function to be optimized (for example, a log-likelihood) is nonlinear in the model's parameters, the function will generally not be globally concave and may have irregularities such as saddlepoints or discontinuities. Optimization methods that rely on derivatives of the objective function may be unable to find any optimum at all. Multiple local optima may exist, so that there is no guarantee that a derivative-based method will converge to the global optimum. On the other hand, algorithms that do not use derivative information (such as pure genetic algorithms) are for many problems needlessly poor at local hill climbing. Most statistical problems are regular in a neighborhood of the solution. Therefore, for some portion of the search space, derivative information is useful. The function supports parallel processing on multiple CPUs on a single machine or a cluster of computers.
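The idea of pairing an evolutionary search with a derivative-based step can be sketched in a few lines of Python. This is not the genoud algorithm or its interface: the objective function, the mutation scheme, the plain gradient-descent polish (used here in place of a quasi-Newton method), and every parameter value are assumptions chosen for illustration.

```python
import math
import random


def objective(x):
    """Toy multimodal function with a local minimum that can trap pure descent."""
    return x * x + 10.0 * math.sin(x)


def numeric_grad(f, x, h=1e-6):
    """Central-difference approximation of the derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def local_descent(f, x, steps=200, lr=0.05):
    """Derivative-based polish: plain gradient descent from a starting point."""
    for _ in range(steps):
        x -= lr * numeric_grad(f, x)
    return x


def hybrid_search(f, lo, hi, pop_size=20, generations=30, seed=0):
    """Evolutionary search whose best candidate is refined with derivatives."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)
        pop[0] = local_descent(f, pop[0])      # exploit local regularity
        survivors = pop[: pop_size // 2]       # keep the better half
        children = [
            min(hi, max(lo, rng.choice(survivors) + rng.gauss(0.0, 1.0)))
            for _ in survivors                 # mutate to keep exploring
        ]
        pop = survivors + children
    return min(pop, key=f)


best_x = hybrid_search(objective, -10.0, 10.0)
print(f"best x = {best_x:.4f}, f(best x) = {objective(best_x):.4f}")
```

The population step explores the whole interval, so the search is not tied to one starting point, while the descent step does the local hill climbing that the abstract says pure genetic algorithms handle poorly.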

Research Papers in Economics

Black Candidates and Black Voters: Assessing the Impact of Candidate Race on Uncounted Vote Rates

Author: Herron Michael C
Sekhon Jasjeet S
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2005

Numerous studies show that the rate at which African‐Americans cast ballots with missing or invalid votes, i.e., the African‐American residual vote rate, is higher than the corresponding white rate. While existing literature argues that the plethora of African‐American residual votes is caused by administrative problems or socioeconomic factors, we show using precinct‐level data from two recent elections in Cook County, Illinois, that the African‐American residual vote rate in electoral contests with black candidates is less than half the rate in contests without black candidates. African Americans, therefore, are able to reduce their residual vote rate when they wish to do so. We present complementary findings for white voters, whose residual vote rate often substantially increases in contests which feature dominant black candidates.

CiteSeerX

Dartmouth Digital Commons (Dartmouth College)

Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning

Author: Bickel Peter J.
Künzel Sören R.
Sekhon Jasjeet S.
Yu Bin
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 01/03/2019

There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of meta-algorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the Conditional Average Treatment Effect (CATE) function. Meta-algorithms build on base algorithms, such as Random Forests (RF), Bayesian Additive Regression Trees (BART) or neural networks, to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a new meta-algorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other, and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the meta-learners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our new X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods.
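A minimal sketch of an X-learner-style estimator may help make the meta-algorithm idea concrete. The abstract does not spell out the steps, so the staging below (fit outcome models per group, impute individual effects, regress the imputed effects, then blend with a propensity weight) reflects a common reading of the published method and should be treated as an assumption. The one-dimensional least-squares base learner, the constant propensity weight, and the noise-free toy data are also assumptions, and the authors' software package is not used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical noise-free data with many controls and few treated units.
# True effect in this toy setup: tau(x) = 3 + 0.5 * x.
control_x = [float(i) for i in range(10)]
treated_x = [0.5, 4.5, 8.5]
control_y = [1.0 + 2.0 * x for x in control_x]
treated_y = [1.0 + 2.0 * x + (3.0 + 0.5 * x) for x in treated_x]

# Stage 1: outcome models fitted separately in each group.
mu0 = fit_line(control_x, control_y)
mu1 = fit_line(treated_x, treated_y)

# Stage 2: imputed individual effects, then a model of those effects per group.
d1 = [y - mu0(x) for x, y in zip(treated_x, treated_y)]
d0 = [mu1(x) - y for x, y in zip(control_x, control_y)]
tau1 = fit_line(treated_x, d1)
tau0 = fit_line(control_x, d0)

# Stage 3: blend the two effect models; g is a constant propensity estimate.
g = len(treated_x) / (len(treated_x) + len(control_x))


def cate(x):
    """Weighted combination of the two effect estimates."""
    return g * tau0(x) + (1.0 - g) * tau1(x)


print(f"estimated CATE at x = 2.0: {cate(2.0):.3f}")
```

Any regression method could stand in for `fit_line`, which is the sense in which the approach is a meta-algorithm. With real, noisy data the two effect models would generally differ, and the weighting would matter more than it does in this toy case.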

arXiv.org e-Print Archive

eScholarship - University of California

A comparison of alternative strategies for choosing control populations in observational studies.

Author: Grieve Richard
Sekhon Jasjeet S
Steventon Adam
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Various approaches have been used to select control groups in observational studies: (1) from within the intervention area; (2) from a convenience sample, or randomly chosen areas; (3) from areas matched on area-level characteristics; and (4) nationally. The consequences of the decision are rarely assessed but, as we show, it can have complex impacts on confounding at both the area and individual levels. We began by reanalyzing data collected for an evaluation of a rapid response service on rates of unplanned hospital admission. Balance on observed individual-level variables was better with external than local controls, after matching. Further, when important prognostic variables were omitted from the matching algorithm, imbalances on those variables were also minimized using external controls. Treatment effects varied markedly depending on the choice of control area, but in the case study the variation was minimal after adjusting for the characteristics of areas. We used simulations to assess relative bias and mean-squared error, as this could not be done in the case study. A particular feature of the simulations was unexplained variation in the outcome between areas. We found that the likely impact of unexplained variation for hospital admissions dwarfed the benefits of better balance on individual-level variables, leading us to prefer local controls in this instance. In other scenarios, in which there was less unexplained variation in the outcome between areas, bias and mean-squared error were optimized using external controls. We identify some general considerations relevant to the choice of control population in observational studies.

Crossref

LSHTM Research Online

PubMed Central

When Natural Experiments Are Neither Natural nor Experiments

Author: Jasjeet S. Sekhon
Rocío Titiunik
Publication venue: Cambridge University Press (CUP)

Crossref