Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R
Matching is an R package that provides functions for multivariate and propensity score matching and for finding optimal covariate balance based on a genetic search algorithm. A variety of univariate and multivariate metrics are provided to determine whether balance has actually been obtained. The underlying matching algorithm is written in C++, makes extensive use of system BLAS, and scales efficiently with dataset size. The genetic algorithm that finds optimal balance is parallelized and can make use of multiple CPUs or a cluster of computers. A large number of options control exactly how the matching is conducted and how balance is evaluated.
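A minimal sketch of the workflow the abstract describes, assuming the Matching package is installed and using its bundled lalonde data (Match, MatchBalance, and the lalonde covariate names are the package's documented API; the exact model specification here is illustrative):

```r
library(Matching)
data(lalonde)

# Estimate the propensity score with logistic regression.
ps <- glm(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
          family = binomial, data = lalonde)

# One-to-one matching on the estimated propensity score;
# Y is the outcome (re78, post-treatment earnings).
m <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = ps$fitted,
           estimand = "ATT")
summary(m)

# Check covariate balance before and after matching.
MatchBalance(treat ~ age + educ + black + hisp + married + nodegr +
               re74 + re75,
             data = lalonde, match.out = m, nboots = 100)
```

GenMatch() can replace the hand-specified propensity model above: it runs the parallelized genetic search to find covariate weights that optimize balance, and its output is passed to Match() via the Weight.matrix argument.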
Genetic Optimization Using Derivatives: The rgenoud Package for R
genoud is an R function that combines evolutionary algorithm methods with a derivative-based (quasi-Newton) method to solve difficult optimization problems. genoud may also be used for optimization problems for which derivatives do not exist. genoud solves problems that are nonlinear or perhaps even discontinuous in the parameters of the function to be optimized. When the function to be optimized (for example, a log-likelihood) is nonlinear in the model's parameters, the function will generally not be globally concave and may have irregularities such as saddlepoints or discontinuities. Optimization methods that rely on derivatives of the objective function may be unable to find any optimum at all. Multiple local optima may exist, so that there is no guarantee that a derivative-based method will converge to the global optimum. On the other hand, algorithms that do not use derivative information (such as pure genetic algorithms) are for many problems needlessly poor at local hill climbing. Most statistical problems are regular in a neighborhood of the solution. Therefore, for some portion of the search space, derivative information is useful. The function supports parallel processing on multiple CPUs on a single machine or a cluster of computers.
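A small illustration of the hybrid strategy described above, assuming the rgenoud package is installed: a Rastrigin-style function has many local minima that trap a pure quasi-Newton method, while genoud's evolutionary search locates the basin of the global minimum and its BFGS step (enabled by default) polishes the solution. The test function itself is an assumption chosen for illustration:

```r
library(rgenoud)

# Multimodal objective: many local minima, global minimum 0 at the origin.
rastrigin <- function(x) sum(x^2 - 10 * cos(2 * pi * x) + 10)

out <- genoud(rastrigin, nvars = 2,
              max = FALSE,                      # minimize
              Domains = matrix(c(-5, -5, 5, 5), ncol = 2),  # bounds per parameter
              print.level = 0)

out$par    # solution, near c(0, 0)
out$value  # objective value, near 0
```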
Lasso adjustments of treatment effect estimates in randomized experiments
We provide a principled way for investigators to analyze randomized experiments when the number of covariates is large. Investigators often use linear multivariate regression to analyze randomized experiments instead of simply reporting the difference of means between treatment and control groups. Their aim is to reduce the variance of the estimated treatment effect by adjusting for covariates. If there are a large number of covariates relative to the number of observations, regression may perform poorly because of overfitting. In such cases, the Lasso may be helpful. We study the resulting Lasso-based treatment effect estimator under the Neyman-Rubin model of randomized experiments. We present theoretical conditions that guarantee that the estimator is more efficient than the simple difference-of-means estimator, and we provide a conservative estimator of the asymptotic variance, which can yield tighter confidence intervals than the difference-of-means estimator. Simulation and data examples show that Lasso-based adjustment can be advantageous even when the number of covariates is less than the number of observations. Specifically, a variant using Lasso for selection and OLS for estimation performs particularly well, and it chooses a smoothing parameter based on combined performance of Lasso and OLS.
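A hedged sketch of the Lasso-for-selection, OLS-for-estimation idea on simulated data, using the glmnet package and cross-validation to pick the penalty (this is an illustrative stand-in, not the paper's own estimator or tuning rule):

```r
library(glmnet)
set.seed(1)

n <- 200; p <- 50
X  <- matrix(rnorm(n * p), n, p)
tr <- rbinom(n, 1, 0.5)                    # randomized treatment assignment
y  <- 2 * tr + X[, 1] - X[, 2] + rnorm(n)  # true treatment effect = 2

# Step 1: Lasso on the covariates selects a sparse set of predictors.
cv  <- cv.glmnet(X, y)
sel <- which(coef(cv, s = "lambda.1se")[-1] != 0)  # drop intercept row

# Step 2: OLS of the outcome on treatment plus the selected covariates
# gives the covariate-adjusted treatment effect estimate.
fit <- if (length(sel) > 0) {
  lm(y ~ tr + X[, sel, drop = FALSE])
} else {
  lm(y ~ tr)                               # fall back to difference of means
}
coef(fit)["tr"]                            # adjusted estimate, near 2
```

With strong covariates such as X1 and X2 here, the adjusted estimator has a visibly smaller standard error than the unadjusted regression of y on tr alone.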
