Optimal Estimation of Large-Dimensional Nonlinear Factor Models
This paper studies optimal estimation of large-dimensional nonlinear factor
models. The key challenge is that the observed variables are possibly nonlinear
functions of some latent variables where the functional forms are left
unspecified. A local principal component analysis method is proposed to
estimate the factor structure and recover information on latent variables and
latent functions, which combines $k$-nearest neighbors matching and principal
component analysis. Large-sample properties are established, including a sharp
bound on the matching discrepancy of nearest neighbors, sup-norm error bounds
for estimated local factors and factor loadings, and the uniform convergence
rate of the factor structure estimator. Under mild conditions our estimator of
the latent factor structure can achieve the optimal rate of uniform convergence
for nonparametric regression. The method is illustrated with a Monte Carlo
experiment and an empirical application studying the effect of tax cuts on
economic growth.
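The local PCA idea described above can be sketched in a few lines: match observations via nearest neighbors, then run PCA on the matched block. The simulated data, neighborhood size, and factor dimension below are illustrative assumptions, not the paper's implementation or tuning choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear factor model: x_{it} = g_i(f_t) + noise, with g_i unknown.
T, N, k = 300, 80, 30          # time periods, series, neighborhood size (assumed)
f = rng.uniform(-1, 1, T)      # scalar latent factor path
shifts = rng.uniform(0.5, 1.5, N)
X = np.cos(np.outer(f, shifts)) + 0.05 * rng.normal(size=(T, N))

def local_pca(X, t0, k, r=1):
    """k-NN matching on rows of X, then PCA on the matched local block."""
    d = np.linalg.norm(X - X[t0], axis=1)    # match periods by row distance
    nbrs = np.argsort(d)[:k]                 # k nearest periods, including t0
    block = X[nbrs] - X[nbrs].mean(axis=0)   # demean the local block
    U, S, Vt = np.linalg.svd(block, full_matrices=False)
    local_factors = U[:, :r] * S[:r]         # estimated local factors
    local_loadings = Vt[:r].T                # estimated local loadings
    return nbrs, local_factors, local_loadings

nbrs, F_hat, L_hat = local_pca(X, t0=0, k=k)
print(F_hat.shape, L_hat.shape)              # (30, 1) (80, 1)
```

Locally, the nonlinear model behaves like a linear factor model, which is why ordinary PCA on the matched block is a sensible building block.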
Binscatter Regressions
We introduce the \texttt{Stata} (and \texttt{R}) package \textsf{Binsreg},
which implements the binscatter methods developed in
\citet*{Cattaneo-Crump-Farrell-Feng_2019_Binscatter}. The package includes the
commands \texttt{binsreg}, \texttt{binsregtest}, and \texttt{binsregselect}.
The first command (\texttt{binsreg}) implements binscatter for the regression
function and its derivatives, offering several point estimation, confidence
interval, and confidence band procedures, with a particular focus on
constructing binned scatter plots. The second command (\texttt{binsregtest})
implements hypothesis testing procedures for parametric specification and for
nonparametric shape restrictions of the unknown regression function. Finally,
the third command (\texttt{binsregselect}) implements data-driven number of
bins selectors for binscatter implementation using either quantile-spaced or
evenly-spaced binning/partitioning. All the commands allow for covariate
adjustment, smoothness restrictions, weighting and clustering, among other
features. A companion \texttt{R} package with the same capabilities is also
available.
On Binscatter
Binscatter is very popular in applied microeconomics. It provides a flexible,
yet parsimonious way of visualizing and summarizing large data sets in
regression settings, and it is often used for informal evaluation of
substantive hypotheses such as linearity or monotonicity of the regression
function. This paper presents a foundational, thorough analysis of binscatter:
we give an array of theoretical and practical results that aid both in
understanding current practices (i.e., their validity or lack thereof) and in
offering theory-based guidance for future applications. Our main results
include principled number of bins selection, confidence intervals and bands,
hypothesis tests for parametric and shape restrictions of the regression
function, and several other new methods, applicable to canonical binscatter as
well as higher-order polynomial, covariate-adjusted and smoothness-restricted
extensions thereof. In particular, we highlight important methodological
problems related to covariate adjustment methods used in current practice. We
also discuss extensions to clustered data. Our results are illustrated with
simulated and real data throughout. Companion general-purpose software packages
for \texttt{Stata} and \texttt{R} are provided. Finally, from a technical
perspective, new theoretical results for partitioning-based series estimation
are obtained that may be of independent interest.
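Canonical binscatter, the baseline object analyzed above, reduces to quantile-spaced binning followed by within-bin means. The sketch below fixes the number of bins by hand for illustration; in practice the bin count would come from a data-driven selector such as the one the paper develops.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: noisy nonlinear regression function on [0, 1].
n, J = 5000, 20                      # sample size, number of bins (assumed)
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.5, size=n)

# Canonical binscatter: quantile-spaced bins, within-bin sample means of y.
edges = np.quantile(x, np.linspace(0, 1, J + 1))
bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, J - 1)
bin_means = np.array([y[bins == j].mean() for j in range(J)])
bin_centers = np.array([x[bins == j].mean() for j in range(J)])
print(bin_means.round(2))
```

Plotting `bin_centers` against `bin_means` yields the familiar binned scatter plot; the paper's methods add principled bin selection, confidence bands, and shape tests on top of this construction.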
Uniform Inference for Kernel Density Estimators with Dyadic Data
Dyadic data is often encountered when quantities of interest are associated
with the edges of a network. As such it plays an important role in statistics,
econometrics and many other data science disciplines. We consider the problem
of uniformly estimating a dyadic Lebesgue density function, focusing on
nonparametric kernel-based estimators taking the form of dyadic empirical
processes. Our main contributions include the minimax-optimal uniform
convergence rate of the dyadic kernel density estimator, along with strong
approximation results for the associated standardized and Studentized
$t$-processes. A consistent variance estimator enables the construction of
valid and feasible uniform confidence bands for the unknown density function.
We showcase the broad applicability of our results by developing novel
counterfactual density estimation and inference methodology for dyadic data,
which can be used for causal inference and program evaluation. A crucial
feature of dyadic distributions is that they may be "degenerate" at certain
points in the support of the data, a property making our analysis somewhat
delicate. Nonetheless our methods for uniform inference remain robust to the
potential presence of such points. For implementation purposes, we discuss
inference procedures based on positive semi-definite covariance estimators,
mean squared error optimal bandwidth selectors and robust bias correction
techniques. We illustrate the empirical finite-sample performance of our
methods both in simulations and with real-world trade data, for which we make
comparisons between observed and counterfactual trade distributions in
different years. Our technical results concerning strong approximations and
maximal inequalities are of potential independent interest.
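The dyadic kernel density estimator studied above averages a kernel over all dyads rather than over independent observations. A minimal sketch, with an assumed Gaussian kernel, hand-picked bandwidth, and toy dyadic data (not the paper's data-driven bandwidth or bias-corrected inference):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dyadic data: W_ij depends on node effects A_i, A_j for each pair i < j.
n, h = 100, 0.25                     # number of nodes, bandwidth (assumed)
A = rng.normal(size=n)
i, j = np.triu_indices(n, k=1)
W = 0.5 * (A[i] + A[j]) + rng.normal(scale=0.5, size=i.size)

def dyadic_kde(w, W, h):
    """Kernel density estimate averaging a Gaussian kernel over all dyads."""
    u = (W[None, :] - np.asarray(w)[:, None]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h

grid = np.linspace(-3, 3, 61)
f_hat = dyadic_kde(grid, W, h)
mass = f_hat.sum() * (grid[1] - grid[0])   # Riemann sum, close to 1
print(round(mass, 3))
```

Because the same node effect $A_i$ enters every dyad involving node $i$, the summands are dependent across dyads, which is what makes the uniform inference theory (and the degenerate case) delicate.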
Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption
We propose principled prediction intervals to quantify the uncertainty of a
large class of synthetic control predictions (or estimators) in settings with
staggered treatment adoption, offering precise non-asymptotic coverage
probability guarantees. From a methodological perspective, we provide a
detailed discussion of different causal quantities to be predicted, which we
call `causal predictands', allowing for multiple treated units with treatment
adoption at possibly different points in time. From a theoretical perspective,
our uncertainty quantification methods improve on prior literature by (i)
covering a large class of causal predictands in staggered adoption settings,
(ii) allowing for synthetic control methods with possibly nonlinear
constraints, (iii) proposing scalable robust conic optimization methods and
principled data-driven tuning parameter selection, and (iv) offering valid
uniform inference across post-treatment periods. We illustrate our methodology
with an empirical application studying the effects of economic liberalization
in the 1990s on GDP for emerging European countries. Companion general-purpose
software packages are provided in Python, R, and Stata.
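At the core of any synthetic control prediction is a constrained weight fit over donor units. The sketch below solves the classical simplex-constrained least squares problem by projected gradient descent on toy data; this is a minimal illustration, not the scalable conic optimization methods or the uncertainty quantification the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setting: one treated unit, 8 donors, 40 pre-treatment periods.
T0, J = 40, 8
Y0 = rng.normal(size=(T0, J)).cumsum(axis=0)          # donor outcome paths
w_true = np.array([0.6, 0.4] + [0.0] * (J - 2))       # assumed true weights
y1 = Y0 @ w_true + 0.1 * rng.normal(size=T0)          # treated pre-period path

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0)

# Projected gradient descent for min_w ||y1 - Y0 w||^2 s.t. w >= 0, sum(w) = 1.
w = np.full(J, 1 / J)
step = 1 / np.linalg.norm(Y0.T @ Y0, 2)
for _ in range(2000):
    w = project_simplex(w - step * Y0.T @ (Y0 @ w - y1))

y1_hat = Y0 @ w                                        # synthetic control path
print(w.round(2))
```

The fitted weights define the synthetic control; the paper's prediction intervals then quantify the uncertainty of the post-treatment predictions built from such weights, including under nonlinear constraints.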
Metabolome response to temperature-induced virulence gene expression in two genotypes of pathogenic Vibrio parahaemolyticus
Relative concentration of metabolites identified in Vibrio parahaemolyticus ATCC17802. (XLSX 113 kb)