Search CORE

344 research outputs found

Optimal Estimation of Large-Dimensional Nonlinear Factor Models

Author: Feng Yingjie
Publication venue
Publication date: 13/11/2023
Field of study

This paper studies optimal estimation of large-dimensional nonlinear factor models. The key challenge is that the observed variables are possibly nonlinear functions of some latent variables where the functional forms are left unspecified. A local principal component analysis method is proposed to estimate the factor structure and recover information on latent variables and latent functions, which combines

K

-nearest neighbors matching and principal component analysis. Large-sample properties are established, including a sharp bound on the matching discrepancy of nearest neighbors, sup-norm error bounds for estimated local factors and factor loadings, and the uniform convergence rate of the factor structure estimator. Under mild conditions our estimator of the latent factor structure can achieve the optimal rate of uniform convergence for nonparametric regression. The method is illustrated with a Monte Carlo experiment and an empirical application studying the effect of tax cuts on economic growth.Comment: arXiv admin note: text overlap with arXiv:2008.1365

arXiv.org e-Print Archive

Binscatter Regressions

Author: Cattaneo Matias D.
Crump Richard K.
Farrell Max H.
Feng Yingjie
Publication venue
Publication date: 25/02/2019
Field of study

We introduce the \texttt{Stata} (and \texttt{R}) package \textsf{Binsreg}, which implements the binscatter methods developed in \citet*{Cattaneo-Crump-Farrell-Feng_2019_Binscatter}. The package includes the commands \texttt{binsreg}, \texttt{binsregtest}, and \texttt{binsregselect}. The first command (\texttt{binsreg}) implements binscatter for the regression function and its derivatives, offering several point estimation, confidence intervals and confidence bands procedures, with particular focus on constructing binned scatter plots. The second command (\texttt{binsregtest}) implements hypothesis testing procedures for parametric specification and for nonparametric shape restrictions of the unknown regression function. Finally, the third command (\texttt{binsregselect}) implements data-driven number of bins selectors for binscatter implementation using either quantile-spaced or evenly-spaced binning/partitioning. All the commands allow for covariate adjustment, smoothness restrictions, weighting and clustering, among other features. A companion \texttt{R} package with the same capabilities is also available

arXiv.org e-Print Archive

On Binscatter

Author: Cattaneo Matias D.
Crump Richard K.
Farrell Max H.
Feng Yingjie
Publication venue
Publication date: 25/02/2019
Field of study

Binscatter is very popular in applied microeconomics. It provides a flexible, yet parsimonious way of visualizing and summarizing large data sets in regression settings, and it is often used for informal evaluation of substantive hypotheses such as linearity or monotonicity of the regression function. This paper presents a foundational, thorough analysis of binscatter: we give an array of theoretical and practical results that aid both in understanding current practices (i.e., their validity or lack thereof) and in offering theory-based guidance for future applications. Our main results include principled number of bins selection, confidence intervals and bands, hypothesis tests for parametric and shape restrictions of the regression function, and several other new methods, applicable to canonical binscatter as well as higher-order polynomial, covariate-adjusted and smoothness-restricted extensions thereof. In particular, we highlight important methodological problems related to covariate adjustment methods used in current practice. We also discuss extensions to clustered data. Our results are illustrated with simulated and real data throughout. Companion general-purpose software packages for \texttt{Stata} and \texttt{R} are provided. Finally, from a technical perspective, new theoretical results for partitioning-based series estimation are obtained that may be of independent interest

arXiv.org e-Print Archive

Uniform Inference for Kernel Density Estimators with Dyadic Data

Author: Cattaneo Matias D.
Feng Yingjie
Underwood William G.
Publication venue
Publication date: 13/10/2023
Field of study

Dyadic data is often encountered when quantities of interest are associated with the edges of a network. As such it plays an important role in statistics, econometrics and many other data science disciplines. We consider the problem of uniformly estimating a dyadic Lebesgue density function, focusing on nonparametric kernel-based estimators taking the form of dyadic empirical processes. Our main contributions include the minimax-optimal uniform convergence rate of the dyadic kernel density estimator, along with strong approximation results for the associated standardized and Studentized

t

-processes. A consistent variance estimator enables the construction of valid and feasible uniform confidence bands for the unknown density function. We showcase the broad applicability of our results by developing novel counterfactual density estimation and inference methodology for dyadic data, which can be used for causal inference and program evaluation. A crucial feature of dyadic distributions is that they may be "degenerate" at certain points in the support of the data, a property making our analysis somewhat delicate. Nonetheless our methods for uniform inference remain robust to the potential presence of such points. For implementation purposes, we discuss inference procedures based on positive semi-definite covariance estimators, mean squared error optimal bandwidth selectors and robust bias correction techniques. We illustrate the empirical finite-sample performance of our methods both in simulations and with real-world trade data, for which we make comparisons between observed and counterfactual trade distributions in different years. Our technical results concerning strong approximations and maximal inequalities are of potential independent interest.Comment: Article: 23 pages, 3 figures. Supplemental appendix: 72 pages, 3 figure

arXiv.org e-Print Archive

Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption

Author: Cattaneo Matias D.
Feng Yingjie
Palomba Filippo
Titiunik Rocio
Publication venue
Publication date: 05/09/2023
Field of study

We propose principled prediction intervals to quantify the uncertainty of a large class of synthetic control predictions (or estimators) in settings with staggered treatment adoption, offering precise non-asymptotic coverage probability guarantees. From a methodological perspective, we provide a detailed discussion of different causal quantities to be predicted, which we call `causal predictands', allowing for multiple treated units with treatment adoption at possibly different points in time. From a theoretical perspective, our uncertainty quantification methods improve on prior literature by (i) covering a large class of causal predictands in staggered adoption settings, (ii) allowing for synthetic control methods with possibly nonlinear constraints, (iii) proposing scalable robust conic optimization methods and principled data-driven tuning parameter selection, and (iv) offering valid uniform inference across post-treatment periods. We illustrate our methodology with an empirical application studying the effects of economic liberalization in the 1990s on GDP for emerging European countries. Companion general-purpose software packages are provided in Python, R and Stata

arXiv.org e-Print Archive

Metabolome response to temperature-induced virulence gene expression in two genotypes of pathogenic Vibrio parahaemolyticus

Author: Bo Feng
Weijia Zhang
Yingjie Pan
Yong Zhao
Zhuoran Guo
Publication venue: Springer Nature
Publication date: 01/01/2016
Field of study

Relative concentration of metabolites identified in Vibrio parahaemolyticus ATCC17802. (XLSX 113 kb)

Springer - Publisher Connector

The Francis Crick Institute