Evaluating the Impact of the Clean Heat Program on Air Pollution Levels in New York City
Residual heating oil is a class of heavy oil that remains after the lighter components are distilled away from crude oil in the refining process (EIA 2020) and has been linked to adverse health outcomes (Bell et al. 2009). In New York City (NYC), residual heating oil has been identified as a major source of multiple air pollutants, including fine particulate matter [particulate matter ≤ 2.5 μm in aerodynamic diameter (PM₂.₅)] (Clougherty et al. 2010; Kheirbek et al. 2014), sulfur dioxide (SO₂), nitrogen oxides (NOₓ) (U.S. EPA 1998), and black carbon (Cornell et al. 2012). Prior to policy implementation, three types of heating oil were used in NYC: heating oil #4, heating oil #6, and ultra-low sulfur oil #2. Both #6 and #4 are referred to as residual heating oils, and oil #2, which is the lightest of the three, has been considered a cleaner alternative (Kheirbek et al. 2014). In 2012, NYC established the Clean Heat Program (CHP) to eliminate the use of residual heating oil and move toward cleaner energy forms (Hernández 2016). Here, we have evaluated the CHP outcomes, quantified the CHP-attributable air pollution reductions between 2012 and 2016, and assessed whether and how these reductions vary by neighborhood socioeconomic status (SES). We aim to contribute to the knowledge of CHP effects since its implementation, assess relevant equity issues, and inform future policy improvements.
Non-asymptotic properties of spectral decomposition of large gram-type matrices with applications to high-dimensional inference
2020 Fall. Includes bibliographical references.
Jointly modeling a large and possibly divergent number of temporally evolving subjects arises ubiquitously in statistics, econometrics, finance, biology, and environmental sciences. To circumvent the challenges due to the high dimensionality as well as the temporal and/or contemporaneous dependence, the factor model and its variants have been widely employed. In general, they model large-scale temporally dependent data using low-dimensional structures that capture variations shared across dimensions. In this dissertation, we investigate the non-asymptotic properties of the spectral decomposition of high-dimensional Gram-type matrices based on factor models. Specifically, we derive exponential tail bounds for the first and second moments of the deviation between the empirical and population eigenvectors of the right Gram matrix, as well as a Berry-Esseen type bound that characterizes the Gaussian approximation of these deviations. We also obtain a non-asymptotic tail bound for the ratio between the eigenvalues of the left Gram matrix, namely the sample covariance matrix, and their population counterparts, regardless of the size of the data matrix. The documented non-asymptotic properties are further demonstrated in a suite of applications, including the non-asymptotic characterization of the estimated number of latent factors in factor models and related machine learning problems, the estimation and forecasting of high-dimensional time series, the spectral properties of large sample covariance matrices such as perturbation bounds and inference on the spectral projectors, and low-rank matrix denoising from temporally dependent data.
Next, we consider the estimation and inference of a flexible subject-specific heteroskedasticity model for large-scale panel data, which employs a latent semiparametric factor structure to simultaneously account for the heteroskedasticity across subjects and the contemporaneous and/or serial correlations. Specifically, the subject-specific heteroskedasticity is modeled by the product of an unobserved factor process and a subject-specific covariate effect. Serving as the loading, the covariate effect is further modeled via additive models. We propose a two-step procedure for estimation and document its theoretical validity. By scrupulously examining the non-asymptotic rates for recovering the latent factor process and its loading, we show the consistency and asymptotic efficiency of our regression coefficient estimator in addition to its asymptotic normality. This leads to a more efficient confidence set for the regression coefficient. Using a comprehensive simulation study, we demonstrate the finite sample performance of our procedure, and numerical results corroborate the theoretical findings.
Finally, we consider factor model-assisted variable clustering for temporally dependent data. The population level clusters are characterized by the latent factors of the model. We combine the approximate factor model with population level clusters to give an integrative group factor model as a background model for variable clustering. In this model, variables are loaded on latent factors; the factors are the same for variables from a common cluster and different for variables from different clusters. The commonality among clusters is modeled by common factors, and the clustering structure is modeled by the unique factors of each cluster. We quantify the difficulty of clustering data generated from the integrative group factor model in terms of a permutation-invariant clustering error. We develop an algorithm to recover clustering assignments and study its minimax-optimality. The analysis of the integrative group factor model and our proposed algorithm partitions a two-dimensional phase space into three regions, showing the impact of parameters on the possibility of clustering in the integrative group factor model and on the statistical guarantee of our proposed algorithm. We also obtain a non-asymptotic characterization of the estimated number of latent factors. The model can be extended to the case of a diverging number of clusters with similar results.