Augmented two-step estimating equations with nuisance functionals and complex survey data
Statistical inference in the presence of nuisance functionals with complex
survey data is an important topic in social and economic studies. The Gini
index, Lorenz curves and quantile shares are among the commonly encountered
examples. The nuisance functionals are usually handled by a plug-in
nonparametric estimator and the main inferential procedure can be carried out
through a two-step generalized empirical likelihood method. Unfortunately, the
resulting inference is not efficient and the nonparametric version of
Wilks' theorem breaks down even under simple random sampling. We propose an
augmented estimating equations method for nuisance functionals and complex
surveys. The second-step augmented estimating functions obey the Neyman
orthogonality condition and automatically handle the impact of the first-step
plug-in estimator, and the resulting estimator of the main parameters of
interest is invariant to the first-step method. More importantly, the
generalized empirical likelihood based Wilks' theorem holds for the main
parameters of interest under the design-based framework for commonly used
survey designs, and the maximum generalized empirical likelihood estimators
achieve the semiparametric efficiency bound. The performance of the proposed
methods is demonstrated through simulation studies and an application using
the dataset from the New York City Social Indicators Survey.
Comment: 43 pages
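The Gini index mentioned above illustrates the kind of plug-in nuisance estimator involved. A minimal sketch of a design-weighted plug-in Gini estimator (an illustration only, not the authors' augmented estimating-equation method; the function name and the trapezoidal Lorenz-curve formula are our choices):

```python
import numpy as np

def weighted_gini(y, w):
    """Plug-in estimator of the Gini index from survey data.
    y: outcomes, w: survey design weights."""
    order = np.argsort(y)
    y = np.asarray(y, float)[order]
    w = np.asarray(w, float)[order]
    p = w / w.sum()                                  # normalized weights
    cum_p = np.concatenate(([0.0], np.cumsum(p)))    # cumulative population share
    cum_y = np.concatenate(([0.0], np.cumsum(p * y)))
    lorenz = cum_y / cum_y[-1]                       # weighted Lorenz ordinates
    # Gini = 1 - 2 * (trapezoidal area under the Lorenz curve)
    area = np.sum(np.diff(cum_p) * (lorenz[1:] + lorenz[:-1]) / 2.0)
    return 1.0 - 2.0 * area
```

With equal outcomes the estimator returns 0, and for a single positive outcome among n equally weighted zeros it returns (n - 1)/n, matching the usual population Gini.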
Pseudo-Empirical Likelihood Methods for Causal Inference
Causal inference problems have remained an important research topic over the
past several decades due to their general applicability in assessing a
treatment effect in many different real-world settings. In this paper, we
propose two inferential procedures on the average treatment effect (ATE)
through a two-sample pseudo-empirical likelihood (PEL) approach. The first
procedure uses the estimated propensity scores for the formulation of the PEL
function, and the resulting maximum PEL estimator of the ATE is equivalent to
the inverse probability weighted estimator discussed in the literature. Our
focus in this scenario is on the PEL ratio statistic and establishing its
theoretical properties. The second procedure incorporates outcome regression
models for PEL inference through model-calibration constraints, and the
resulting maximum PEL estimator of the ATE is doubly robust. Our main
theoretical result in this case is the establishment of the asymptotic
distribution of the PEL ratio statistic. We also propose a bootstrap method for
constructing PEL ratio confidence intervals for the ATE to bypass the scaling
constant which is involved in the asymptotic distribution of the PEL ratio
statistic but is very difficult to calculate. Finite sample performances of our
proposed methods with comparisons to existing ones are investigated through
simulation studies.
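The inverse probability weighted estimator that the first maximum PEL estimator is equivalent to can be sketched as follows (a minimal sketch in the normalized, Hajek-type form; the abstract does not specify the exact normalization, and all names are illustrative):

```python
import numpy as np

def ipw_ate(y, t, e):
    """Inverse probability weighted (Hajek-type) estimator of the ATE.
    y: outcomes, t: binary treatment indicators, e: estimated propensity scores."""
    y, t, e = (np.asarray(a, float) for a in (y, t, e))
    w1 = t / e                                  # weights for treated units
    w0 = (1 - t) / (1 - e)                      # weights for control units
    mu1 = np.sum(w1 * y) / np.sum(w1)           # weighted mean under treatment
    mu0 = np.sum(w0 * y) / np.sum(w0)           # weighted mean under control
    return mu1 - mu0
```

In practice the propensity scores `e` would themselves be estimated, e.g. by logistic regression, before being plugged in.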
Sparse and efficient replication variance estimation for complex surveys
It is routine practice for survey organizations to provide replication weights
as part of survey data files. These replication weights are meant to produce
valid and efficient variance estimates for a variety of estimators in a simple
and systematic manner. Most existing methods for constructing replication
weights, however, are only valid for specific sampling designs and typically
require a very large number of replicates. In this paper we first show how to
produce replication weights based on the method outlined in Fay (1984) such
that the resulting replication variance estimator is algebraically equivalent
to the fully efficient linearization variance estimator for any given sampling
design. We then propose a novel weight-calibration method to simultaneously
achieve efficiency and sparsity, in the sense that a small number of sets of
replication weights can produce valid and efficient replication variance
estimators for key population parameters. Our proposed method can be used in
conjunction with existing resampling techniques for large-scale complex
surveys. Validity of the proposed methods and extensions to some balanced
sampling designs are also discussed. Simulation results show that our proposed
variance estimators perform very well in tracking coverage probabilities of
confidence intervals. Our proposed strategies will likely have an impact on
how public-use survey data files are produced and how these data sets are
analyzed.
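Once replicate estimates are computed from the replication weights, a generic replication variance estimator takes the following form (a sketch of the standard replication formula only, not the paper's Fay-based construction of the weights themselves; the scaling constants c_g are design- and method-dependent, e.g. (G-1)/G for the delete-one jackknife):

```python
import numpy as np

def replication_variance(theta_full, theta_reps, c):
    """Generic replication variance estimator:
        v = sum_g c_g * (theta_g - theta_full)^2,
    where theta_full is the full-sample estimate, theta_reps are the
    estimates recomputed with each set of replication weights, and c_g
    are scaling constants determined by the design and replication method."""
    theta_reps = np.asarray(theta_reps, float)
    return float(np.sum(np.asarray(c, float) * (theta_reps - theta_full) ** 2))
```

The paper's contribution concerns how to build the replication weights so that this formula is both algebraically equivalent to the linearization estimator and sparse in the number of replicates G.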
Combining Non-probability and Probability Survey Samples Through Mass Imputation
This paper presents theoretical results on combining non-probability and
probability survey samples through mass imputation, an approach originally
proposed by Rivers (2007) as sample matching without rigorous theoretical
justification. Under suitable regularity conditions, we establish the
consistency of the mass imputation estimator and derive its asymptotic variance
formula. Variance estimators are developed using either linearization or
bootstrap. Finite sample performances of the mass imputation estimator are
investigated through simulation studies and an application to analyzing a
non-probability sample collected by the Pew Research Centre.
Comment: Submitted to Journal of the Royal Statistical Society: Series
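A minimal sketch of a mass imputation estimator of a population mean, in the spirit described above: fit an outcome model on the non-probability sample (which observes both covariates and outcomes), impute predicted outcomes into the probability sample, and take the design-weighted mean. A linear outcome model is used purely for illustration; the paper's model class and regularity conditions are not reproduced here.

```python
import numpy as np

def mass_imputation_mean(x_np, y_np, x_p, w_p):
    """Mass imputation estimator of the population mean.
    x_np, y_np: covariates and outcomes from the non-probability sample;
    x_p, w_p: covariates and design weights from the probability sample."""
    # least-squares fit with intercept on the non-probability sample
    A = np.column_stack([np.ones(len(x_np)), x_np])
    beta, *_ = np.linalg.lstsq(A, np.asarray(y_np, float), rcond=None)
    # impute predicted outcomes into the probability sample
    A_p = np.column_stack([np.ones(len(x_p)), x_p])
    y_hat = A_p @ beta
    # design-weighted mean of the imputed outcomes
    w_p = np.asarray(w_p, float)
    return float(np.sum(w_p * y_hat) / np.sum(w_p))
```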
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
In this paper, we present TrimTail, a simple but effective emission
regularization method to improve the latency of streaming ASR models. The core
idea of TrimTail is to apply length penalty (i.e., by trimming trailing frames,
see Fig. 1-(b)) directly on the spectrogram of input utterances, which does not
require any alignment. We demonstrate that TrimTail is computationally cheap,
can be applied online, and can be optimized with any training loss and any
model architecture on any dataset without extra effort, by applying it to
various end-to-end streaming ASR networks trained with either CTC loss [1] or
Transducer loss [2]. We achieve a 100–200 ms latency reduction with equal
or even better accuracy on both Aishell-1 and Librispeech. Moreover, by using
TrimTail, we can achieve a 400ms algorithmic improvement of User Sensitive
Delay (USD) with an accuracy loss of less than 0.2.
Comment: submitted to ICASSP 202
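The core operation, trimming trailing frames of the input spectrogram, can be sketched as follows (a toy sketch; `max_trim` and the uniform sampling of the trim length are illustrative assumptions, not the paper's actual trimming schedule):

```python
import numpy as np

def trim_tail(spec, max_trim=10, rng=None):
    """TrimTail-style length penalty: randomly drop up to `max_trim`
    trailing frames from a spectrogram of shape (time, feature_dim).
    No alignment is needed, since only the tail of the utterance is cut."""
    if rng is None:
        rng = np.random.default_rng()
    n = int(rng.integers(0, max_trim + 1))   # number of frames to trim
    return spec[: spec.shape[0] - n] if n > 0 else spec
```

Because the trimming acts directly on the input spectrogram, it can be applied as a data-augmentation step in any existing training pipeline.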