
    Semiparametric inference on the means of multiple nonnegative distributions with excess zero observations

    The final publication is available at Elsevier via https://dx.doi.org/10.1016/j.jmva.2018.02.010 © 2018. This manuscript version is made available under the CC-BY-NC-ND 4.0 license: https://creativecommons.org/licenses/by-nc-nd/4.0/
    A non-standard, but not uncommon, situation is to observe multiple samples of nonnegative data which have a high proportion of zeros. This is the so-called excess-of-zeros situation, and this paper looks at the problem of making inferences about the means of the underlying distributions. Under the semiparametric setup proposed by Wang et al. (2017), we develop a unified inference framework, based on an empirical likelihood ratio (ELR) statistic, for making inferences on the means of multiple such distributions. A chi-square-type limiting distribution of this statistic is established under a general linear null hypothesis about the means. This result allows us to construct a new test for mean equality. Simulation results show favorable performance of the proposed ELR when compared with other existing methods for testing mean equality, especially when the correctly specified basis function in the density ratio model is the logarithm function. A real data set is analyzed to illustrate the advantages of the proposed method.
    Funding: Natural Sciences and Engineering Research Council of Canada (Grants RGPIN-2014-05424, RGPIN-2015-06592); Fundamental Research Funds for the Central Universities (Grants 20720181043, 20720181003).
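
    To make the setup concrete, here is a rough sketch (my own illustration, not the authors' code) of fitting a two-sample density ratio model (DRM) with the log basis function to the positive parts of two zero-inflated samples. It relies on the classical equivalence between empirical-likelihood estimation in the DRM and logistic regression on the combined sample (Qin and Zhang, 1997); the zero proportions, which the paper's full ELR handles through a separate binomial component, are set aside here, and all distributions and parameter values are invented for illustration.

```python
# Sketch: two-sample DRM g1(x) = exp(alpha + beta * q(x)) g0(x), q(x) = log x,
# fitted to the positive parts of zero-inflated samples via logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def zi_lognormal(n, p_zero, mu, sigma):
    """Zero-inflated log-normal: point mass at zero plus a positive part."""
    x = rng.lognormal(mu, sigma, n)
    x[rng.random(n) < p_zero] = 0.0
    return x

x0 = zi_lognormal(300, p_zero=0.4, mu=0.0, sigma=1.0)
x1 = zi_lognormal(300, p_zero=0.3, mu=0.3, sigma=1.0)

pos0, pos1 = x0[x0 > 0], x1[x1 > 0]
q = np.log(np.concatenate([pos0, pos1]))[:, None]      # basis q(x) = log x
label = np.r_[np.zeros(len(pos0)), np.ones(len(pos1))]

# A very large C makes the logistic fit effectively unpenalized.
fit = LogisticRegression(C=1e10).fit(q, label)
beta_hat = fit.coef_[0, 0]
# The logistic intercept absorbs the sample-size ratio, hence the correction.
alpha_hat = fit.intercept_[0] - np.log(len(pos1) / len(pos0))
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```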

    Empirical Likelihood and Bootstrap Inference with Constraints

    Empirical likelihood and the bootstrap play influential roles in contemporary statistics. This thesis studies two distinct statistical inference problems, referred to as Part I and Part II, related to the empirical likelihood and the bootstrap, respectively.

    Part I of this thesis concerns making statistical inferences on multiple groups of samples that contain excess zero observations. A unique feature of the target populations is that the distribution of each group is characterized by a non-standard mixture of a singular distribution at zero and a skewed nonnegative component. In Part I, we propose modelling the nonnegative components using a semiparametric, multiple-sample, density ratio model (DRM). Under this semiparametric setup, we can efficiently utilize information from the combined samples even with unspecified underlying distributions. We first study the question of testing the homogeneity of multiple nonnegative distributions when there is an excess of zeros in the data. We develop a new empirical likelihood ratio (ELR) test for homogeneity and show that this ELR has a χ²-type limiting distribution under the homogeneous null hypothesis. A nonparametric bootstrap procedure is proposed to calibrate the finite-sample distribution of the ELR; the consistency of this procedure is established under both the null and alternative hypotheses. Simulation studies show that the bootstrap ELR test accurately attains the nominal type I error, is robust to changes in the underlying distributions, and is competitive with, and sometimes more powerful than, several popular one- and two-part tests. A real data example is used to illustrate the advantages of the proposed test. We next investigate the problem of comparing the means of multiple nonnegative distributions, with excess zero observations, under the same semiparametric setup. We develop a unified inference framework based on our new ELR statistic and show that this ELR has a χ²-type limiting distribution under a general null hypothesis, which allows us to construct a new test for mean equality. Simulation results show favourable performance of the proposed ELR test compared with other existing tests for mean equality, especially when the correctly specified basis function in the DRM is the logarithm function. A real data set is analyzed to illustrate the advantages of the proposed method.

    In Part II of this thesis, we investigate the asymptotic behaviour of the commonly used bootstrap percentile confidence intervals when the parameters are subject to inequality constraints. We concentrate on the important one- and two-sample problems with data generated from distributions in the natural exponential family, focusing on quantifying the asymptotic coverage probabilities of percentile confidence intervals based on bootstrapping maximum likelihood estimators. We propose a novel local framework to study the subtle asymptotic behaviour of bootstrap percentile confidence intervals when the true parameter values are close to the boundary. Under this framework, we discover that when the true parameter is on, or close to, the restriction boundary, the local asymptotic coverage probabilities can always exceed the nominal level in the one-sample case, whereas, surprisingly, they can be either under or over the nominal level in the two-sample case. The results provide theoretical justification and guidance on applying the bootstrap percentile method to constrained inference problems.

    The two parts of this thesis are connected by the common theme of constrained statistical inference. Specifically, in Part I, the semiparametric density ratio model imposes an exponential tilting constraint, a type of equality constraint, on the parameter space. In Part II, we deal with inequality constraints, such as boundary or ordering constraints, on the parameter space. In both parts, an important regularity condition of traditional likelihood inference, that the parameters be interior points of the parameter space, is violated. Therefore, the respective inference procedures involve non-standard asymptotics that create new technical challenges.
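
    The boundary phenomenon described for Part II is easy to reproduce in a toy setting. The sketch below uses an assumption of my own choosing, a one-sample N(θ, 1) model (a simple member of the natural exponential family) with constraint θ ≥ 0, whose constrained MLE is max(x̄, 0); at θ = 0 the bootstrap percentile interval over-covers, consistent with the one-sample result quoted above.

```python
# Sketch: coverage of the bootstrap percentile interval for a constrained MLE
# when the true parameter sits on the restriction boundary (theta = 0).
import numpy as np

rng = np.random.default_rng(1)
n, B, reps, level = 50, 499, 1000, 0.95
theta_true = 0.0                                  # true value on the boundary

cover = 0
for _ in range(reps):
    x = rng.normal(theta_true, 1.0, n)
    idx = rng.integers(0, n, size=(B, n))         # B bootstrap resamples
    boot = np.maximum(x[idx].mean(axis=1), 0.0)   # constrained MLE per resample
    lo, hi = np.quantile(boot, [(1 - level) / 2, (1 + level) / 2])
    cover += (lo <= theta_true <= hi)

# Typically prints a value above the nominal 0.95, illustrating over-coverage.
print(f"coverage at the boundary: {cover / reps:.3f} vs nominal {level}")
```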

    Bayesian semiparametric stochastic volatility modeling

    This paper extends the existing fully parametric Bayesian literature on stochastic volatility to allow for more general return distributions. Instead of specifying a particular distribution for the return innovation, we use nonparametric Bayesian methods to flexibly model the skewness and kurtosis of the distribution while continuing to model the dynamics of volatility with a parametric structure. Our semiparametric Bayesian approach provides a full characterization of parametric and distributional uncertainty. We present a Markov chain Monte Carlo sampling approach to estimation and address theoretical and computational issues for simulation from the posterior predictive distributions. The new model is assessed based on simulation evidence, an empirical example, and comparison to parametric models.
    Keywords: Econometric models; Stochastic analysis

    Bayesian semiparametric stochastic volatility modeling

    This paper extends the existing fully parametric Bayesian literature on stochastic volatility to allow for more general return distributions. Instead of specifying a particular distribution for the return innovation, nonparametric Bayesian methods are used to flexibly model the skewness and kurtosis of the distribution while the dynamics of volatility continue to be modeled with a parametric structure. Our semiparametric Bayesian approach provides a full characterization of parametric and distributional uncertainty. A Markov chain Monte Carlo sampling approach to estimation is presented, addressing theoretical and computational issues for simulation from the posterior predictive distributions. The new model is assessed based on simulation evidence, an empirical example, and comparison to parametric models.
    Keywords: Dirichlet process mixture, MCMC, block sampler
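
    For intuition, the following sketch simulates data from the kind of model described: parametric AR(1) log-volatility dynamics combined with a flexible, non-Gaussian return innovation. A two-component normal mixture stands in here for the Dirichlet process mixture of the paper, the MCMC estimation (the paper's actual contribution) is not reproduced, and all parameter values are illustrative.

```python
# Sketch: data-generating side of a semiparametric stochastic volatility model.
import numpy as np

rng = np.random.default_rng(2)
T, mu, phi, sigma_v = 1000, -1.0, 0.95, 0.2

h = np.empty(T)                        # log-volatility, AR(1) around mu
h[0] = mu
for t in range(1, T):
    h[t] = mu + phi * (h[t - 1] - mu) + sigma_v * rng.normal()

# Zero-mean mixture innovation: 0.9 * N(0.05, 0.9^2) + 0.1 * N(-0.45, 2.0^2),
# giving mild skewness and heavy tails relative to a single normal.
comp = rng.random(T) < 0.9
u = np.where(comp, rng.normal(0.05, 0.9, T), rng.normal(-0.45, 2.0, T))
r = np.exp(h / 2) * u                  # returns with stochastic volatility
```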

    Varying coefficient GARCH versus local constant volatility modeling. Comparison of the predictive power

    GARCH models are widely used in financial econometrics. However, we show by means of a simple simulation example that the GARCH approach may lead to serious model misspecification if the assumption of stationarity is violated. In particular, the well-known integrated GARCH effect can be explained by nonstationarity of the time series. We then introduce a more general class of GARCH models with time-varying coefficients and present an adaptive procedure that can estimate the GARCH coefficients as a function of time. We also discuss a simpler semiparametric model in which the beta-parameter is fixed. Finally, we compare the performance of the parametric, time-varying nonparametric, and semiparametric GARCH(1,1) models and the locally constant model from Polzehl and Spokoiny (2002) by means of simulated and real data sets using different forecasting criteria. Our results indicate that the simple locally constant model outperforms the other models in almost all cases. The GARCH(1,1) model also demonstrates relatively good forecasting performance over short-term forecasting horizons. However, its application to long-term forecasting seems questionable because of possible misspecification of the model parameters.
    Keywords: varying coefficient GARCH, adaptive weights
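
    The misspecification effect described in the opening sentences can be reproduced with a few lines of simulation. The sketch below (my own illustration, not the authors' adaptive procedure) generates returns from a GARCH(1,1) whose intercept ω shifts mid-sample and fits a constant-parameter GARCH(1,1) by Gaussian QMLE; the fitted α + β typically comes out close to 1, a spurious integrated-GARCH effect.

```python
# Sketch: a structural break in omega makes a constant GARCH(1,1) look "integrated".
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T, alpha, beta = 4000, 0.1, 0.8
omega = np.where(np.arange(T) < T // 2, 0.05, 0.5)   # time-varying intercept

r, s2 = np.empty(T), np.empty(T)
s2[0] = omega[0] / (1 - alpha - beta)
r[0] = np.sqrt(s2[0]) * rng.normal()
for t in range(1, T):
    s2[t] = omega[t] + alpha * r[t - 1] ** 2 + beta * s2[t - 1]
    r[t] = np.sqrt(s2[t]) * rng.normal()

def neg_loglik(params):
    """Gaussian QMLE objective for a constant-parameter GARCH(1,1)."""
    w, a, b = params
    if w <= 0 or a < 0 or b < 0 or a + b >= 1:
        return np.inf
    h = np.empty(T)
    h[0] = r.var()
    for t in range(1, T):
        h[t] = w + a * r[t - 1] ** 2 + b * h[t - 1]
    return 0.5 * np.sum(np.log(h) + r ** 2 / h)

fit = minimize(neg_loglik, x0=[0.1, 0.1, 0.8], method="Nelder-Mead")
w_hat, a_hat, b_hat = fit.x
print(f"fitted alpha + beta = {a_hat + b_hat:.3f}")  # typically close to 1
```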

    Gaussian Semiparametric Estimation in Long Memory in Stochastic Volatility and Signal Plus Noise Models

    This paper considers the persistence found in the volatility of many financial time series by means of a local Long Memory in Stochastic Volatility (LMSV) model, and analyzes the performance of the Gaussian semiparametric, or local Whittle, estimator of the memory parameter in a long-memory signal-plus-noise model that includes the LMSV model as a particular case. It is proved that this estimator preserves the consistency and asymptotic normality encountered for observable long-memory series and, under milder conditions, is more efficient than the estimator based on log-periodogram regression. Although the asymptotic properties do not depend on the signal-to-noise ratio, the finite-sample performance does, and an appropriate choice of the bandwidth is important to minimize the influence of the added noise. I analyze the effect of the bandwidth via Monte Carlo simulation. An application to a Spanish stock index is finally included.
    Keywords: long memory, stochastic volatility, semiparametric estimation, frequency domain
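
    As a concrete reference point, here is a minimal sketch of the Gaussian semiparametric (local Whittle) estimator itself, applied to a simulated ARFIMA(0, d, 0) series; the paper's object of study is the behaviour of this estimator when such a signal is observed with added noise. The simulation design and bandwidth choice below are my own illustrative assumptions.

```python
# Sketch: local Whittle estimation of the memory parameter d.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
T, d_true = 4000, 0.3

# ARFIMA(0, d, 0) via the truncated MA(inf) expansion:
# psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k.
psi = np.empty(T)
psi[0] = 1.0
for k in range(1, T):
    psi[k] = psi[k - 1] * (k - 1 + d_true) / k
x = np.convolve(rng.normal(size=2 * T), psi, mode="full")[T:2 * T]

# Periodogram at the first m Fourier frequencies.
m = int(T ** 0.65)                     # bandwidth: the key tuning parameter
lam = 2 * np.pi * np.arange(1, m + 1) / T
I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2 * np.pi * T)

def whittle(d):
    """Local Whittle objective R(d) = log G(d) - 2d * mean(log lambda_j)."""
    g = np.mean(lam ** (2 * d) * I)
    return np.log(g) - 2 * d * np.mean(np.log(lam))

d_hat = minimize_scalar(whittle, bounds=(-0.49, 0.99), method="bounded").x
print(f"d_true = {d_true}, local Whittle d_hat = {d_hat:.3f}")
```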

    Robust Bayesian Tensor Factorization with Zero-Inflated Poisson Model and Consensus Aggregation

    Tensor factorizations (TF) are powerful tools for the efficient representation and analysis of multidimensional data. However, classic TF methods based on maximum-likelihood estimation underperform when applied to zero-inflated count data, such as single-cell RNA sequencing (scRNA-seq) data. Additionally, the stochasticity inherent in TFs results in factors that vary across repeated runs, making interpretation and reproducibility of the results challenging. In this paper, we introduce Zero Inflated Poisson Tensor Factorization (ZIPTF), a novel approach for the factorization of high-dimensional count data with excess zeros. To address the challenge of stochasticity, we introduce Consensus Zero Inflated Poisson Tensor Factorization (C-ZIPTF), which combines ZIPTF with a consensus-based meta-analysis. We evaluate the proposed ZIPTF and C-ZIPTF on synthetic zero-inflated count data and on synthetic and real scRNA-seq data. ZIPTF consistently outperforms baseline matrix and tensor factorization methods in terms of reconstruction accuracy for zero-inflated data; when the probability of excess zeros is high, ZIPTF achieves up to 2.4× better accuracy. Additionally, C-ZIPTF significantly improves the consistency and accuracy of the factorization. When tested on both synthetic and real scRNA-seq data, ZIPTF and C-ZIPTF consistently recover known and biologically meaningful gene expression programs.
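
    As background to the observation model, the sketch below fits the zero-inflated Poisson (ZIP) building block by maximum likelihood on simulated counts; ZIPTF places this observation model over a low-rank tensor of Poisson rates, which is not reproduced here. All names and parameter values below are my own illustrative choices.

```python
# Sketch: ZIP log-likelihood and MLE of (pi, lam) on simulated counts.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

rng = np.random.default_rng(5)
n, pi_true, lam_true = 5000, 0.4, 2.5
y = rng.poisson(lam_true, n)
y[rng.random(n) < pi_true] = 0          # inflate the zeros

def zip_negloglik(params):
    # Unconstrained parameterization: pi = expit(a), lam = exp(b).
    pi, lam = expit(params[0]), np.exp(params[1])
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))          # P(Y = 0)
    ll_pos = np.log(1 - pi) - lam + y * np.log(lam) - gammaln(y + 1)
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

fit = minimize(zip_negloglik, x0=[0.0, 0.0], method="Nelder-Mead")
pi_hat, lam_hat = expit(fit.x[0]), np.exp(fit.x[1])
print(f"pi_hat = {pi_hat:.3f}, lam_hat = {lam_hat:.3f}")
```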