218 research outputs found

    Novel specification tests for synchronous additive concurrent model formulation based on martingale difference divergence

    This paper presents new specification tests for a general synchronous additive concurrent model formulation. As a novelty, our proposal requires neither a preliminary estimation of the model or error structure nor any tuning parameters. We develop a suitable test statistic using the martingale difference divergence coefficient. This statistic measures the departure from conditional mean independence in the concurrent model framework, considering the information of all observed time instants. In particular, global as well as partial dependence tests are introduced. These allow one to quantify the effect of a group of covariates or to select covariates one by one. We obtain the asymptotic distribution of the statistic under the null and propose a bootstrap algorithm to compute the p-values in practice. Through simulations, we illustrate our method and compare its performance to that of existing competitors. In addition, we apply it to the analysis of three real datasets related to gait data, flu activity, and casual bike rentals. The research of Laura Freijeiro-González is supported by the Consellería de Cultura, Educación e Ordenación Universitaria along with the Consellería de Economía, Emprego e Industria of the Xunta de Galicia (project ED481A-2018/264). Laura Freijeiro-González, Wenceslao González-Manteiga and Manuel Febrero-Bande acknowledge the support from Project PID2020-116587GB-I00 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe” and the Competitive Reference Groups 2021-2024 (ED431C 2021/24) from the Xunta de Galicia through the ERDF. We also acknowledge the Centro de Supercomputación de Galicia (CESGA) for computational resources. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
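
The abstract above builds its test on the martingale difference divergence (MDD) coefficient. As a rough illustration only, the following sketch computes the standard sample version of the squared MDD of a scalar response given a vector covariate, MDD_n^2 = -(1/n^2) sum_{k,l} (Y_k - Ybar)(Y_l - Ybar) |X_k - X_l|. It is not the paper's test statistic, which aggregates information over all observed time instants; the function name `mdd_squared` and the toy data are assumptions made for this example.

```python
import math


def mdd_squared(x, y):
    """Sample squared MDD of scalar y given vector-valued x (V-statistic form).

    x: list of equal-length tuples (covariate vectors), y: list of floats.
    Illustrative helper, not taken from the paper.
    """
    n = len(y)
    y_bar = sum(y) / n
    total = 0.0
    for k in range(n):
        for l in range(n):
            # Euclidean distance between covariate vectors
            dist = math.dist(x[k], x[l])
            total += (y[k] - y_bar) * (y[l] - y_bar) * dist
    return -total / n**2


# Toy data: y depends on x in the first case, is constant in the second.
x = [(0.0,), (1.0,), (2.0,), (3.0,)]
dependent = mdd_squared(x, [0.0, 1.0, 2.0, 3.0])
constant = mdd_squared(x, [5.0, 5.0, 5.0, 5.0])
```

A value near zero is consistent with conditional mean independence of y from x, while larger values indicate departure from it. In practice the null distribution would be calibrated by resampling, as the abstract's bootstrap algorithm does.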

    New covariates selection approaches in high dimensional or functional regression models

    In a Big Data context, the number of covariates used to explain a variable of interest, p, is likely to be high, sometimes even higher than the available sample size (p > n). Ordinary procedures for fitting regression models start to perform poorly in this situation. As a result, other approaches are needed. A first covariates selection step is of interest to consider only the relevant terms and to reduce the problem dimensionality. The purpose of this thesis is the study and development of covariates selection techniques for regression models in complex settings. In particular, we focus on recent high dimensional or functional data contexts of interest. Assuming some model structure, regularization techniques are widely employed alternatives that perform model estimation and covariates selection simultaneously. Specifically, an extensive and critical review of penalization techniques for covariates selection is carried out. This is developed in the context of the high dimensional linear model of the vectorial framework. Conversely, if no model structure is to be assumed, state-of-the-art dependence measures based on distances are an attractive option for covariates selection. New specification tests using these ideas are proposed for the functional concurrent model. Both versions are considered separately: the synchronous and the asynchronous case. These approaches are based on novel dependence measures derived from the distance covariance coefficient.

    Test and Measure for Partial Mean Dependence Based on Deep Neural Networks

    It is of great importance to investigate the significance of a subset of covariates W for the response Y given covariates Z in regression modeling. To this end, we propose a new significance test for the partial mean independence problem based on deep neural networks and data splitting. The test statistic converges to the standard chi-squared distribution under the null hypothesis while it converges to a normal distribution under the alternative hypothesis. We also suggest a powerful ensemble algorithm based on multiple data splitting to enhance the testing power. If the null hypothesis is rejected, we propose a new partial Generalized Measure of Correlation (pGMC) to measure the partial mean dependence of Y given W after controlling for the nonlinear effect of Z, which is an interesting extension of the GMC proposed by Zheng et al. (2012). We present the appealing theoretical properties of the pGMC and establish the asymptotic normality of its estimator with the optimal root-N convergence rate. Furthermore, a valid confidence interval for the pGMC is also derived. As an important special case, when there are no conditioning covariates Z, we also consider a new test of overall significance of covariates for the response in a model-free setting. We also introduce a new estimator of the GMC and derive its asymptotic normality. Numerical studies and real data analysis are also conducted to compare with existing approaches and to illustrate the validity and flexibility of our proposed procedures.

    Semiparametric inference in mixture models with predictive recursion marginal likelihood

    Predictive recursion is an accurate and computationally efficient algorithm for nonparametric estimation of mixing densities in mixture models. In semiparametric mixture models, however, the algorithm fails to account for any uncertainty in the additional unknown structural parameter. As an alternative to existing profile likelihood methods, we treat predictive recursion as a filter approximation to fitting a fully Bayes model, whereby an approximate marginal likelihood of the structural parameter emerges and can be used for inference. We call this the predictive recursion marginal likelihood. Convergence properties of predictive recursion under model mis-specification also lead to an attractive construction of this new procedure. We show pointwise convergence of a normalized version of this marginal likelihood function. Simulations compare the performance of this new marginal likelihood approach with that of existing profile likelihood methods as well as Dirichlet process mixtures in density estimation. Mixed-effects models and an empirical Bayes multiple testing application in time series analysis are also considered.
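
To make the predictive recursion idea in the abstract above concrete, here is a minimal grid-based sketch. It assumes a normal location mixture whose kernel standard deviation `sd` plays the role of the structural parameter, weights w_i = (i + 1)^(-gamma), and a uniform initial guess for the mixing density; these choices, the function names, and the toy data are illustrative assumptions rather than details given in the abstract. The log marginal likelihood is accumulated as the sum of the logs of the one-step-ahead predictive densities.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def predictive_recursion(data, grid, sd=1.0, gamma=0.67):
    """Return (mixing density on the grid, log PR marginal likelihood).

    grid must be equally spaced; integrals are Riemann sums over the grid.
    """
    step = grid[1] - grid[0]
    f = [1.0 / (step * len(grid))] * len(grid)  # uniform initial guess
    log_lik = 0.0
    for i, x in enumerate(data, start=1):
        w = (i + 1) ** (-gamma)  # decaying weight sequence (assumed form)
        kern = [normal_pdf(x, u, sd) for u in grid]
        # one-step-ahead predictive density of x under the current estimate
        m = sum(k * fu for k, fu in zip(kern, f)) * step
        log_lik += math.log(m)
        # convex combination of the old estimate and its Bayes-style update
        f = [(1.0 - w) * fu + w * k * fu / m for k, fu in zip(kern, f)]
    return f, log_lik


# Toy run: all observations equal 2.0, so mass should pile up near u = 2.
grid = [-5.0 + 0.1 * j for j in range(101)]
f_hat, log_lik = predictive_recursion([2.0] * 20, grid, sd=1.0)
peak = grid[max(range(len(grid)), key=lambda j: f_hat[j])]
```

Evaluating `log_lik` over a range of `sd` values and maximizing it is one way to use the approximate marginal likelihood for inference on the structural parameter, under the assumptions stated above.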

    High Dimensional Data Analysis: variable screening and inference

    This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods. To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors that are marginally independent of but conditionally dependent on the response. The second part of the dissertation focuses on high dimensional inference. The problem arises from microbiome and metabolome studies. The microbial community in the human gut is teeming with metabolic activity and plays a key role in host physiology and health. However, the molecular mechanisms of host-microbiome interactions are not well understood, although microbial metabolites have been hypothesized to play a critical role. This motivates us to develop a statistical framework that first quantifies the abundances of microbial metabolites and then examines the associations between such metabolites and disease outcomes. This framework also accounts for potential high-dimensional microbiome confounders, thereby avoiding potential false discoveries of disease-associated metabolites. We address this challenging inference problem using the idea of the debiased lasso. In numerical studies, we demonstrate its significant power improvement compared with some popular existing methods.