
    An asymptotic theory for model selection inference in general semiparametric problems.

    Recently, Hjort and Claeskens (2003) developed an asymptotic theory for model selection, model averaging and post-model selection/averaging inference using likelihood methods in parametric models, along with associated confidence statements. In this paper, we consider a semiparametric version of this problem, wherein the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model. We show that all the results of Hjort and Claeskens hold in the semiparametric context, if the Fisher information matrix for parametric models is replaced by the semiparametric information bound for semiparametric models, and if maximum likelihood estimators for parametric models are replaced by semiparametric efficient profile estimators. The results also describe the behavior of semiparametric model estimates when the parametric component is misspecified, and have implications as well for pointwise consistent model selectors.

    Keywords: Akaike information criterion; Bayes information criterion; Behavior; Efficient semi-parametric estimation; Estimator; Frequentist model averaging; Implications; Information; Matrix; Maximum likelihood; Methods; Model; Model averaging; Model selection; Models; Problems; Profile likelihood; Research; Selection; Semiparametric model; Theory

    Testing Hardy-Weinberg equilibrium with a simple root-mean-square statistic

    We provide evidence that, in certain circumstances, a root-mean-square test of goodness of fit can be significantly more powerful than state-of-the-art tests in detecting deviations from Hardy-Weinberg equilibrium. Unlike Pearson's χ² test, the log-likelihood-ratio test, and Fisher's exact test, which are sensitive to relative discrepancies between genotypic frequencies, the root-mean-square test is sensitive to absolute discrepancies. This can increase statistical power, as we demonstrate using benchmark data sets and simulations, and through asymptotic analysis. © The Author 2013. Published by Oxford University Press.
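
    The idea of a root-mean-square test can be sketched for a single biallelic locus as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of estimated allele frequencies, and the Monte Carlo (parametric bootstrap) P-value are all assumptions made for the example.

```python
import math
import random


def hwe_expected(counts):
    """Expected genotype proportions (AA, AB, BB) under Hardy-Weinberg equilibrium."""
    n_aa, n_ab, n_bb = counts
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    return [p * p, 2 * p * (1 - p), (1 - p) ** 2]


def rms_statistic(counts):
    """Root-mean-square of absolute discrepancies between observed and expected proportions."""
    n = sum(counts)
    expected = hwe_expected(counts)
    squared = [(c / n - e) ** 2 for c, e in zip(counts, expected)]
    return math.sqrt(sum(squared) / len(squared))


def rms_hwe_test(counts, n_sim=999, seed=0):
    """Monte Carlo P-value: simulate genotype counts from the fitted HWE model
    and compare their RMS statistics with the observed one (assumed scheme)."""
    rng = random.Random(seed)
    n = sum(counts)
    observed_stat = rms_statistic(counts)
    expected = hwe_expected(counts)
    exceed = 0
    for _ in range(n_sim):
        draws = rng.choices(range(3), weights=expected, k=n)
        sim_counts = [draws.count(g) for g in range(3)]
        # Allele frequency is re-estimated for every simulated data set.
        if rms_statistic(sim_counts) >= observed_stat:
            exceed += 1
    return observed_stat, (1 + exceed) / (1 + n_sim)
```

    Calling `rms_hwe_test((50, 0, 50))` on a sample with no heterozygotes gives a large statistic and a small P-value, whereas counts that match the Hardy-Weinberg proportions exactly, such as `(25, 50, 25)`, give a statistic of zero.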

    Bounded Influence Regression in the Presence of Heteroskedasticity of Unknown Form

    In a regression model with conditional heteroskedasticity of unknown form, we propose a general class of M-estimators scaled by nonparametric estimates of the conditional standard deviations of the dependent variable. We give regularity conditions under which these estimators are asymptotically equivalent to M-estimators scaled by the true conditional standard deviations. The practical performance of these estimators is investigated through a Monte Carlo experiment.

    A simultaneous confidence band for sparse longitudinal regression

    Functional data analysis has received considerable recent attention and a number of successful applications have been reported. In this paper, asymptotically simultaneous confidence bands are obtained for the mean function of the functional regression model, using piecewise constant spline estimation. Simulation experiments corroborate the asymptotic theory. The confidence band procedure is illustrated by analyzing CD4 cell counts of HIV infected patients.

    Model averaging based on Kullback-Leibler distance

    © 2015, Institute of Statistical Science. All rights reserved. This paper proposes a model averaging method based on Kullback-Leibler distance under a homoscedastic normal error term. The resulting model average estimator is proved to be asymptotically optimal. When combining least squares estimators, the model average estimator is shown to have the same large sample properties as the Mallows model average (MMA) estimator developed by Hansen (2007). We show via simulations that, in terms of mean squared prediction error and mean squared parameter estimation error, the proposed model average estimator is more efficient than the MMA estimator and the estimator based on model selection using the corrected Akaike information criterion in small sample situations. A modified version of the new model average estimator is further suggested for the case of heteroscedastic random errors. The method is applied to a data set from the Hong Kong real estate market

    Semiparametric Bayesian analysis of gene-environment interactions with error in measurement of environmental covariates and missing genetic data

    Case-control studies are widely used to detect gene-environment interactions in the etiology of complex diseases. Many variables that are of interest to biomedical researchers are difficult to measure on an individual level, e.g. nutrient intake, cigarette smoking exposure, long-term toxic exposure. Measurement error causes bias in parameter estimates, thus masking key features of data and leading to loss of power and spurious/masked associations. We develop a Bayesian methodology for analysis of case-control studies for the case when measurement error is present in an environmental covariate and the genetic variable has missing data. This approach offers several advantages. It allows prior information to enter the model to make estimation and inference more precise. The environmental covariates measured exactly are modeled completely nonparametrically. Further, information about the probability of disease can be incorporated in the estimation procedure to improve the quality of parameter estimates, which cannot be done in conventional case-control studies. A unique feature of the procedure under investigation is that the analysis is based on a pseudo-likelihood function; therefore, conventional Bayesian techniques may not be technically correct. We propose an approach using Markov Chain Monte Carlo sampling as well as a computationally simple method based on an asymptotic posterior distribution. Simulation experiments demonstrated that our method produced parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma development.

    Spatial regression with covariate measurement error: A semiparametric approach

    © 2016, The International Biometric Society. Spatial data have become increasingly common in epidemiology and public health research thanks to advances in GIS (Geographic Information Systems) technology. In health research, for example, it is common for epidemiologists to incorporate geographically indexed data into their studies. In practice, however, the spatially defined covariates are often measured with error. Naive estimators of regression coefficients are attenuated if measurement error is ignored. Moreover, the classical measurement error theory is inapplicable in the context of spatial modeling because of the presence of spatial correlation among the observations. We propose a semiparametric regression approach to obtain bias-corrected estimates of regression parameters and derive their large sample properties. We evaluate the performance of the proposed method through simulation studies and illustrate it using data on Ischemic Heart Disease (IHD). Both simulation and practical application demonstrate that the proposed method can be effective in practice.

    Rapid publication-ready MS-Word tables for one-way ANOVA

    © 2014, Assaad et al.; licensee Springer. Background: Statistical tables are an important component of data analysis and reports in biological sciences. However, the traditional manual processes for computation and presentation of statistically significant results using a letter-based algorithm are tedious and prone to errors. Results: Based on the R language, we present two web-based software tools for individual and summary data, freely available online at http://shiny.stat.tamu.edu:3838/hassaad/Table_report1/ and http://shiny.stat.tamu.edu:3838/hassaad/SumAOV1/, respectively. The software is capable of rapidly generating publication-ready tables containing one-way analysis of variance (ANOVA) results. No download is required. Additionally, the software can perform multiple comparisons of means using the Duncan, Student-Newman-Keuls, Tukey-Kramer, and Fisher’s least significant difference (LSD) tests. If the LSD test is selected, multiple methods (e.g., Bonferroni and Holm) are available for adjusting p-values. Using the software, the procedures of ANOVA can be completed within seconds using a web browser, preferably Mozilla Firefox or Google Chrome, and a few mouse clicks. Furthermore, the software can handle one-way ANOVA for summary data (i.e. sample size, mean, and SD or SEM per treatment group) with post-hoc multiple comparisons among treatment means. To our knowledge, none of the currently available commercial (e.g., SPSS and SAS) or open-source software (e.g., R and Python) can perform such a rapid task without advanced knowledge of the corresponding programming language. Conclusions: Our new and user-friendly software to perform statistical analysis and generate publication-ready MS-Word tables for one-way ANOVA is expected to facilitate research in agriculture, biomedicine, and other fields of life sciences.
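
    One-way ANOVA from summary data needs only the sample size, mean, and SD (or SEM) of each treatment group. The following is a minimal sketch of that textbook computation, not the code behind the web tools described above; the function names `sem_to_sd` and `anova_from_summary` are hypothetical.

```python
import math


def sem_to_sd(sem, n):
    """Convert a standard error of the mean to a standard deviation."""
    return sem * math.sqrt(n)


def anova_from_summary(ns, means, sds):
    """One-way ANOVA table entries computed from per-group n, mean, and SD."""
    k = len(ns)
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares: recovered from the group variances.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between = k - 1
    df_within = total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": f_stat,
    }
```

    The P-value is the upper tail of the F distribution with (`df_between`, `df_within`) degrees of freedom. The Python standard library has no F distribution, so that step would need an external package, for example `scipy.stats.f.sf(f, df_between, df_within)`.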

    Variogram estimation in the presence of trend

    Estimation of covariance function parameters of the error process in the presence of an unknown smooth trend is an important problem because solving it allows one to estimate the trend nonparametrically using a smoother corrected for dependence in the errors. Our work is motivated by spatial statistics but is applicable to other contexts where the dimension of the index set can exceed one. We obtain an estimator of the covariance function parameters by regressing squared differences of the response on their expectations, which equal the variogram plus an offset term induced by the trend. Existing estimators that ignore the trend produce bias in the estimates of the variogram parameters, which our procedure corrects for. Our estimator can be justified asymptotically under the increasing domain framework. Simulation studies suggest that our estimator compares favorably with those in the current literature while making less restrictive assumptions. We use our method to estimate the variogram parameters of the short-range spatial process in a U.S. precipitation data set.
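
    The squared differences mentioned above are also the building block of the classical empirical semivariogram, which makes the trend-induced bias easy to see. The sketch below implements only that classical binned estimator, not the trend-corrected estimator proposed in the paper; the function name and the binning scheme are assumptions for illustration.

```python
from collections import defaultdict
from math import dist


def empirical_semivariogram(coords, values, bin_width):
    """Classical binned semivariogram: half the mean squared difference of
    all pairs whose separation falls in [b * bin_width, (b + 1) * bin_width)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            lag = dist(coords[i], coords[j])
            b = int(lag // bin_width)
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return {b: sums[b] / (2 * counts[b]) for b in sorted(counts)}


# A noise-free linear trend z(x) = x on a 1-D grid: there is no error process,
# yet the estimate grows like lag**2 / 2 because the trend is ignored.
coords = [(0.0,), (1.0,), (2.0,), (3.0,)]
values = [0.0, 1.0, 2.0, 3.0]
gamma = empirical_semivariogram(coords, values, bin_width=1.0)
```

    Here every nonzero entry of `gamma` comes from the trend alone, which is the kind of offset that an estimator ignoring the trend would wrongly attribute to the variogram.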

    Two wrongs make a right: Addressing underreporting in binary data from multiple sources

    © The Author(s) 2017. Media-based event data, i.e., data compiled from reporting by media outlets, are widely used in political science research. However, events of interest (e.g., strikes, protests, conflict) are often underreported by these primary and secondary sources, producing incomplete data that risks inconsistency and bias in subsequent analysis. While general strategies exist to help ameliorate this bias, these methods do not make full use of the information often available to researchers. Specifically, much of the event data used in the social sciences is drawn from multiple, overlapping news sources (e.g., Agence France-Presse, Reuters). Therefore, we propose a novel maximum likelihood estimator that corrects for misclassification in data arising from multiple sources. In the most general formulation of our estimator, researchers can specify separate sets of predictors for the true-event model and each of the misclassification models characterizing whether a source fails to report on an event. As such, researchers are able to accurately test theories on both the causes of and reporting on an event of interest. Simulations show that our technique regularly outperforms current strategies that neglect misclassification, the unique features of the data-generating process, or both. We also illustrate the utility of this method with a model of repression using the Social Conflict in Africa Database.
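
    The structure of such a likelihood can be sketched in a deliberately simplified setting. The code below is not the authors' estimator: it assumes constant probabilities instead of predictors, sources that report independently given that an event occurred, and no false reports (a source never reports an event that did not happen). The function name and arguments are hypothetical.

```python
import math


def log_likelihood(reports, event_prob, report_probs):
    """Log-likelihood of binary reports from several sources.

    reports      : list of tuples, one 0/1 entry per source for each unit
    event_prob   : probability that the event truly occurred (assumed constant)
    report_probs : per-source probability of reporting a true event
    """
    total = 0.0
    for row in reports:
        # Probability that the event occurred and produced this report pattern.
        pattern = event_prob
        for y, r in zip(row, report_probs):
            pattern *= r if y == 1 else (1 - r)
        if any(row):
            # At least one report: under the assumptions the event occurred.
            total += math.log(pattern)
        else:
            # No report: either no event, or an event missed by every source.
            total += math.log((1 - event_prob) + pattern)
    return total
```

    Maximizing this function over `event_prob` and `report_probs` separates how often events occur from how often each source fails to report them, which is what treating a lack of reports as a lack of events cannot do.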