93 research outputs found

    New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes

    Quasi-likelihood methods are commonly used to derive statistical methods for clustered multinomial data when no underlying distribution is assumed. Although there is an extensive literature on this kind of data, few studies deal with unequal cluster sizes. This paper aims to help fill this gap by proposing new estimators for the intracluster correlation coefficient
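    For orientation, the sketch below shows a classical Pearson-type moment estimator of the intracluster correlation rho for clustered multinomial counts with unequal cluster sizes. It is NOT the new estimator proposed in the paper; it assumes the Dirichlet-multinomial-type variance Var(Y_i) = n_i [1 + (n_i - 1) rho] (diag(p) - p p^T), pools the category proportions across clusters, and ignores the small loss of degrees of freedom from estimating p. The function name pearson_icc is hypothetical.

```python
from typing import Sequence


def pearson_icc(clusters: Sequence[Sequence[int]]) -> float:
    """Moment estimate of rho from per-cluster category counts.

    Assumes E[X_i^2] = (J - 1) * [1 + (n_i - 1) * rho] for the Pearson
    statistic X_i^2 of cluster i (exact when p is known).
    """
    # Drop empty clusters; they carry no information about rho.
    kept = [list(c) for c in clusters if sum(c) > 0]
    num_cat = len(kept[0])
    if num_cat < 2:
        raise ValueError("need at least two categories")
    sizes = [sum(c) for c in kept]
    total = sum(sizes)
    # Pooled category proportions p_j, estimated from all clusters together.
    p_hat = [sum(c[j] for c in kept) / total for j in range(num_cat)]
    # Pearson statistic summed over clusters and categories.
    x2 = 0.0
    for counts, n_i in zip(kept, sizes):
        for j in range(num_cat):
            expected = n_i * p_hat[j]
            if expected > 0:
                x2 += (counts[j] - expected) ** 2 / expected
    # Solve X^2 = (J - 1) * sum_i [1 + (n_i - 1) * rho] for rho.
    denom = sum(n_i - 1 for n_i in sizes)
    if denom == 0:
        raise ValueError("need at least one cluster with more than one unit")
    rho = (x2 / (num_cat - 1) - len(kept)) / denom
    # Truncate to [0, 1]; a negative value means no detectable overdispersion.
    return min(max(rho, 0.0), 1.0)
```

    Clusters of different sizes enter through the sum of (n_i - 1), which is where an estimator tailored to unequal cluster sizes would differ from the equal-size case.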

    Adjusted variance components for unbalanced clustered binary data models

    In practice, clustered binary responses are very common: binary outcomes are naturally grouped by the sampling technique or by some property of the sampling units. These clusters are often unbalanced. The preferred class of models for clustered binary data is the Hierarchical Generalized Linear Model (HGLM), in which random effects account for the overdispersion known to exist in clustered binary data. There are many methods for estimating the parameters of HGLMs, but none of the current methods allows the overdispersion to vary from cluster to cluster. Because clustering leads to overdispersion in binary data, it is reasonable to expect that unbalanced clustered binary data may have different overdispersion for different cluster sizes. When possible changes in overdispersion across clusters are ignored, test statistics tend to show inflated Type I error rates. In this research, two HGLM methods were adjusted to account for different overdispersion across cluster sizes. The first new method is the Extended Restricted Pseudo Likelihood (EREPL), an adjustment of Restricted Pseudo Likelihood that allows a different dispersion adjustment for each cluster. The second new method is the Adjusted Scale Binomial Beta (ASBB), an extension of the classical Binomial Beta model that allows the Beta-distributed random effect to have a different scale parameter for each cluster. Through simulation, these extensions were compared with the original methods in terms of power, Type I error rate, and estimator standard errors. The Adjusted Scale Binomial Beta h-likelihood was comparable to existing methods, giving a low standard error and an acceptable Type I error rate. In contrast, the Binomial Beta h-likelihood had an inflated Type I error rate. The Restricted Pseudo Likelihood could also be applied to unbalanced clustered binary data
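    To make "a different scale parameter per cluster" concrete, here is a minimal sketch of the marginal beta-binomial log-likelihood in which the intracluster correlation rho (and therefore the Beta scale a + b = (1 - rho) / rho) is looked up by cluster size. This is an illustrative simplification, assuming a common mean mu and a user-supplied mapping from cluster size to rho; it is not the authors' ASBB h-likelihood, and the names betabinom_logpmf, betabinom_pmf, and loglik are hypothetical.

```python
from math import exp, lgamma
from typing import Dict, Iterable, Tuple


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(y: int, n: int, mu: float, rho: float) -> float:
    """log P(Y = y) for a beta-binomial with mean mu and intracluster correlation rho."""
    if not (0.0 < rho < 1.0 and 0.0 < mu < 1.0):
        raise ValueError("need 0 < mu < 1 and 0 < rho < 1")
    scale = (1.0 - rho) / rho  # a + b; rho = 1 / (a + b + 1)
    a, b = mu * scale, (1.0 - mu) * scale
    log_choose = lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def betabinom_pmf(y: int, n: int, mu: float, rho: float) -> float:
    return exp(betabinom_logpmf(y, n, mu, rho))


def loglik(clusters: Iterable[Tuple[int, int]], mu: float,
           rho_by_size: Dict[int, float]) -> float:
    """Sum over (successes, cluster_size) pairs; rho may differ by cluster size."""
    return sum(betabinom_logpmf(y, n, mu, rho_by_size[n]) for y, n in clusters)
```

    Fixing one rho for all sizes recovers the classical single-scale Binomial Beta model, which is the baseline the abstract compares against.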

    Essays on the Identification and Modeling of Variance

    In the presence of correlation, generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extra variation due to correlation, methods to estimate and model the additional variation are investigated. A general form of the mean-variance relationship that incorporates the canonical parameter is proposed. The two variance parameters are estimated by the generalized method of moments, which removes the need for a distributional assumption. The mean-variance relation estimates are applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchically structured data, the adjusted generalized quasi-likelihood model shows improved performance for random effect estimates. In addition, submodels addressing deviations in skewness and kurtosis are provided to jointly model the mean, variance, skewness, and kurtosis. The additional models identify covariates that influence the third and fourth moments. A cutoff for trimming the data is provided, which improves parameter estimation and model fit. For each topic, the findings are demonstrated through comprehensive simulation studies and numerical examples. The examples include data on children’s morbidity in the Philippines, adolescent health data from the National Longitudinal Study of Adolescent to Adult Health, and proteomic assays for breast cancer screening.
    Dissertation/Thesis: Doctoral Dissertation, Statistics 201
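    The abstract does not give the exact mean-variance form, so the sketch below substitutes a common two-parameter power-variance function, Var(Y) = phi * mu**theta, purely as an assumed stand-in. It shows how two variance parameters can be estimated from squared residuals with two exactly identified moment conditions (a method-of-moments special case of GMM) and no distributional assumption. The function name fit_power_variance and the bisection bracket are hypothetical choices; the means mu must be positive.

```python
from math import log
from typing import Sequence, Tuple


def fit_power_variance(mu: Sequence[float], sq_resid: Sequence[float],
                       lo: float = -10.0, hi: float = 10.0,
                       iters: int = 200) -> Tuple[float, float]:
    """Estimate (phi, theta) in Var(Y) = phi * mu**theta from moment conditions:

        sum_i (r_i^2 - phi * mu_i^theta)             = 0
        sum_i (r_i^2 - phi * mu_i^theta) * log(mu_i) = 0
    """
    log_mu = [log(m) for m in mu]
    total_r2 = sum(sq_resid)

    def phi_given(theta: float) -> float:
        # First condition solved for phi at a fixed theta.
        return total_r2 / sum(m ** theta for m in mu)

    def g(theta: float) -> float:
        # Second condition with phi profiled out; it is non-increasing in theta.
        phi = phi_given(theta)
        return sum((r2 - phi * m ** theta) * lm
                   for r2, m, lm in zip(sq_resid, mu, log_mu))

    if g(lo) < 0 or g(hi) > 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    # Bisection on the profiled second condition.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return phi_given(theta), theta
```

    Profiling phi out of the first condition reduces the problem to a one-dimensional root search; g is monotone because its theta-dependent part is a weighted mean of log(mu) whose weights mu**theta shift toward larger means as theta grows.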

    Normalization and microbial differential abundance strategies depend upon data characteristics

    Background: Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. In particular, library sizes often vary over several orders of magnitude, and the data contain many zeros. Although we are typically interested in comparing the relative abundance of taxa in the ecosystems of two or more groups, we can only measure taxon relative abundance in specimens obtained from those ecosystems. First, comparing taxon relative abundance in the specimen is not equivalent to comparing taxon relative abundance in the ecosystem, which presents a special challenge. Second, because the relative abundances of taxa in the specimen (as well as in the ecosystem) sum to 1, these are compositional data. Because compositional data are constrained to the simplex (they sum to 1) rather than lying unconstrained in Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges affect the performance of existing normalization methods and differential abundance analyses.
    Results: Effects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. For ordination metrics based on presence or absence, rarefying clusters samples according to biological origin more clearly than other normalization techniques do. Alternative normalization measures are potentially vulnerable to artifacts due to library size. Effects on differential abundance testing: We build on previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that rarefying itself does not increase the false discovery rates of many differential abundance-testing methods, although rarefying does reduce sensitivity because a portion of the available data is discarded. For groups with large (~10×) differences in average library size, rarefying lowers the false discovery rate. DESeq2, without the addition of a constant, increased sensitivity on smaller datasets (<20 samples per group) but tends toward a higher false discovery rate with more samples, very uneven (~10×) library sizes, and/or compositional effects. For drawing inferences about taxon abundance in the ecosystem, analysis of composition of microbiomes (ANCOM) is not only very sensitive (for >20 samples per group) but also, critically, the only method tested that has good control of the false discovery rate.
    Conclusions: These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.
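    Rarefying, as discussed above, means subsampling each sample's reads without replacement down to a common library size (here, the smallest one in the table). The sketch below is a minimal pure-Python illustration under that assumption; the function names rarefy and rarefy_table are hypothetical, and expanding every read into a list is only practical for small libraries.

```python
import random
from collections import Counter
from typing import List, Optional


def rarefy(counts: List[int], depth: int, rng: random.Random) -> List[int]:
    """Subsample one sample's taxon counts to `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds library size")
    # One entry per read, labelled by taxon index (memory-heavy for deep libraries).
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    tally = Counter(rng.sample(reads, depth))
    return [tally.get(taxon, 0) for taxon in range(len(counts))]


def rarefy_table(table: List[List[int]], seed: Optional[int] = None) -> List[List[int]]:
    """Rarefy every sample (row) to the smallest library size in the table."""
    rng = random.Random(seed)
    depth = min(sum(row) for row in table)
    return [rarefy(row, depth, rng) for row in table]
```

    The reads discarded from the larger libraries are the "portion of available data" whose loss costs sensitivity, while equalizing depth is what removes library-size artifacts.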

    Inferential analysis based on phi-divergence measures for loglinear models with multinomial sampling and overdispersion

    Get PDF
    In recent years, the number of statistical methods for analyzing qualitative data has grown considerably. This may be due in part to the strong demand from the biomedical sciences (particularly in connection with epidemiological studies) and from the social and behavioral sciences for specific statistical techniques to handle the large amounts of qualitative data available to them. The development of specific techniques for qualitative or categorical data has made it possible to discard, as unnecessary and often inappropriate, many of the techniques for continuous variables that had been used for this type of data.
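    As an illustration of the phi-divergence statistics named in the title, the sketch below computes the Cressie-Read power-divergence statistic, a well-known subfamily of phi-divergence test statistics, for observed versus expected (for example, loglinear-model fitted) counts. Dividing by an overdispersion factor phi is shown as a common quasi-likelihood-style adjustment; this is an assumption for illustration, not necessarily the procedure used in the thesis. The function name power_divergence is hypothetical here.

```python
from math import log
from typing import Sequence


def power_divergence(observed: Sequence[float], expected: Sequence[float],
                     lam: float = 2.0 / 3.0, phi: float = 1.0) -> float:
    """Cressie-Read statistic 2/(lam(lam+1)) * sum O[(O/E)^lam - 1], divided by phi.

    lam = 1 gives Pearson's X^2 (when sum O = sum E); lam = 0 gives G^2.
    """
    if lam == -1:
        raise ValueError("lam = -1 (limit case) is not handled in this sketch")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        if lam == 0:
            # Limit lam -> 0: likelihood-ratio statistic 2 * O * log(O / E).
            total += 2.0 * o * log(o / e) if o > 0 else 0.0
        else:
            if o == 0 and lam < 0:
                raise ValueError("zero observed count with negative lam")
            total += 2.0 / (lam * (lam + 1.0)) * o * ((o / e) ** lam - 1.0)
    # Scale by the (assumed) overdispersion factor before comparing to chi-square.
    return total / phi
```

    The default lam = 2/3 is the value Cressie and Read recommended as a compromise between the Pearson and likelihood-ratio statistics.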