The poolr Package for Combining Independent and Dependent p Values
The poolr package provides an implementation of a variety of methods for pooling (i.e., combining) p values, including Fisher's method, Stouffer's method, the inverse chi-square method, the binomial test, the Bonferroni method, and Tippett's method. More importantly, the methods can be adjusted to account for dependence among the tests from which the p values have been derived, assuming multivariate normality among the test statistics. All methods can be adjusted based on an estimate of the effective number of tests or by using an empirically derived null distribution based on pseudo-replicates that mimic a proper permutation test. For the Fisher, Stouffer, and inverse chi-square methods, the test statistics can also be directly generalized to account for dependence, leading to Brown's method, Strube's method, and the generalized inverse chi-square method. In this paper, we describe the various methods, discuss their implementation in the package, illustrate their use based on several examples, and compare the poolr package with several other packages that can be used to combine p values.
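To make the pooling rules concrete, here is a minimal Python sketch of four of the methods named above (Fisher's, Stouffer's, Tippett's, and the Bonferroni method) in their classical forms, which assume the p values are independent. It is an illustration only, not the code of poolr (an R package), and it omits the dependence adjustments the abstract describes. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher(pvals):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0 (independent tests)."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)  # X / 2
    # For even df = 2k the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        tail += term
    return math.exp(-half_x) * tail


def stouffer(pvals):
    """Stouffer's method: Z = sum(z_i) / sqrt(k) with z_i = Phi^{-1}(1 - p_i) (one-sided)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1 - nd.cdf(z)


def tippett(pvals):
    """Tippett's method: P(min p <= p_min) = 1 - (1 - p_min)^k for independent tests."""
    return 1 - (1 - min(pvals)) ** len(pvals)


def bonferroni(pvals):
    """Bonferroni method applied to the smallest p value: min(1, k * p_min)."""
    return min(1.0, len(pvals) * min(pvals))


p = [0.02, 0.10, 0.04, 0.30]
for name, f in [("fisher", fisher), ("stouffer", stouffer),
                ("tippett", tippett), ("bonferroni", bonferroni)]:
    print(f"{name:>10}: {f(p):.4f}")
```

Fisher's closed-form tail is used here only because 2k degrees of freedom is always even; the dependence-adjusted variants need a general chi-square or gamma tail.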
Gene-environment interaction study on the polygenic risk score for neuroticism, childhood adversity, and parental bonding
The present study examines whether neuroticism is predicted by genetic vulnerability, summarized as polygenic risk score for neuroticism (PRSN), in interaction with bullying, parental bonding, and childhood adversity. Data were derived from a general population adolescent and young adult twin cohort. The final sample consisted of 202 monozygotic and 436 dizygotic twins and 319 twin pairs. The Short Eysenck Personality questionnaire was used to measure neuroticism. PRSN was trained on the results from the Genetics of Personality Consortium (GPC) and United Kingdom Biobank (UKB) cohorts, yielding two different PRSN. Multilevel mixed-effects models were used to analyze the main and interacting associations of PRSN, childhood adversity, bullying, and parental bonding style with neuroticism. We found no evidence of gene-environment correlation. PRSN thresholds of .005 and .2 were chosen, based on GPC and UKB datasets, respectively. After correction for confounders, all the individual variables were associated with the expression of neuroticism: both PRSN from GPC and UKB, childhood adversity, maternal bonding, paternal bonding, and bullying in primary school and secondary school. However, the results indicated no evidence for gene-environment interaction in this cohort. These results suggest that genetic vulnerability on the one hand and negative life events (childhood adversity and bullying) and positive life events (optimal parental bonding) on the other represent noninteracting pathways to neuroticism.
Combining information: model selection in meta-analysis and methods for combining correlated p-values
Statistical analyses always lead to results that include some degree of imprecision. Combining information refers to quantitative methods that synthesize information from multiple sources on the same topic to reach a more precise and general result. A challenging issue in combining information is that the individual results may be correlated with each other. For example, in ecology, outcomes from multiple species are dependent on each other due to their shared evolutionary history. This dissertation focuses on two fields in combining information, meta-analysis and combining p-values, and examines techniques that incorporate the dependence among the correlated measurements into the method for combining information. The results show that, by doing so, it is possible to reduce the risk of incorrect results, such as false positives. Furthermore, this dissertation introduces open-source software that implements methods for combining p-values and adjustment techniques that incorporate the correlations among them.
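One common adjustment technique for correlated p-values replaces the number of tests k with an estimated effective number of tests. The Python sketch below uses Nyholt's (2004) eigenvalue-based estimator as one example of such an estimate; it is not taken from the dissertation or its software, it assumes the correlation matrix R among the test statistics is known, and the function names are hypothetical.

```python
def nyholt_meff(R):
    """Nyholt's (2004) effective number of tests for a k x k correlation matrix R.

    m_eff = 1 + (k - 1) * (1 - Var(lambda) / k), where lambda are the eigenvalues
    of R and Var uses the (k - 1) denominator. The eigenvalues of R sum to k and
    their squares sum to trace(R R), i.e. the sum of squared entries of R, so
    Var(lambda) can be obtained without an eigendecomposition.
    """
    k = len(R)
    if k == 1:
        return 1.0
    sum_sq = sum(r * r for row in R for r in row)
    var_lambda = (sum_sq - k) / (k - 1)
    return 1 + (k - 1) * (1 - var_lambda / k)


def bonferroni_meff(pvals, R):
    """Bonferroni-type adjustment that uses m_eff in place of the number of tests k."""
    return min(1.0, nyholt_meff(R) * min(pvals))


R = [[1.0, 0.8, 0.8],
     [0.8, 1.0, 0.8],
     [0.8, 0.8, 1.0]]
print(nyholt_meff(R))                           # about 1.72 "effective" tests out of 3
print(bonferroni_meff([0.01, 0.02, 0.03], R))   # about 0.0172 instead of 0.03
```

Independent tests (R equal to the identity) give m_eff = k, and perfectly correlated tests give m_eff = 1, so the correction interpolates between no adjustment and the full Bonferroni penalty.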
A Comparison of Methods for Gene-Based Testing That Account for Linkage Disequilibrium
Controlling the type I error rate while retaining sufficient power is a major concern in genome-wide association studies, which nowadays often examine more than a million single-nucleotide polymorphisms (SNPs) simultaneously. Methods such as the Bonferroni correction can lead to a considerable decrease in power due to the large number of tests conducted. Shifting the focus to higher functional structures (e.g., genes) can reduce the loss of power. This can be accomplished by combining the p-values of SNPs that belong to the same structural unit to test their joint null hypothesis. However, standard methods for this purpose (e.g., Fisher's method) do not account for the dependence among the tests due to linkage disequilibrium (LD). In this paper, we review various adjustments to methods for combining p-values that take LD information explicitly into consideration and evaluate their performance in a simulation study based on data from the HapMap project. The results illustrate the importance of incorporating LD information into the methods for controlling the type I error rate at the desired level. Furthermore, some methods are more successful in controlling the type I error rate than others. Among them, Brown's method was the most robust technique with respect to the characteristics of the genes and outperformed the Bonferroni method in terms of power in many scenarios. Examining the genetic factors of a phenotype of interest at the gene level rather than the SNP level can provide researchers with benefits in terms of the power of the study. While doing so, one should be careful to account for LD among SNPs belonging to the same gene, for which Brown's method seems to be the most robust technique.
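Brown's method keeps Fisher's statistic X = -2 Σ ln p_i but replaces its chi-square null distribution with a scaled chi-square, c·χ²_f, whose mean and variance match those of X under dependence. The Python sketch below assumes the correlations R among the one-sided, normally distributed test statistics are known (for SNPs, e.g., derived from LD) and approximates each covariance with the Kost and McDermott (2002) polynomial; this approximation is a stand-in and not necessarily what the paper used. The function names are hypothetical, and the regularized incomplete gamma function is hand-rolled only to stay within the standard library.

```python
import math


def _upper_gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x) via series or continued fraction."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); then Q = 1 - P.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(10_000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def brown(pvals, R):
    """Brown's method: Fisher's statistic compared against a scaled chi-square c * chi2_f."""
    k = len(pvals)
    x = -2 * sum(math.log(p) for p in pvals)
    mean = 2 * k      # E[X] is unchanged by dependence
    var = 4 * k       # each -2 ln p_i has variance 4
    for i in range(k):
        for j in range(i + 1, k):
            rho = R[i][j]
            # Assumed approximation of cov(-2 ln p_i, -2 ln p_j) for one-sided normal tests
            var += 2 * (3.263 * rho + 0.710 * rho**2 + 0.027 * rho**3)
    c = var / (2 * mean)       # scale factor
    f = 2 * mean**2 / var      # effective degrees of freedom
    # P(chi2_f > x / c) equals Q(f / 2, x / (2c))
    return _upper_gamma_q(f / 2, x / (2 * c))


R = [[1.0, 0.5], [0.5, 1.0]]
print(brown([0.01, 0.03], R))
```

With zero correlations the result reduces to Fisher's p value, and with perfect correlation among identical p values it returns that common p value. If SciPy is available, `scipy.stats.chi2.sf(x / c, f)` can replace the hand-written tail function.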
An evaluation of a novel approach for clustering genes with dissimilar replicates
Clustering genes is a step in microarray studies that demands several considerations. First, the expression levels can be collected as time series, which should be accounted for appropriately. Second, genes may behave differently in different biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexities for clustering. A third concern stems from the existence of a large number of constant genes, which imposes a heavy computational burden. Finally, the number of clusters is not known in advance; therefore, a clustering algorithm should be able to recommend a meaningful number of clusters. In this study, we use a simulation study to evaluate a recently proposed clustering algorithm that promises to address these issues. The methodology treats each gene as a combination of its replicates and accounts for the time dependency. Furthermore, it computes cluster validation scores to suggest possible numbers of clusters. Results show that the methodology is able to find the clusters, highlight the genes with differences among the replicates, separate the constant genes to reduce the computational burden, and suggest a meaningful number of clusters. Furthermore, our results show that traditional distance metrics are not efficient at correctly clustering short time series.
Clustering of short time-course gene expression data with dissimilar replicates
Microarrays are used in genetics and medicine to examine large numbers of genes simultaneously through their expression levels under a condition of interest, such as a disease. The information from these experiments can be enriched by following the expression levels through time and across biological replicates. The purpose of this study is to propose an algorithm that clusters genes with respect to the similarities between their behaviors through time. The algorithm is also aimed at highlighting the genes that show different behaviors between the replicates and separating the constant genes that keep their baseline expression levels throughout the study. Finally, it features cluster validation techniques to suggest a sensible number of clusters when that number is not known a priori. The illustrations show that the proposed algorithm offers a fast approach to clustering genes with respect to the similarities in their behavior, and that it also separates the constant genes and the genes with dissimilar replicates without any need for preprocessing. Moreover, it is also successful at suggesting the correct number of clusters when that number is not known.
Phylogenetic multilevel meta-analysis: A simulation study on the importance of modeling the phylogeny
The data files that store the results of the manuscript and the code files needed to reproduce them.