
    Bayesian ranking and selection methods using hierarchical mixture models in microarray studies.

    The main purpose of microarray studies is screening to identify differentially expressed genes as candidates for further investigation. Because of limited resources in this stage, prioritizing or ranking genes is a relevant statistical task in microarray studies. In this article, we develop three empirical Bayes methods for gene ranking on the basis of differential expression, using hierarchical mixture models. These methods are based on (i) minimizing mean squared errors of estimation for parameters, (ii) minimizing mean squared errors of estimation for ranks of parameters, and (iii) maximizing sensitivity in selecting prespecified numbers of differential genes, with the largest effect. Our methods incorporate the mixture structures of differential and nondifferential components in empirical Bayes models to allow information borrowing across differential genes, with separation from nuisance, nondifferential genes. The accuracy of our ranking methods is compared with that of conventional methods through simulation studies. An application to a clinical study for breast cancer is provided.

    An Empirical Bayes Optimal Discovery Procedure Based on Semiparametric Hierarchical Mixture Models

    Multiple testing has been widely adopted for genome-wide studies such as microarray experiments. For effective gene selection in these genome-wide studies, the optimal discovery procedure (ODP), which maximizes the number of expected true positives for each fixed number of expected false positives, was developed as a multiple testing extension of the most powerful test for a single hypothesis by Storey (Journal of the Royal Statistical Society, Series B, vol. 69, no. 3, pp. 347–368, 2007). In this paper, we develop an empirical Bayes method for implementing the ODP based on a semiparametric hierarchical mixture model using the “smoothing-by-roughening” approach. Under the semiparametric hierarchical mixture model, (i) the prior distribution can be modeled flexibly, (ii) the ODP test statistic and the posterior distribution are analytically tractable, and (iii) computations are easy to implement. In addition, we provide a significance rule based on the false discovery rate (FDR) in the empirical Bayes framework. Applications to two clinical studies are presented.

    Cancer Outlier Analysis Based on Mixture Modeling of Gene Expression Data

    Molecular heterogeneity of cancer, partially caused by various chromosomal aberrations or gene mutations, can yield substantial heterogeneity in gene expression profile in cancer samples. To detect cancer-related genes which are active only in a subset of cancer samples or cancer outliers, several methods have been proposed in the context of multiple testing. Such cancer outlier analyses will generally suffer from a serious lack of power, compared with the standard multiple testing setting where common activation of genes across all cancer samples is supposed. In this paper, we consider information sharing across genes and cancer samples, via a parametric normal mixture modeling of gene expression levels of cancer samples across genes after a standardization using the reference, normal sample data. A gene-based statistic for gene selection is developed on the basis of a posterior probability of cancer outlier for each cancer sample. Some efficiency improvement by using our method was demonstrated, even under settings with misspecified, heavy-tailed t-distributions. An application to a real dataset from hematologic malignancies is provided.

    Dataset for: Quantifying indirect evidence in network meta-analysis

    Network meta-analysis enables comprehensive synthesis of evidence concerning multiple treatments and their simultaneous comparisons based on both direct and indirect evidence. A fundamental pre-requisite of network meta-analysis is the consistency of evidence that is obtained from different sources, particularly whether direct and indirect evidence are in accordance with each other or not, and how they may influence the overall estimates. We have developed an efficient method to quantify indirect evidence, as well as a testing procedure to evaluate their inconsistency using Lindsay's composite likelihood method. We also show that this estimator has complete information for the indirect evidence. Using this method, we can assess the degree of consistency between direct and indirect evidence and their contribution rates to the overall estimate. Sensitivity analyses can also be conducted with this method to assess the influences of potentially inconsistent treatment contrasts on the overall results. These methods can provide useful information for overall comparative results that might be biased from specific inconsistent treatment contrasts. We also provide some fundamental requirements for valid inference on these methods concerning consistency restrictions on multi-arm trials. In addition, the efficiency of the developed method is demonstrated based on simulation studies. Applications to a network meta-analysis of 12 new-generation antidepressants are presented.

    Empirical Bayes Estimation of Semi-parametric Hierarchical Mixture Models for Unbiased Characterization of Polygenic Disease Architectures

    Genome-wide association studies (GWAS) suggest that the genetic architecture of complex diseases consists of unexpectedly numerous variants with small effect sizes. However, the polygenic architectures of many diseases have not been well characterized due to lack of simple and fast methods for unbiased estimation of the underlying proportion of disease-associated variants and their effect-size distribution. Applying empirical Bayes estimation of semi-parametric hierarchical mixture models to GWAS summary statistics, we confirmed that schizophrenia was extremely polygenic [~40% of independent genome-wide SNPs are risk variants, most within odds ratio (OR = 1.03)], whereas rheumatoid arthritis was less polygenic (~4 to 8% risk variants, significant portion reaching OR = 1.05 to 1.1). For rheumatoid arthritis, stratified estimations revealed that expression quantitative loci in blood explained large genetic variance, and low- and high-frequency derived alleles were prone to be risk and protective, respectively, suggesting a predominance of deleterious-risk and advantageous-protective mutations. Despite genetic correlation, effect-size distributions for schizophrenia and bipolar disorder differed across allele frequency. These analyses distinguished disease polygenic architectures and provided clues for etiological differences in complex diseases.

    Data_Sheet_1_Empirical Bayes Estimation of Semi-parametric Hierarchical Mixture Models for Unbiased Characterization of Polygenic Disease Architectures.XLSX


    Aquifex aeolicus tRNA (N2,N2-Guanine)-dimethyltransferase (Trm1) Catalyzes Transfer of Methyl Groups Not Only to Guanine 26 but Also to Guanine 27 in tRNA

    Transfer RNA (N2,N2-guanine)-dimethyltransferase (Trm1) catalyzes N2,N2-dimethylguanine formation at position 26 (m22G26) in tRNA. In the reaction, N2-methylguanine at position 26 (m2G26) is generated as an intermediate. The trm1 genes are found only in archaea and eukaryotes, although it has been reported that Aquifex aeolicus, a hyper-thermophilic eubacterium, has a putative trm1 gene. To confirm whether A. aeolicus Trm1 has tRNA methyltransferase activity, we purified recombinant Trm1 protein. In vitro methyl transfer assay revealed that the protein has a strong tRNA methyltransferase activity. We confirmed that this gene product is expressed in living A. aeolicus cells and that the enzymatic activity exists in cell extract. By preparing 22 tRNA transcripts and testing their methyl group acceptance activities, it was demonstrated that this Trm1 protein has a novel tRNA specificity. Mass spectrometry analysis revealed that it catalyzes methyl transfers not only to G26 but also to G27 in substrate tRNA. Furthermore, it was confirmed that native tRNACys has an m22G26m2G27 or m22G26m22G27 sequence, demonstrating that these modifications occur in living cells. Kinetic studies reveal that the m2G26 formation is faster than the m2G27 formation and that disruption of the G27-C43 base pair accelerates velocity of the G27 modification. Moreover, we prepared an additional 22 mutant tRNA transcripts and clarified that the recognition sites exist in the T-arm structure. This long distance recognition results in multisite recognition by the enzyme.