166 research outputs found

    Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data

    BACKGROUND: Gene Ontology (GO) terms are often used to assess the results of microarray experiments. The most common way to do this is to perform Fisher's exact tests to find GO terms which are over-represented amongst the genes declared to be differentially expressed in the analysis of the microarray experiment. However, due to the high degree of dependence between GO terms, statistical testing is conservative, and interpretation is difficult. RESULTS: We propose testing groups of GO terms rather than individual terms, to increase statistical power, reduce dependence between tests and improve the interpretation of results. We use the publicly available package POSOC to group the terms. Our method finds groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms. CONCLUSION: Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data.
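
The per-term over-representation test mentioned in this abstract can be sketched as a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution. This is a minimal illustration only: the function name and its arguments are hypothetical and are not taken from the paper or from POSOC.

```python
from math import comb


def enrichment_p_value(n_genes, n_annotated, n_de, n_de_annotated):
    """One-sided Fisher's exact test p-value for over-representation.

    n_genes        -- total number of genes on the array
    n_annotated    -- genes annotated to the GO term (or group of terms)
    n_de           -- genes declared differentially expressed
    n_de_annotated -- differentially expressed genes that carry the annotation
    """
    max_overlap = min(n_annotated, n_de)
    # Number of ways to choose the differentially expressed set.
    total = comb(n_genes, n_de)
    # Sum the hypergeometric probabilities of overlaps at least as large
    # as the observed one.
    tail = sum(
        comb(n_annotated, k) * comb(n_genes - n_annotated, n_de - k)
        for k in range(n_de_annotated, max_overlap + 1)
    )
    return tail / total


# Toy numbers, invented for illustration.
p = enrichment_p_value(n_genes=4, n_annotated=2, n_de=2, n_de_annotated=2)
```

One simple way to apply the same test to a group of terms would be to count genes annotated to any term in the group; whether the authors compute group-level tests this way is not stated in the abstract.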

    A new statistic for picking out Non-Gaussianity in the CMB

    In this paper we propose a new statistic capable of detecting non-Gaussianity in the CMB. The statistic is defined in Fourier space, and therefore naturally separates angular scales. It consists of taking another Fourier transform, in angle, over the Fourier modes within a given ring of scales. Like other Fourier space statistics, our statistic outdoes more conventional methods when faced with combinations of Gaussian processes (be they noise or signal) and a non-Gaussian signal which dominates only on some scales. However, unlike previous efforts along these lines, our statistic is successful in recognizing multiple non-Gaussian patterns in a single field. We discuss various applications, in which the Gaussian component may be noise or primordial signal, and the non-Gaussian component may be a cosmic string map, or some geometrical construction mimicking, say, small scale dust maps. Comment: 8 pages, 14 figures. Corrected typo.

    Structured Bayesian variable selection for multiple correlated response variables and high-dimensional predictors

    It is becoming increasingly common to study complex associations between multiple phenotypes and high-dimensional genomic features in biomedicine. However, this requires flexible and efficient joint statistical models when there are correlations between multiple response variables and between high-dimensional predictors. We propose a structured multivariate Bayesian variable selection model to identify sparse predictors associated with multiple correlated response variables. The approach makes use of known structure information between the multiple response variables and high-dimensional predictors via a Markov random field (MRF) prior for the latent indicator variables of the coefficient matrix of a sparse seemingly unrelated regressions (SSUR) model. The structure information included in the MRF prior can improve the model performance (i.e., variable selection and response prediction) compared to other common priors. In addition, we employ random effects to capture heterogeneity of grouped samples. The proposed approach is validated by simulation studies and applied to a pharmacogenomic study which includes pharmacological profiling and multi-omics data (i.e., gene expression, copy number variation and mutation) from in vitro anti-cancer drug sensitivity screening.

    Bayesian modeling of differential gene expression.

    We present a Bayesian hierarchical model for detecting differentially expressing genes that includes simultaneous estimation of array effects, and show how to use the output for choosing lists of genes for further investigation. We give empirical evidence that expression-level dependent array effects are needed, and explore different nonlinear functions as part of our model-based approach to normalization. The model includes gene-specific variances but imposes some necessary shrinkage through a hierarchical structure. Model criticism via posterior predictive checks is discussed. Modeling the array effects (normalization) simultaneously with differential expression gives fewer false positive results. To choose a list of genes, we propose to combine various criteria (for instance, fold change and overall expression) into a single indicator variable for each gene. The posterior distribution of these variables is used to pick the list of genes, thereby taking into account uncertainty in parameter estimates. In an application to mouse knockout data, Gene Ontology annotations over- and underrepresented among the genes on the chosen list are consistent with biological expectations.
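
The gene-selection step described in this abstract, combining criteria into one indicator per gene and using its posterior distribution, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the posterior draws are taken as given (for example from an MCMC run), and the function names, the two criteria and all three cut-offs are hypothetical.

```python
def posterior_indicator_probability(draws, fold_cut, expr_cut):
    """Fraction of posterior draws in which a gene meets both criteria.

    draws -- list of (log_fold_change, overall_expression) pairs,
             one pair per posterior sample for a single gene.
    """
    hits = sum(
        1
        for log_fc, expr in draws
        if abs(log_fc) > fold_cut and expr > expr_cut
    )
    return hits / len(draws)


def choose_genes(draws_by_gene, fold_cut=1.0, expr_cut=4.0, prob_cut=0.8):
    """Return genes whose indicator has posterior probability >= prob_cut."""
    probs = {
        gene: posterior_indicator_probability(draws, fold_cut, expr_cut)
        for gene, draws in draws_by_gene.items()
    }
    return sorted(gene for gene, prob in probs.items() if prob >= prob_cut)


# Invented posterior draws for two hypothetical genes.
example_draws = {
    "geneA": [(2.0, 5.0)] * 9 + [(0.1, 5.0)],
    "geneB": [(0.2, 5.0)] * 10,
}
selected = choose_genes(example_draws)
```

Because the indicator is evaluated on every posterior draw rather than on point estimates, a gene is selected only when the criteria hold with high posterior probability, which is how uncertainty in the parameter estimates enters the list.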

    On the K-theory of twisted higher-rank-graph C*-algebras

    We investigate the K-theory of twisted higher-rank-graph algebras by adapting parts of Elliott's computation of the K-theory of the rotation algebras. We show that each 2-cocycle on a higher-rank graph taking values in an abelian group determines a continuous bundle of twisted higher-rank graph algebras over the dual group. We use this to show that for a circle-valued 2-cocycle on a higher-rank graph obtained by exponentiating a real-valued cocycle, the K-theory of the twisted higher-rank graph algebra coincides with that of the untwisted one. Comment: 15 pages; four diagrams prepared in Tik

    Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads

    We present a novel pipeline and methodology for simultaneously estimating isoform expression and allelic imbalance in diploid organisms using RNA-seq data. We achieve this by modeling the expression of haplotype-specific isoforms. If unknown, the two parental isoform sequences can be individually reconstructed. A new statistical method, MMSEQ, deconvolves the mapping of reads to multiple transcripts (isoforms or haplotype-specific isoforms). Our software can take into account non-uniform read generation and works with paired-end reads.

    Free serum haemoglobin is associated with brain atrophy in secondary progressive multiple sclerosis.

    Background: A major cause of disability in secondary progressive multiple sclerosis (SPMS) is progressive brain atrophy, whose pathogenesis is not fully understood. The objective of this study was to identify protein biomarkers of brain atrophy in SPMS. Methods: We used surface-enhanced laser desorption-ionization time-of-flight mass spectrometry to carry out an unbiased search for serum proteins whose concentration correlated with the rate of brain atrophy, measured by serial MRI scans over a 2-year period in a well-characterized cohort of 140 patients with SPMS. Protein species were identified by liquid chromatography-electrospray ionization tandem mass spectrometry. Results: There was a significant (p<0.004) correlation between the rate of brain atrophy and a rise in the concentration of proteins at 15.1 kDa and 15.9 kDa in the serum. Tandem mass spectrometry identified these proteins as alpha-haemoglobin and beta-haemoglobin, respectively. The abnormal concentration of free serum haemoglobin was confirmed by ELISA (p<0.001). The serum lactate dehydrogenase activity was also highly significantly raised (p<10^-12) in patients with secondary progressive multiple sclerosis. Conclusions: An underlying low-grade chronic intravascular haemolysis is a potential source of the iron whose deposition along blood vessels in multiple sclerosis plaques contributes to the neurodegeneration and consequent brain atrophy seen in progressive disease. Chelators of free serum iron will be ineffective in preventing this neurodegeneration, because the iron (Fe2+) is chelated by haemoglobin.

    MT-HESS: an efficient Bayesian approach for simultaneous association detection in OMICS datasets, with application to eQTL mapping in multiple tissues.

    MOTIVATION: Analysing the joint association between a large set of responses and predictors is a fundamental statistical task in integrative genomics, exemplified by numerous expression Quantitative Trait Loci (eQTL) studies. Of particular interest are the so-called 'hotspots', important genetic variants that regulate the expression of many genes. Recently, attention has focussed on whether eQTLs are common to several tissues, cell-types or, more generally, conditions or whether they are specific to a particular condition. RESULTS: We have implemented MT-HESS, a Bayesian hierarchical model that analyses the association between a large set of predictors, e.g. SNPs, and many responses, e.g. gene expression, in multiple tissues, cells or conditions. Our Bayesian sparse regression algorithm goes beyond 'one-at-a-time' association tests between SNPs and responses and uses a fully multivariate model search across all linear combinations of SNPs, coupled with a model of the correlation between condition/tissue-specific responses. In addition, we use a hierarchical structure to leverage shared information across different genes, thus improving the detection of hotspots. We show the increase of power resulting from our new approach in an extensive simulation study. Our analysis of two case studies highlights new hotspots that would remain undetected by standard approaches and shows how greater prediction power can be achieved when several tissues are jointly considered. AVAILABILITY AND IMPLEMENTATION: C++ source code and documentation including compilation instructions are available under GNU licence at http://www.mrc-bsu.cam.ac.uk/software/

    A computationally efficient Bayesian seemingly unrelated regressions model for high‐dimensional quantitative trait loci discovery

    Funder: Victorian Government’s Operational Infrastructure Support Program. Abstract: Our work is motivated by the search for metabolite quantitative trait loci (QTL) in a cohort of more than 5000 people. There are 158 metabolites measured by NMR spectroscopy in the 31‐year follow‐up of the Northern Finland Birth Cohort 1966 (NFBC66). These metabolites, as with many multivariate phenotypes produced by high‐throughput biomarker technology, exhibit strong correlation structures. Existing approaches for combining such data with genetic variants for multivariate QTL analysis generally ignore phenotypic correlations or make restrictive assumptions about the associations between phenotypes and genetic loci. We present a computationally efficient Bayesian seemingly unrelated regressions model for high‐dimensional data, with cell‐sparse variable selection and sparse graphical structure for covariance selection. Cell sparsity allows different phenotype responses to be associated with different genetic predictors and the graphical structure is used to represent the conditional dependencies between phenotype variables. To achieve feasible computation of the large model space, we exploit a factorisation of the covariance matrix. Applying the model to the NFBC66 data with 9000 directly genotyped single nucleotide polymorphisms, we are able to simultaneously estimate genotype–phenotype associations and the residual dependence structure among the metabolites. The R package BayesSUR with full documentation is available at https://cran.r-project.org/web/packages/BayesSUR