
    Investigating the effect of paralogs on microarray gene-set analysis

    Background: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. Results: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit highly correlated expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. Conclusions: The Indygene tool efficiently removes paralogy relationships from a given dataset, and we found that such a reduction, performed prior to GSA, can generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches applied to the reanalysis of previously published microarray datasets, and suggests that the redundancy and non-independence of paralogs is an important consideration when using GSA methods.
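
The abstract describes Indygene only as reducing a gene list until no paralog pair exceeds a sequence-similarity threshold; it does not say how genes are chosen for removal. A minimal sketch of one possible approach, assuming pairwise percent identities are already available: repeatedly drop the gene involved in the most above-threshold pairs until none remain. The function name `reduce_paralogs` and the greedy rule are illustrative assumptions, not Indygene's algorithm.

```python
from itertools import combinations


def reduce_paralogs(genes, identity, threshold):
    """Return a subset of `genes` with no pair whose identity exceeds `threshold`.

    Hypothetical helper. `identity` maps frozenset({a, b}) -> percent identity;
    pairs missing from the map are treated as non-paralogous.
    """
    kept = list(genes)
    while True:
        # Count, for each remaining gene, how many above-threshold pairs it is in.
        conflicts = {g: 0 for g in kept}
        for a, b in combinations(kept, 2):
            if identity.get(frozenset((a, b)), 0.0) > threshold:
                conflicts[a] += 1
                conflicts[b] += 1
        worst = max(kept, key=lambda g: conflicts[g], default=None)
        if worst is None or conflicts[worst] == 0:
            return kept  # no paralogy relationship above the threshold remains
        kept.remove(worst)  # greedy step: drop the most-connected gene
```

With identities A-B 90%, B-C 85% and C-D 40% and a 50% threshold, only B is removed, because it participates in both conflicting pairs. Other removal rules (for example, keeping the gene with the strongest differential-expression signal) would be equally plausible.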

    Heading Down the Wrong Pathway: on the Influence of Correlation within Gene Sets

    Background: Analysis of microarray experiments often involves testing for the overrepresentation of pre-defined sets of genes among lists of genes deemed individually significant. Most popular gene set testing methods assume the independence of genes within each set, an assumption that is seriously violated, as extensive correlation between genes is a well-documented phenomenon. Results: We conducted a meta-analysis of over 200 datasets from the Gene Expression Omnibus to demonstrate the practical impact of strong gene correlation patterns that are highly consistent across experiments. We show that a common gene set testing procedure based on the independence assumption produces very high false positive rates when applied to datasets whose treatment groups have been randomized, and that gene sets with high internal correlation are more likely to be declared significant. A reanalysis of the same datasets using an array resampling approach properly controls false positive rates, leading to more parsimonious and high-confidence gene set findings, which should facilitate pathway-based interpretation of the microarray data. Conclusions: These findings call into question many of the gene set testing results in the literature and argue strongly for the adoption of resampling-based gene set testing criteria in the peer-reviewed biomedical literature.
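
Array (sample-label) resampling keeps the correlation between genes intact: each permutation reorders whole arrays, so every gene in a set sees the same relabelling. A minimal sketch, assuming a dict of per-gene expression vectors, 0/1 group labels, and a simple mean-difference set statistic. The statistic and the function names are hypothetical choices; the abstract does not give the authors' exact procedure.

```python
import random
from statistics import mean


def set_statistic(expr, genes, labels):
    """Average over `genes` of (mean in group 1 - mean in group 0)."""
    diffs = []
    for g in genes:
        group1 = [v for v, lab in zip(expr[g], labels) if lab == 1]
        group0 = [v for v, lab in zip(expr[g], labels) if lab == 0]
        diffs.append(mean(group1) - mean(group0))
    return mean(diffs)


def sample_permutation_pvalue(expr, genes, labels, n_perm=1000, seed=0):
    """Two-sided p-value from permuting sample labels (whole arrays)."""
    rng = random.Random(seed)
    observed = abs(set_statistic(expr, genes, labels))
    permuted = list(labels)
    exceed = 0
    for _ in range(n_perm):
        # One shuffle is applied to all genes at once, preserving
        # gene-gene correlation within the set.
        rng.shuffle(permuted)
        if abs(set_statistic(expr, genes, permuted)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

By contrast, an independence-based test treats each gene as a separate draw, so a set of highly correlated genes behaves like many independent confirmations of a single signal.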

    Codon usage in vertebrates is associated with a low risk of acquiring nonsense mutations

    Background: Codon usage in genomes is biased towards specific subsets of codons. Codon usage bias affects translational speed and accuracy, and it is associated with tRNA levels and the GC content of the genome. Spontaneous mutations drive genomes towards a low GC content. Active cellular processes are needed to maintain a high GC content, which influences the codon usage of a species. Loss-of-function mutations, such as nonsense mutations, are the molecular basis of many recessive alleles, which can greatly affect the genome of an organism and cause many genetic diseases in humans. Methods: We developed an event-based model to calculate the risk of acquiring nonsense mutations in coding sequences. Complete coding sequences and genomes of 40 eukaryotes were analyzed for GC and CpG content, codon usage, and the associated risk of acquiring nonsense mutations. We included one species per genus for all eukaryotes with an available reference sequence. Results: We discovered that the codon usage bias detected in genomes of high GC content decreases the risk of acquiring nonsense mutations (Pearson's r = -0.95; P < 0.0001). In the genomes of all examined vertebrates, including humans, this risk was lower than expected (0.93 ± 0.02; mean ± SD) and lower than the risk in genomes of non-vertebrates (1.02 ± 0.13; P = 0.019). Conclusions: Although maintaining a high GC content is energetically costly, it is associated with a codon usage bias that carries a low risk of acquiring nonsense mutations. The reduced exposure to this risk may contribute to the fitness of vertebrates.
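
The abstract does not specify the event-based model. The following is a simplified sketch of the underlying idea. For each sense codon, count how many of its nine possible single-nucleotide substitutions produce a stop codon (TAA, TAG, TGA), then average over codon usage. This assumes equal rates for all substitutions and does not reproduce the normalization behind the reported ratios (0.93 and 1.02). The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def nonsense_neighbors(codon):
    """Number of the 9 single-nucleotide substitutions of `codon` that give a stop codon."""
    count = 0
    for i, original in enumerate(codon):
        for b in BASES:
            if b != original and codon[:i] + b + codon[i + 1:] in STOPS:
                count += 1
    return count


def nonsense_risk(codon_counts):
    """Codon-usage-weighted mean fraction of substitutions that create a stop codon.

    `codon_counts` maps codon -> count; stop codons themselves are ignored.
    Every substitution is assumed equally likely (no GC/AT or CpG bias).
    """
    sense = {c: n for c, n in codon_counts.items() if c not in STOPS}
    total = sum(sense.values())
    return sum(n * nonsense_neighbors(c) / 9 for c, n in sense.items()) / total
```

For example, TGG (Trp) is two substitutions away from a stop codon (TAG, TGA), whereas CTG (Leu) has none. A codon usage that favours CTG over TTA (which has two such neighbours) therefore lowers this risk.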

    Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges

    Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base–driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of technological advances in genomics and proteomics in order to improve the specificity, sensitivity, and relevance of pathway analysis.
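
The abstract does not name the three generations. In the wider literature they are commonly described as over-representation analysis (ORA), functional class scoring (FCS), and pathway topology-based approaches. ORA tests whether a gene set is overrepresented in a list of significant genes using a hypergeometric tail probability. A minimal sketch using only the Python standard library (`math.comb`, Python 3.8+); the function name and arguments are illustrative:

```python
from math import comb


def ora_pvalue(n_universe, n_set, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(population=n_universe,
    successes=n_set, draws=n_selected).

    n_universe: genes measured; n_set: genes in the pathway;
    n_selected: genes called significant; n_overlap: significant pathway genes.
    """
    upper = min(n_set, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of observing n_overlap or more pathway genes
    # among the selected genes (comb returns 0 for impossible terms).
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A call such as `ora_pvalue(20000, 100, 500, 10)` treats every gene as an independent draw. That independence assumption is the one questioned in "Heading Down the Wrong Pathway" above.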

    Bioinformatics approaches for cross-species liver cancer analysis based on microarray gene expression profiling

    BACKGROUND: The completion of the sequencing of the human, mouse and rat genomes and knowledge of cross-species gene homologies enable studies of differential gene expression in animal models. These types of studies have the potential to greatly enhance our understanding of diseases such as liver cancer in humans. Genes co-expressed across multiple species are most likely to have conserved functions. We have used various bioinformatics approaches to examine microarray expression profiles from liver neoplasms that arise in albumin-SV40 transgenic rats to elucidate genes, chromosome aberrations and pathways that might be associated with human liver cancer. RESULTS: In this study, we first identified 2223 differentially expressed genes by comparing gene expression profiles for two control, two adenoma and two carcinoma samples using an F-test. These genes were subsequently mapped to the rat chromosomes using a novel visualization tool, the Chromosome Plot. Using the same plot, we further mapped the significant genes to orthologous chromosomal locations in human and mouse. Many genes expressed in rat 1q that are amplified in rat liver cancer map to human chromosomes 10, 11 and 19 and to mouse chromosomes 7, 17 and 19, which have been implicated in studies of human and mouse liver cancer. Using Comparative Genomics Microarray Analysis (CGMA), we identified regions of potential aberrations in human. Lastly, a pathway analysis was conducted to predict altered human pathways based on statistical analysis and extrapolation from the rat data. All of the identified pathways are known to be important in the etiology of human liver cancer, including cell cycle control, cell growth and differentiation, apoptosis, transcriptional regulation, and protein metabolism.
    CONCLUSION: The study demonstrates that the hepatic gene expression profiles from the albumin-SV40 transgenic rat model revealed genes, pathways and chromosome alterations consistent with experimental and clinical research in human liver cancer. The bioinformatics tools presented in this paper are essential for cross-species extrapolation and mapping of microarray data, and for its analysis and interpretation.
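
The F-test used to select the 2223 genes compares between-group and within-group variance for each gene across the three sample groups (control, adenoma, carcinoma; two samples each). A minimal one-way ANOVA sketch for a single gene, assuming expression values are already normalized. The function name is hypothetical, and the abstract does not state the authors' exact implementation.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for one gene; `groups` is a list of value lists."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With three groups of two samples, the statistic has (2, 3) degrees of freedom. If SciPy is available, `scipy.stats.f.sf(F, 2, 3)` converts it to a p-value. With so few residual degrees of freedom, the test has limited power per gene.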

    Improving gene-set enrichment analysis of RNA-Seq data with small replicates

    Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq datasets so far have only a small number of replicates. This forces the use of the gene-permuting GSEA method (or preranked GSEA), which produces a large number of false positives because of the inter-gene correlation within each gene set. We demonstrate that incorporating the absolute gene statistic in one-tailed GSEA considerably improves the false-positive control and the overall discriminatory ability of gene-permuting GSEA methods for RNA-seq data. To test their performance, we developed a new simulation method that generates correlated read counts within a gene set and compared a dozen currently available RNA-seq enrichment analysis methods; the proposed methods outperformed those that do not account for inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false positive control, ranks of true positives and biological relevance. An efficient R package (AbsFilterGSEA) coded with C++ (Rcpp) is available from CRAN.
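
The following sketch illustrates a one-tailed enrichment score on absolute gene statistics. It uses the weighted running sum from standard GSEA with genes ranked by |statistic| and adds a gene-permutation p-value. Ranking by absolute value lets a set that contains both strongly up- and strongly down-regulated genes score highly. This illustration rests on stated assumptions (weight exponent 1, no filtering step) and is not the AbsFilterGSEA implementation; the function names are hypothetical.

```python
import random


def enrichment_score(stats, gene_set, weight=1.0):
    """One-tailed running-sum enrichment score with genes ranked by |statistic|."""
    ranked = sorted(stats, key=lambda g: abs(stats[g]), reverse=True)
    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    norm = sum(abs(stats[g]) ** weight for g in hits)
    if not hits or n_miss == 0 or norm == 0:
        return 0.0  # degenerate set: nothing to compare against
    running, best = 0.0, 0.0
    for g in ranked:
        if g in gene_set:
            running += abs(stats[g]) ** weight / norm  # step up, weighted by |stat|
        else:
            running -= 1.0 / n_miss  # uniform step down for genes outside the set
        best = max(best, running)  # one-tailed: keep the maximum positive deviation
    return best


def gene_permutation_pvalue(stats, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with random gene sets of the same size.

    Random gene sets ignore inter-gene correlation, which is the source of the
    false positives described in the abstract.
    """
    rng = random.Random(seed)
    observed = enrichment_score(stats, gene_set)
    genes = list(stats)
    size = len(set(gene_set) & set(stats))
    exceed = sum(
        enrichment_score(stats, set(rng.sample(genes, size))) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)
```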

    Differential Gene Expression in the EphA4 Knockout Spinal Cord and Analysis of the Inflammatory Response Following Spinal Cord Injury

    Mice lacking the axon guidance molecule EphA4 have been shown to exhibit extensive axonal regeneration and functional recovery following spinal cord injury. To assess mechanisms by which EphA4 may modify the response to neural injury, a microarray analysis was performed on spinal cord tissue from mice with spinal cord injury and sham-injured controls. RNA was purified from spinal cords of adult EphA4 knockout and wild-type mice four days following lumbar spinal cord hemisection or laminectomy only and was hybridised to Affymetrix All-Exon Array 1.0 GeneChips™. While subsequent analyses indicated that several pathways were altered in EphA4 knockout mice, of particular interest was the attenuated expression of a number of inflammatory genes, including Arginase 1, expression of which was lower in injured EphA4 knockout compared to wild-type mice. Immunohistological analyses of different cellular components of the immune response were then performed in injured EphA4 knockout and wild-type spinal cords. While numbers of infiltrating CD3+ T cells were low in the hemisection model, a robust CD11b+ macrophage/microglial response was observed post-injury. There was no difference in the overall number or spread of macrophages/activated microglia in injured EphA4 knockout compared to wild-type spinal cords at 2, 4 or 14 days post-injury; however, a lower proportion of Arginase-1 immunoreactive macrophages/activated microglia was observed in EphA4 knockout spinal cords at 4 days post-injury. Subtle alterations in the neuroinflammatory response in injured EphA4 knockout spinal cords may contribute to the regeneration and recovery observed in these mice following injury.

    SLEPR: A Sample-Level Enrichment-Based Pathway Ranking Method — Seeking Biological Themes through Pathway-Level Consistency

    Analysis of microarray and other high-throughput data often involves identification of genes consistently up- or down-regulated across samples as the first step in extraction of biological meaning. This gene-level paradigm can be limited by valid sample fluctuations and biological complexities. In this report, we describe a novel method, SLEPR, which eliminates this limitation by relying on pathway-level consistencies. Our method first selects the sample-level differentiated genes from each individual sample, capturing genes missed by other analysis methods, then ascertains the enrichment levels of associated pathways from each of those lists, and finally ranks annotated pathways based on the consistency of enrichment levels of individual samples from both sample classes. As a proof of concept, we used this method to analyze three public microarray datasets in direct comparison with GSEA, one of the most popular pathway-level analysis methods in the field. We found that our method not only reproduced the earlier observations, with significant improvements in depth of coverage for validated or expected biological themes, but also produced additional insights that make biological sense. This new method extends existing analysis approaches and facilitates integration of different types of high-throughput (HTP) data.
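
The abstract outlines SLEPR in three steps but does not give the statistics used at each step. The last step ranks pathways by how consistently per-sample enrichment levels differ between the two classes. It can be sketched as follows, assuming per-sample enrichment levels have already been computed (for example, -log10 p-values from an over-representation test on each sample's gene list) and using a Mann-Whitney AUC as the consistency measure. Both the AUC choice and the function names are assumptions for illustration, not SLEPR's published scoring.

```python
def auc(class_a, class_b):
    """Probability that a score from class_a exceeds one from class_b (ties count 0.5)."""
    wins = 0.0
    for x in class_a:
        for y in class_b:
            wins += 1.0 if x > y else 0.5 if x == y else 0.0
    return wins / (len(class_a) * len(class_b))


def rank_pathways(scores_a, scores_b):
    """Rank pathways by how consistently per-sample enrichment separates two classes.

    scores_a / scores_b map pathway -> list of per-sample enrichment levels.
    Consistency is |AUC - 0.5| * 2: 1.0 means every sample in one class is more
    enriched than every sample in the other; 0.0 means no separation.
    """
    consistency = {pw: abs(auc(scores_a[pw], scores_b[pw]) - 0.5) * 2 for pw in scores_a}
    return sorted(consistency.items(), key=lambda kv: kv[1], reverse=True)
```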

    Altered gene expression and DNA damage in peripheral blood cells from Friedreich's ataxia patients: Cellular model of pathology

    The neurodegenerative disease Friedreich's ataxia (FRDA) is the most common autosomal-recessively inherited ataxia and is caused by a GAA triplet repeat expansion in the first intron of the frataxin gene. In this disease, transcription of frataxin, a mitochondrial protein involved in iron homeostasis, is impaired, resulting in a significant reduction in mRNA and protein levels. Global gene expression analysis was performed on peripheral blood samples from FRDA patients and compared with controls, and it suggested altered expression patterns pertaining to genotoxic stress. We then confirmed the presence of genotoxic DNA damage by using a gene-specific quantitative PCR assay and discovered an increase in both mitochondrial and nuclear DNA damage in the blood of these patients (p < 0.0001 for both). Additionally, frataxin mRNA levels correlated with age of onset of disease and displayed unique sets of gene alterations involved in immune response, oxidative phosphorylation, and protein synthesis. Many of the key pathways observed by transcription profiling were downregulated, and we believe these data suggest that patients with prolonged frataxin deficiency undergo a systemic survival response to chronic genotoxic stress and consequent DNA damage detectable in blood. In conclusion, our results yield insight into the nature and progression of FRDA, as well as possible therapeutic approaches. Furthermore, the identification of potential biomarkers, including the DNA damage found in peripheral blood, may have predictive value in future clinical trials.
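
Gene-specific QPCR damage assays typically rely on the fact that polymerase-blocking lesions prevent amplification of a long fragment. Assuming lesions are Poisson-distributed, the fraction of lesion-free templates is exp(-λ), so the mean lesion count per fragment is λ = -ln(A_D / A_O). Here A_D and A_O are the relative amplifications of patient and control samples. The abstract does not give the authors' exact normalization (for example, to mitochondrial copy number), so this is a sketch under that assumption; the function name is hypothetical.

```python
from math import log


def lesion_frequency(amp_patient, amp_control, fragment_kb=None):
    """Mean lesions per amplified fragment under a Poisson lesion model.

    Only lesion-free templates amplify, so amp_patient / amp_control = exp(-lambda).
    If fragment_kb is given, rescale the result to lesions per 10 kb.
    """
    lam = -log(amp_patient / amp_control)
    return lam if fragment_kb is None else lam * 10.0 / fragment_kb
```

For instance, a patient sample that amplifies to half the control level corresponds to about 0.69 lesions per fragment.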

    Eccentric Exercise Activates Novel Transcriptional Regulation of Hypertrophic Signaling Pathways Not Affected by Hormone Changes

    Unaccustomed eccentric exercise damages skeletal muscle tissue, activating mechanisms of recovery and remodeling that may be influenced by the female sex hormone 17β-estradiol (E2). Using high-density oligonucleotide-based microarrays, we screened for differences in mRNA expression caused by E2 and eccentric exercise. After random assignment to 8 days of either placebo (CON) or E2 (EXP) supplementation, eighteen men performed 150 single-leg eccentric contractions. Muscle biopsies were collected at baseline (BL), following supplementation (PS), and 3 hours (3H) and 48 hours (48H) after exercise. Serum E2 concentrations increased significantly with supplementation (P<0.001), but this did not affect microarray results. Exercise led to early transcriptional changes in striated muscle activator of Rho signaling (STARS), Rho family GTPase 3 (RND3), mitogen-activated protein kinase (MAPK) regulation and the downstream transcription factor FOS. Targeted RT-PCR analysis identified concurrent induction of the negative regulators of calcineurin signaling RCAN (P<0.001) and HMOX1 (P = 0.009). Protein content was elevated for RND3 at 3H (P = 0.02) and FOS at 48H (P<0.05). These findings indicate that early RhoA and NFAT signaling and regulation are altered following exercise for muscle remodeling and repair, but are not affected by E2.