Epigenome-450K-wide methylation signatures of active cigarette smoking: The Young Finns Study
Smoking, a major risk factor for morbidity, affects numerous regulatory systems of the human body, including DNA methylation. Most previous studies with genome-wide methylation data are based on conventional association analysis and early threshold-based gene set analysis, which lacks the sensitivity to reveal all the relevant effects of smoking. The aim of the present study was to investigate the impact of active smoking on DNA methylation at three biological levels: 5'-C-phosphate-G-3' (CpG) sites, genes, and functionally related genes (gene sets). Gene set analysis was done with mGSZ, a modern threshold-free method previously developed by us that uses all the genes in the experiment and their differential methylation scores. Applying such a method in a DNA methylation study is novel. Epigenome-wide methylation levels were profiled in whole blood from Young Finns Study (YFS) participants at the 2011 follow-up using Illumina Infinium HumanMethylation450 BeadChips. We identified three novel smoking-related CpG sites and replicated 57 previously identified ones. We found that smoking is associated with hypomethylation in shores (genomic regions 0–2 kilobases from a CpG island). We identified smoking-related methylation changes in 13 gene sets with false discovery rate (FDR) ≤ 0.05, among them olfactory receptor activity, the flagship novel finding of the present study. Overall, we extended current knowledge by identifying: (i) three novel smoking-related CpG sites, (ii) effects on average methylation in shores similar to those of aging, and (iii) a novel finding that the olfactory receptor activity pathway responds to tobacco smoke and toxin exposure through epigenetic mechanisms.
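The mGSZ statistic itself is not given in this abstract, so it is not reproduced here. As a rough sketch of what "threshold-free" gene set analysis means, the Python below scores a gene set by the standardized mean of the differential methylation scores of all its member genes (a plain gene set Z-score, which is an assumption and not the mGSZ statistic), estimates a p-value against random gene sets of the same size, and converts the p-values of several gene sets into FDR values with the Benjamini–Hochberg procedure, one common way of computing FDR. Gene names and scores are hypothetical.

```python
import math
import random


def gene_set_z(scores, members):
    """Z-score of the mean score of `members` against all genes' scores.

    Every gene in `scores` contributes; no significance cut-off is applied first.
    Assumes 0 < len(members) < len(scores).
    """
    values = list(scores.values())
    n, k = len(values), len(members)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # population variance of all scores
    set_mean = sum(scores[g] for g in members) / k
    # Standard error of the mean of k genes drawn without replacement from n genes
    se = math.sqrt(var / k * (n - k) / (n - 1))
    return (set_mean - mu) / se


def permutation_pvalue(scores, members, n_perm=1000, seed=0):
    """Two-sided p-value from random gene sets of the same size (hypothetical helper)."""
    rng = random.Random(seed)
    genes = list(scores)
    observed = abs(gene_set_z(scores, members))
    hits = sum(
        abs(gene_set_z(scores, rng.sample(genes, len(members)))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.20])` returns approximately `[0.04, 0.053, 0.053, 0.20]`; gene sets with an adjusted value ≤ 0.05 would pass an FDR ≤ 0.05 cut-off.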
Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes
BACKGROUND: Cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see whether the genes in the same cluster are functionally related. Although past successes of such analyses have been reported in a number of microarray studies (most of which used standard hierarchical clustering, UPGMA, with one minus Pearson's correlation coefficient as the dissimilarity measure), such groupings can often be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary, since the clusters also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters, using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed from the growing Gene Ontology (GO) databases for the annotated genes of the relevant species. RESULTS: In this paper, we introduce two performance measures for evaluating a clustering algorithm's ability to produce biologically meaningful clusters. The first measure is a biological homogeneity index (BHI). As the name suggests, it measures how biologically homogeneous the clusters are. It can be used to quantify how well a given clustering algorithm, such as UPGMA, groups genes for a particular data set, and also to compare the performance of several competing clustering algorithms applied to the same data set. The second performance measure is called a biological stability index (BSI).
For a given clustering algorithm and expression data set, it measures the consistency of the algorithm's ability to produce biologically meaningful clusters when applied repeatedly to similar data sets. A good clustering algorithm should have a high BHI and a moderate to high BSI. We evaluated the performance of ten well-known clustering algorithms on two gene expression data sets and identified the optimal algorithm in each case. The first data set consists of SAGE profiles of tags differentially expressed between normal and ductal carcinoma in situ samples from breast cancer patients. The second data set contains expression profiles over time of positively expressed genes (ORFs) during sporulation of budding yeast. Two separate choices of functional classes were used for this data set, and the results were compared for consistency. CONCLUSION: Functional information on annotated genes, available from various GO databases and mined using ontology tools, can be used to systematically judge the results of an unsupervised clustering algorithm applied to a gene expression data set. This information can be used to select the right algorithm from a class of clustering algorithms for a given data set.
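The abstract does not spell out the BHI formula. One formulation consistent with its description, assumed here, averages over clusters the proportion of pairs of annotated genes in the same cluster that share at least one functional class, so a value of 1.0 means every cluster is functionally pure. Counting genes with several classes as matching when they share any class, and skipping clusters with fewer than two annotated genes, are choices of this sketch; gene IDs and class labels are hypothetical.

```python
from itertools import combinations


def biological_homogeneity_index(clusters, classes):
    """Average, over clusters, of the fraction of annotated gene pairs sharing a class.

    clusters: list of lists of gene IDs produced by a clustering algorithm.
    classes:  dict mapping gene ID -> set of functional classes; genes that are
              missing or have an empty set are treated as unannotated.
    """
    per_cluster = []
    for cluster in clusters:
        annotated = [g for g in cluster if classes.get(g)]
        n = len(annotated)
        if n < 2:
            continue  # no pair to compare; this sketch simply skips the cluster
        matches = sum(1 for a, b in combinations(annotated, 2) if classes[a] & classes[b])
        per_cluster.append(matches / (n * (n - 1) / 2))
    return sum(per_cluster) / len(per_cluster) if per_cluster else float("nan")
```

Comparing competing algorithms then amounts to computing this index for each algorithm's clusters on the same data set and the same reference classes.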
PlasmoDraft: a database of Plasmodium falciparum gene function predictions based on postgenomic data
Background: Of the 5 484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant functional assignments. Non-homology methods are therefore needed to obtain functional clues for these uncharacterized genes.
Results: We present PlasmoDraft (http://atgc.lirmm.fr/PlasmoDraft/), a database of Gene Ontology (GO) annotation predictions for P. falciparum genes based on postgenomic data. PlasmoDraft's predictions are made with a guilt-by-association method named Gonna. This involves (1) a predictor that proposes GO annotations for a gene based on the similarity of its profile (measured with transcriptome, proteome or interactome data) to those of genes already annotated by GeneDB; (2) a procedure that estimates the confidence of the predictions made with each data source; and (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources. Gonna provides predictions for numerous genes without any annotations. For example, 2 434 genes without any annotations in the Biological Process ontology are associated with specific GO terms (e.g. Rosetting, Antigenic variation), and among these, 841 have confidence values above 50%. In the Cellular Component and Molecular Function ontologies, 1 905 and 1 540 uncharacterized genes, respectively, are associated with specific GO terms (740 and 329 with confidence values above 50%).
Conclusion: All predictions, along with their confidence values, have been compiled in PlasmoDraft, which thus provides an extensive database of the GO annotation predictions that can be made with these data sources. The database can be accessed in different ways. A global view allows quick inspection of the GO terms that are predicted with high confidence, depending on the various data sources. A gene view and a GO term view allow searching for the potential GO terms attached to a given gene, and for the genes that potentially belong to a given GO term.
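Gonna's actual predictor and confidence estimation are not described in this abstract. The sketch below only illustrates the general guilt-by-association idea behind step (1): an unannotated gene inherits GO terms from the annotated genes whose profiles are most similar to its own. Pearson correlation as the similarity measure, k = 3 neighbours and the similarity-weighted vote are all assumptions, and the resulting scores are not calibrated confidence values. Gene IDs and profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predict_go_terms(query, annotated, k=3):
    """Score GO terms for an unannotated gene from its k most similar annotated genes.

    query:     profile of the unannotated gene (e.g. expression over time points).
    annotated: dict gene ID -> (profile, set of GO terms).
    Returns dict GO term -> share of the neighbours' positive similarity carrying it.
    """
    neighbours = sorted(
        ((pearson(query, profile), terms) for profile, terms in annotated.values()),
        key=lambda item: item[0],
        reverse=True,
    )[:k]
    total = sum(max(r, 0.0) for r, _ in neighbours)  # negative correlations get no vote
    if total == 0:
        return {}
    scores = {}
    for r, terms in neighbours:
        for term in terms:
            scores[term] = scores.get(term, 0.0) + max(r, 0.0) / total
    return scores
```

A term carried by every positively correlated neighbour scores 1.0, while terms seen only on anti-correlated genes score 0.0.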
Digital Gene Expression Profiling by 5′-End Sequencing of cDNAs during Reprogramming in the Moss Physcomitrella patens
Stem cells self-renew and repeatedly produce differentiated cells during development and growth. In some metazoans and land plants, differentiated cells can be converted into stem cells with appropriate treatments. After leaves of the moss Physcomitrella patens are excised, leaf cells reenter the cell cycle and commence tip growth, which is characteristic of stem cells called chloronema apical cells. To understand the underlying molecular mechanisms, a digital gene expression profiling method using mRNA 5′-end tags (5′-DGE) was established. The 5′-DGE method produced reproducible data with a dynamic range of four orders of magnitude that correlated well with qRT-PCR measurements. Within 6 h of leaf excision, the expression levels of 11% of the transcripts changed significantly. Genes involved in stress responses and proteolysis were induced, and those involved in metabolism, including photosynthesis, were downregulated. Later stages of reprogramming involved recovery of photosynthesis and increased macromolecule biosynthesis, including RNA and protein synthesis. Auxin and cytokinin signaling pathways, which are activated during stem cell formation via callus in flowering plants, are also activated during reprogramming in P. patens even though no exogenous phytohormone is applied in the moss system, suggesting that the moss may use an intrinsic phytohormone regulatory system.
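The abstract does not describe how tag counts were normalized or how significance was assessed. As an assumed illustration of the counting idea behind digital expression profiling, the sketch scales each library's 5′-end tag counts to tags per million, so libraries of different sequencing depth become comparable, and computes log2 fold changes between two time points. Transcript IDs and counts are hypothetical, and a fold change on its own is not a significance test.

```python
import math


def tags_per_million(counts):
    """Scale raw tag counts so that each library sums to one million."""
    total = sum(counts.values())
    return {tx: c * 1_000_000 / total for tx, c in counts.items()}


def log2_fold_changes(before, after, pseudocount=1.0):
    """log2(after / before) of normalized tag counts for every transcript seen.

    The pseudocount avoids division by zero for transcripts absent in one library.
    """
    norm_before, norm_after = tags_per_million(before), tags_per_million(after)
    return {
        tx: math.log2((norm_after.get(tx, 0.0) + pseudocount) / (norm_before.get(tx, 0.0) + pseudocount))
        for tx in set(norm_before) | set(norm_after)
    }
```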
Molecular classification of selective oestrogen receptor modulators on the basis of gene expression profiles of breast cancer cells expressing oestrogen receptor α
The purpose of this study was to classify selective oestrogen receptor modulators on the basis of the gene expression profiles they produce in breast cancer cells expressing either wtERα or mutant351ERα. In total, 54 microarray experiments were carried out using commercially available Atlas cDNA Expression Arrays (Clontech) containing 588 cancer-related genes. Nine sets of data were generated for each cell line following 24 h of treatment: expression data were obtained for cells treated with vehicle EtOH (Control); with 10⁻⁹ or 10⁻⁸ M oestradiol; with 10⁻⁶ M 4-hydroxytamoxifen; with 10⁻⁶ M raloxifene; with 10⁻⁶ M idoxifene; with 10⁻⁶ M EM 652; with 10⁻⁶ M GW 7604; with 5×10⁻⁵ M resveratrol; and with 10⁻⁶ M ICI 182,780. We developed a new algorithm, ‘Expression Signatures’, to classify compounds on the basis of differential gene expression profiles. We created a dendrogram for each cell line, in which branches represent relationships between compounds. Additionally, clustering analysis was performed using different subsets of genes to assess the robustness of the analysis. In general, only small differences were observed between the gene expression profiles of cells treated with the different compounds, with correlation coefficients ranging from 0.83 to 0.98. This observation may be explained by the use of the same cell context for treatments with compounds that essentially belong to the same class of drugs, acting through oestrogen receptor-related mechanisms. The most surprising observation was that ICI 182,780 clustered together with oestradiol and raloxifene for cells expressing wtERα, but together with EM 652 for cells expressing mutant351ERα. These data provide a rationale for a more precise and elaborate study in which custom-made oligonucleotide arrays can be used with comprehensive sets of genes known to have consensus and putative oestrogen response elements in their promoter regions.
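The ‘Expression Signatures’ algorithm is not specified in the abstract, so the sketch below uses a standard substitute to show how a compound dendrogram can be built from expression profiles: average-linkage (UPGMA) hierarchical clustering with 1 − Pearson correlation as the distance. This is an assumption, not the authors' method. Compound names and profiles in the test are hypothetical, and ties between equally close pairs are broken arbitrarily.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering of compounds on 1 - Pearson correlation.

    profiles: dict compound name -> list of expression values over the same genes.
    Returns the dendrogram as nested 2-tuples of compound names.
    """
    size = {name: 1 for name in profiles}  # active clusters -> number of compounds
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    while len(size) > 1:
        a, b = tuple(min(dist, key=dist.get))  # closest pair of clusters
        merged = (a, b)
        # UPGMA: distance to the new cluster is the size-weighted mean of the old distances
        new = {
            frozenset((merged, c)): (size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))])
            / (size[a] + size[b])
            for c in size
            if c != a and c != b
        }
        # Drop every distance that involves one of the two merged clusters
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new)
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))
```

Re-running the function on profiles restricted to different gene subsets is one simple way to check whether the branching pattern is robust.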
Expression profiles of switch-like genes accurately classify tissue and infectious disease phenotypes in model-based classification
Background: Large-scale compilation of gene expression microarray datasets across diverse biological phenotypes provided a means of gathering a priori knowledge in the form of identification and annotation of bimodal genes in the human and mouse genomes. These switch-like genes make up 15% of known human genes and are enriched for genes coding for extracellular and membrane proteins. It is of interest to determine the predictive potential of bimodal genes for class discovery in large-scale datasets.
Results: A model-based clustering algorithm accurately classified more than 400 microarray samples into 19 different tissue types on the basis of bimodal gene expression. Bimodal expression patterns were also highly effective in differentiating between infectious diseases in model-based clustering of microarray data. Supervised classification with feature selection restricted to switch-like genes also recognized tissue-specific and infectious disease-specific signatures in independent test datasets reserved for validation. Determining the "on" and "off" states of switch-like genes in various tissues and diseases allowed activated and deactivated pathways to be identified. Activated switch-like genes in neural, skeletal muscle and cardiac muscle tissue tend to have tissue-specific roles. A majority of the genes activated in infectious disease are involved in processes related to the immune response.
Conclusion: Switch-like bimodal gene sets capture genome-wide signatures from microarray data in health and infectious disease. A subset of bimodal genes coding for extracellular and membrane proteins is associated with tissue specificity, indicating a potential role for these genes as biomarkers, provided that their expression is altered at the onset of disease. Furthermore, we provide evidence that bimodal genes are involved in temporally and spatially active mechanisms, including tissue-specific functions and the response of the immune system to invading pathogens.
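How the study detected bimodality and assigned "on"/"off" states is not given in the abstract. As an assumed illustration, the sketch fits a two-component Gaussian mixture to a single gene's expression values with the expectation–maximization (EM) algorithm and calls a sample "on" when the higher-mean component is the more probable source of its value. A real analysis would also need to test whether a gene is bimodal at all; the sketch assumes it is and that at least two values are supplied. The expression values in the test are toy numbers.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit an 'off' (low) and an 'on' (high) Gaussian to one gene's values by EM."""
    xs = sorted(values)
    n, half = len(xs), len(xs) // 2
    # Initialise the components from the lower and upper halves of the data
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    spread = (xs[-1] - xs[0]) / 4 or 1.0
    sd, w = [spread, spread], [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each value comes from the 'on' component
        resp = []
        for x in xs:
            p_off = w[0] * normal_pdf(x, mu[0], sd[0])
            p_on = w[1] * normal_pdf(x, mu[1], sd[1])
            total = p_off + p_on
            resp.append(p_on / total if total > 0 else float(abs(x - mu[1]) < abs(x - mu[0])))
        # M-step: re-estimate means, standard deviations (floored) and weights
        n_on = max(sum(resp), 1e-12)
        n_off = max(n - sum(resp), 1e-12)
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n_off,
              sum(r * x for r, x in zip(resp, xs)) / n_on]
        sd = [max(math.sqrt(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n_off), 1e-3),
              max(math.sqrt(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n_on), 1e-3)]
        w = [n_off / n, n_on / n]
    return mu, sd, w


def call_states(values):
    """True ('on') where the 'on' component is the more probable source of the value."""
    mu, sd, w = fit_two_gaussians(values)
    return [w[1] * normal_pdf(x, mu[1], sd[1]) > w[0] * normal_pdf(x, mu[0], sd[0]) for x in values]
```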
Inducible cAMP Early Repressor (ICER) and Brain Functions
The inducible cAMP early repressor (ICER) is an endogenous repressor of cAMP-responsive element (CRE)-mediated gene transcription and belongs to the CRE-binding protein (CREB)/CRE modulator (CREM)/activating transcription factor 1 (ATF-1) gene family. ICER plays an important role in regulating the neuroendocrine system and the circadian rhythm. Other aspects of ICER function have recently attracted heightened attention. As a natural inducible CREB antagonist and, more broadly, an inducible repressor of CRE-mediated gene transcription, ICER regulates the long-lasting plastic changes that occur in the brain in response to incoming stimulation. This review brings together data on ICER and its functions in the brain, with special emphasis on recent findings highlighting the involvement of ICER in the regulation of the long-term plasticity underlying learning and memory.
Berry Flesh and Skin Ripening Features in Vitis vinifera as Assessed by Transcriptional Profiling
Background
Ripening of fleshy fruit is a complex developmental process involving the differentiation of tissues with separate functions. During grapevine berry ripening important processes contributing to table and wine grape quality take place, some of them flesh- or skin-specific. In this study, transcriptional profiles throughout flesh and skin ripening were followed during two different seasons in a table grape cultivar ‘Muscat Hamburg’ to determine tissue-specific as well as common developmental programs.
Methodology/Principal Findings
Using an updated GrapeGen Affymetrix GeneChip® annotation based on grapevine 12×v1 gene predictions, we identified 2188 transcripts differentially accumulated between flesh and skin and 2839 transcripts differentially accumulated throughout ripening in the same manner in both tissues. Transcriptional profiles were dominated by changes at the beginning of veraison that affect both pericarp tissues, although these changes were frequently delayed or of lower intensity in the skin than in the flesh. Functional enrichment analysis identified the decline in biosynthetic processes, photosynthesis and transport as a major part of the program delayed in the skin. In addition, a higher number of functional categories, including several related to macromolecule transport and phenylpropanoid and lipid biosynthesis, were over-represented in transcripts accumulated to higher levels in the skin. Functional enrichment also indicated that auxin, gibberellins and bHLH transcription factors take part in the regulation of pre-veraison processes in the pericarp, whereas WRKY and C2H2 family transcription factors seem to participate more specifically in the regulation of skin and flesh ripening, respectively.
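The enrichment tool and settings used in the study are not named in the abstract. A common way to test whether a functional category is over-represented in a list of transcripts (for example, those accumulated to higher levels in the skin) is a one-sided hypergeometric test, sketched below as an assumption rather than the authors' procedure. In practice the p-values for many categories would then be corrected for multiple testing. Counts in the test are toy values.

```python
from math import comb


def enrichment_pvalue(n_universe, n_category, n_selected, n_overlap):
    """One-sided hypergeometric p-value for over-representation of a category.

    n_universe: transcripts on the array (the background)
    n_category: background transcripts annotated to the functional category
    n_selected: transcripts in the list being tested
    n_overlap:  transcripts in the list that belong to the category
    Returns P(X >= n_overlap) for X ~ Hypergeometric(n_universe, n_category, n_selected).
    """
    upper = min(n_category, n_selected)
    # Sum the probabilities of observing n_overlap or more category members in the list
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)
```

Python's exact integer arithmetic in `math.comb` avoids overflow even for array-sized backgrounds, at the cost of speed.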
Conclusions/Significance
A transcriptomic analysis indicates that a large part of the ripening program is shared by both pericarp tissues, although some components are delayed in the skin. In addition, important tissue differences, including tissue-specific regulators, are present from early stages prior to the onset of ripening. Altogether, these findings provide key elements for understanding berry ripening and its differential regulation in flesh and skin.
This study was financially supported by the GrapeGen Project, funded by Genoma España within a collaborative agreement with Genome Canada. The authors also thank the Ministerio de Ciencia e Innovacion for project BIO2008-03892 and a bilateral collaborative grant with Argentina (AR2009-0021). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Peer reviewed
The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.
Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory.
Conclusion: We conclude that while predictions of molecular function and biological process annotations have slightly improved over time, those of cellular component annotations have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
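The exact term-centric protocol (which proteins and terms are evaluated) is not described in this abstract. Assuming term-centric performance is summarized per GO term by the area under the ROC curve (AUC), the sketch ranks benchmark proteins by a method's score for one term and computes the AUC through its Mann–Whitney form: the probability that a randomly chosen annotated protein outscores a randomly chosen unannotated one, with ties counted as half. A random predictor scores about 0.5. Protein IDs are hypothetical.

```python
def term_auc(proteins, scores, positives):
    """ROC AUC for one GO term.

    proteins:  benchmark protein IDs evaluated for this term.
    scores:    dict protein -> predicted score for the term (missing = 0.0).
    positives: set of proteins experimentally annotated with the term.
    """
    pos = [scores.get(p, 0.0) for p in proteins if p in positives]
    neg = [scores.get(p, 0.0) for p in proteins if p not in positives]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative protein")
    # Count positive/negative pairs ranked correctly; ties contribute one half
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))
```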
An expanded evaluation of protein function prediction methods shows an improvement in accuracy
Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
Results: We conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 with those of CAFA2.
Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and the usefulness of individual methods remain context-dependent.
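One protein-centric metric used in CAFA is the maximum F-measure (Fmax) over decision thresholds. The sketch below is simplified and assumes predicted and true terms are already propagated to their ontology ancestors (propagation is omitted). At each threshold τ from 0.01 to 1.00, precision is averaged over proteins with at least one prediction scoring ≥ τ, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean of the two. Protein and term IDs in the test are hypothetical.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric maximum F-measure (simplified, no ontology propagation).

    predictions: dict protein -> dict term -> score in [0, 1].
    truth:       dict benchmark protein -> non-empty set of experimentally annotated terms.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recall_sum = [], 0.0
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            overlap = len(predicted & true_terms)
            if predicted:  # precision only over proteins with a prediction at this threshold
                precisions.append(overlap / len(predicted))
            recall_sum += overlap / len(true_terms)  # recall over all benchmark proteins
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Because precision and recall are averaged over different sets of proteins, a method that predicts for only a few proteins can reach high precision while its recall, and therefore its Fmax, stays low.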