1,475 research outputs found

    Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data

    BACKGROUND: Retinal photoreceptors are highly specialised cells, which detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors requires appropriate regulation of the many genes specifically or highly expressed in these cells. Over the last decades, different experimental approaches have been developed to identify photoreceptor-enriched genes. Recent progress in RNA analysis technology has generated large amounts of gene expression data relevant to retinal development. This paper assesses a machine learning methodology for supporting the identification of photoreceptor-enriched genes based on expression data. RESULTS: Based on the analysis of publicly available gene expression data from the developing mouse retina generated by serial analysis of gene expression (SAGE), this paper presents a predictive methodology comprising several in silico models for detecting key complex features and relationships encoded in the data, which may be useful to distinguish genes in terms of their functional roles. In order to understand temporal patterns of photoreceptor gene expression during retinal development, a two-way cluster analysis was first performed. By clustering SAGE libraries, a hierarchical tree reflecting relationships between developmental stages was obtained. By clustering SAGE tags, a more comprehensive expression profile for photoreceptor cells was revealed. To demonstrate the usefulness of machine learning-based models in predicting functional associations from the SAGE data, three supervised classification models were compared. The results indicated that a relatively simple instance-based model (KStar model) performed significantly better than relatively more complex algorithms, e.g. neural networks. 
To deal with the problem of functional class imbalance occurring in the dataset, two data re-sampling techniques were studied. A random over-sampling method supported the implementation of the most powerful prediction models. The KStar model was also able to achieve higher predictive sensitivities and specificities using random over-sampling techniques. CONCLUSION: The approaches assessed in this paper represent an efficient and relatively inexpensive in silico methodology for supporting large-scale analysis of photoreceptor gene expression by SAGE. They may be applied as complementary methodologies to support functional predictions before implementing more comprehensive, experimental prediction and validation methods. They may also be combined with other large-scale, data-driven methods to facilitate the inference of transcriptional regulatory networks in the developing retina. Furthermore, the methodology assessed may be applied to other data domains
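
The random over-sampling idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_oversample`, the two class labels and the toy tag counts are all assumptions made for illustration. The sketch duplicates randomly chosen minority-class rows until every class is as large as the biggest one.

```python
import random
from collections import Counter, defaultdict


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)

    target = max(len(rows) for rows in by_class.values())

    out_features, out_labels = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_features.append(row)
            out_labels.append(label)
    return out_features, out_labels


# Made-up toy data: 6 "other" genes and 2 "enriched" genes,
# each described by two invented tag counts.
X = [[5, 1], [4, 2], [6, 0], [5, 3], [7, 1], [4, 4], [0, 9], [1, 8]]
y = ["other"] * 6 + ["enriched"] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

When a classifier is evaluated on held-out data, over-sampling is normally applied to the training split only, so that duplicated rows do not leak into the test split.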

    Regulation of transcription by the Arabidopsis UVR8 photoreceptor involves a specific histone modification

    The photoreceptor UV RESISTANCE LOCUS 8 (UVR8) specifically mediates photomorphogenic responses to UV-B wavelengths. UVR8 acts by regulating transcription of a set of genes, but the underlying mechanisms are unknown. Previous research indicated that UVR8 can associate with chromatin, but the specificity and functional significance of this interaction are not clear. Here we show, by chromatin immunoprecipitation, that UV-B exposure of Arabidopsis increases acetylation of lysines K9 and/or K14 of histone H3 at UVR8-regulated gene loci in a UVR8-dependent manner. The transcription factors HY5 and/or HYH, which mediate UVR8-regulated transcription, are also required for this chromatin modification, at least for the ELIP1 gene. Furthermore, sequencing of the immunoprecipitated DNA revealed that all UV-B-induced enrichments in H3K9,14 diacetylation across the genome are UVR8-dependent, and approximately 40% of the enriched loci contain known UVR8-regulated genes. In addition, inhibition of histone acetylation by anacardic acid reduces the UV-B-induced, UVR8-mediated expression of ELIP1 and CHS. No evidence was obtained in yeast 2-hybrid assays for a direct interaction between either UVR8 or HY5 and several proteins involved in light-regulated histone modification, nor for the involvement of these proteins in UVR8-mediated responses in plants, although functional redundancy between proteins could influence the results. In summary, this study shows that UVR8 regulates a specific chromatin modification associated with transcriptional regulation of a set of UVR8-target genes

    Identification of regulatory targets of tissue-specific transcription factors: application to retina-specific gene regulation

    Identification of tissue-specific gene regulatory networks can yield insights into the molecular basis of a tissue's development, function and pathology. Here, we present a computational approach designed to identify potential regulatory target genes of photoreceptor cell-specific transcription factors (TFs). The approach is based on the hypothesis that genes related to the retina in terms of expression, disease and/or function are more likely to be the targets of retina-specific TFs than other genes. A list of genes that are preferentially expressed in retina was obtained by integrating expressed sequence tag, SAGE and microarray datasets. The regulatory targets of retina-specific TFs are enriched in this set of retina-related genes. A Bayesian approach was employed to integrate information about binding site location relative to a gene's transcription start site. Our method was applied to three retina-specific TFs, CRX, NRL and NR2E3, and a number of potential targets were predicted. To experimentally assess the validity of the bioinformatic predictions, mobility shift, transient transfection and chromatin immunoprecipitation assays were performed with five predicted CRX targets, and the results were suggestive of CRX regulation in 5/5, 3/5 and 4/5 cases, respectively. Together, these experiments strongly suggest that RP1, GUCY2D and ABCA4 are novel targets of CRX
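
The abstract above does not spell out its Bayesian model, so the following is only a generic sketch of how Bayes' rule can combine a prior belief with independent pieces of evidence, such as a binding site falling in a particular distance bin from the transcription start site. The function name and every number below are hypothetical and chosen purely for illustration.

```python
import math


def posterior_target_probability(prior, likelihood_ratios):
    """Combine a prior probability with independent likelihood ratios
    (naive-Bayes style) and return the posterior probability."""
    # Work in log-odds so that evidence terms simply add up.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical inputs, not taken from the paper.
prior = 0.1              # prior belief that a gene is a TF target
lr_site_near_tss = 4.0   # P(site in this bin | target) / P(site in this bin | non-target)
lr_retina_related = 3.0  # evidence from the gene being retina-related

p = posterior_target_probability(prior, [lr_site_near_tss, lr_retina_related])
print(round(p, 3))
```

Treating the evidence terms as independent is a simplifying assumption of this sketch; correlated evidence would need a joint likelihood instead.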

    Bioinformatic identification of novel putative photoreceptor specific cis-elements

    BACKGROUND: Cell specific gene expression is largely regulated by different combinations of transcription factors that bind cis-elements in the upstream promoter sequence. However, experimental detection of cis-elements is difficult, expensive, and time-consuming. This provides a motivation for developing bioinformatic methods to identify cis-elements that could prioritize future experimental studies. Here, we use motif discovery algorithms to predict transcription factor binding sites involved in regulating the differences between murine rod and cone photoreceptor populations. RESULTS: To identify highly conserved motifs enriched in promoters that drive expression in either rod or cone photoreceptors, we assembled a set of murine rod-specific, cone-specific, and non-photoreceptor background promoter sequences. These sets were used as input to a newly devised motif discovery algorithm called Iterative Alignment/Modular Motif Selection (IAMMS). Using IAMMS, we predicted 34 motifs that may contribute to rod-specific (19 motifs) or cone-specific (15 motifs) expression patterns. Of these, 16 rod- and 12 cone-specific motifs were found in clusters near the transcription start site. New findings include the observation that cone promoters tend to contain TATA boxes, while rod promoters tend to be TATA-less (exempting Rho and Cnga1). Additionally, we identify putative sites for IL-6 effectors (in rods) and RXR family members (in cones) that can explain experimental data showing changes to cell-fate by activating these signaling pathways during rod/cone development. Two of the predicted motifs (NRE and ROP2) have been confirmed experimentally to be involved in cell-specific expression patterns. We provide a full database of predictions as additional data that may contain further valuable information. 
IAMMS predictions are compared with existing motif discovery algorithms, DME and BioProspector. We find that over 60% of IAMMS predictions are confirmed by at least one other motif discovery algorithm. CONCLUSION: We predict novel, putative cis-elements enriched in the promoter of rod-specific or cone-specific genes. These are candidate binding sites for transcription factors involved in maintaining functional differences between rod and cone photoreceptor populations.

    Clustering-based approaches to SAGE data mining

    Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation

    Mapping and Functional Analysis of cis-Regulatory Elements in Mouse Photoreceptors

    Photoreceptors are light-sensitive neurons that mediate vision, and they are the most commonly affected cell type in genetic forms of blindness. In mice, there are two basic types of photoreceptors, rods and cones, which mediate vision in dim and bright environments, respectively. The transcription factors (TFs) that control rod and cone development have been studied in detail, but the cis-regulatory elements (CREs) through which these TFs act are less well understood. To comprehensively identify photoreceptor CREs in mice and to understand their relationship with gene expression, we performed open chromatin (ATAC-seq) and transcriptome (RNA-seq) profiling of FACS-purified rods and cones. We find that rods have significantly fewer regions of open chromatin than cones (as well as >60 additional cell types and tissues), and we demonstrate that this uniquely closed chromatin architecture depends on the rod master regulator Nrl. Finally, we find that regions of rod- and cone-specific open chromatin are enriched for distinct sets of TF binding sites, providing insight into the cis-regulatory grammar of these cell types. We also sought to understand how the regulatory activity of rod and cone open chromatin regions is encoded in DNA sequence. Cone-rod homeobox (CRX) is a paired-like homeodomain TF and master regulator of both rod and cone development, and CRX binding sites are by far the most enriched TF binding sites in photoreceptor CREs. The in vitro DNA binding preferences of CRX have been extensively characterized, but how well in vitro models of TF binding site affinity predict in vivo regulatory activity is not known. In addition, paired-class homeodomain TFs bind DNA as both monomers and dimers, but whether monomeric and dimeric CRX binding sites have distinct regulatory activities is not known. 
To address these questions, we used a massively parallel reporter assay to quantify the activity of thousands of native and mutant CRX binding sites in explanted mouse retinas. These data reveal that dimeric CRX binding sites encode stronger enhancers than monomeric CRX binding sites. Moreover, the activity of half-sites within dimeric CRX binding sites is cooperative and spacing-dependent. In addition, saturating mutagenesis of 195 CRX binding sites reveals that, while TF binding site affinity and activity are moderately correlated across mutations within individual CREs, they are poorly correlated across mutations from distinct CREs. Accordingly, we show that accounting for baseline CRE activity improves the prediction of the effects of mutations in regulatory DNA from sequence-based models. Taken together, these data demonstrate that the activity of CRX binding sites depends on multiple layers of sequence context, providing insight into photoreceptor gene regulation and illustrating functional principles of homeodomain TF binding sites
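
The contrast drawn above between correlation within a CRE and correlation across CREs can be illustrated with a small sketch. The data are synthetic and built from a deliberately simple additive toy rule (activity = baseline + affinity); the CRE names, baselines and affinities are invented, and none of this reproduces the authors' model or measurements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Synthetic toy data: each CRE has its own baseline activity, and each
# mutant's activity is that baseline plus its binding site affinity.
cres = {
    "cre_a": {"baseline": 8.0, "affinity": [0.2, 0.5, 0.9]},
    "cre_b": {"baseline": 1.0, "affinity": [0.6, 0.8, 1.0]},
}

pooled_affinity, pooled_activity, pooled_adjusted = [], [], []
within = {}
for name, cre in cres.items():
    activity = [cre["baseline"] + a for a in cre["affinity"]]
    within[name] = pearson(cre["affinity"], activity)
    pooled_affinity += cre["affinity"]
    pooled_activity += activity
    # Subtracting the baseline removes the CRE-specific offset.
    pooled_adjusted += [act - cre["baseline"] for act in activity]

r_pooled_raw = pearson(pooled_affinity, pooled_activity)
r_pooled_adjusted = pearson(pooled_affinity, pooled_adjusted)

print("within each CRE:", within)
print("pooled, raw activity:", round(r_pooled_raw, 2))
print("pooled, baseline-adjusted:", round(r_pooled_adjusted, 2))
```

In this toy setting the pooled correlation is weak only because the two CREs sit at different baselines, which is the kind of offset that a baseline-aware model can absorb.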

    Deciphering mechanisms governing the development of the rod epigenome

    Precisely coordinated expression of distinct sets of genes is essential for cellular development and function, especially in complex multicellular organisms. This regulation is achieved by the action of transcription factors (TF), proteins that bind specific genomic locations and alter the activity state and packaging of the DNA to promote or repress gene expression. However, while tremendous effort has defined networks of transcription factors that work together to drive specific phenotypes, little is known about their differential activity at the hundreds or thousands of sites where they bind. There are also many questions regarding the basic principles of the packaging of DNA within the nucleus, its influence on gene expression, and how transcription factors regulate this process. To address these questions, I have investigated gene regulatory mechanisms that control the development of rod and cone photoreceptors. Photoreceptors are the neurons of the retina responsible for the initial conversion of a visual stimulus into an electrical signal. Photoreceptors are a complex, accessible, and highly disease relevant neuronal cell population, the lessons from which are relevant for many other cell types across the body. A primary TF in photoreceptor development is CRX. This TF when mutated in humans can cause severe vision loss, and its deletion in mice leads to a severe condition with no functional photoreceptors. First, I determined the functional consequences of human disease-causing variants when modelled in mice. I functionally classified a new model as causing a previously un-determined dominant disease, and discovered that all CRX diseases are the result of graded differences in gene expression of the same core set of genes. Second, I examined the dependency of all genomic binding sites on CRX activity and identified a core set of regulatory elements that require CRX for activation of the target genes. 
Third, I interrogated the broader organization of the rod photoreceptor epigenome and demonstrated that the rod packages its DNA according to epigenomic activity state. Perturbations to this state can override this organization, resulting in functional consequences on gene expression beyond the local sequence. In summary, my in-depth investigation has uncovered new insights into the molecular mechanisms controlling development and maintenance of the rod epigenome and its organization in the nucleus. This new knowledge will provide 1) a new understanding of where and how mutations in a key photoreceptor TF cause gene mis-regulation, and 2) guidance for future human genetic studies to identify new disease-causing mutations affecting photoreceptor integrity, not only in the protein coding sequences but also in specific non-coding regulatory regions

    Ab Initio Prediction of Transcription Factor Targets Using Structural Knowledge

    Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys(2)His(2) Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys(2)His(2) transcription factors. By analyzing the predicted targets along with gene annotation and expression data we infer the function and activity of these proteins

    Novel Approaches to Studying the Effects of Cis-Regulatory Variants in the Central Nervous System

    For decades, studies of the genetic basis of disease have focused on rare coding mutations that disrupt protein function, leading to the identification of hundreds of genes underlying Mendelian diseases. However, many complex diseases are non-Mendelian, and less than 2% of the genome is coding. It is now clear that non-coding variants contribute to disease susceptibility, but the precise underlying mechanisms are generally unknown. Cis-regulatory elements (CREs) are transcription factor (TF)-bound genomic regions that regulate gene expression, and variants within CREs can therefore modify gene expression. The putative locations of CREs in a variety of cell types have been identified through genome-wide assays of TF binding and epigenomic signatures, providing a starting point for probing the effects of cis-regulatory variants. Unlike coding mutations, which can be interpreted based on the genetic code, the functional consequence of any given cis-regulatory variant is difficult to predict even at the molecular level. Therefore, a major bottleneck lies in interpreting the functional significance of these variants. In the present work, I study the effects of cis-regulatory variants in the central nervous system (CNS), specifically in retina and brain. The retina is composed of well-characterized neuronal cell types and an extensively studied transcriptional network, while the brain is the center of human cognition and a target of devastating neuropsychiatric diseases. First, I take advantage of the genetic diversity between two distantly related mouse strains to describe the relationship between cis-regulatory variants and differences in retinal gene expression. I identify cis- and trans-regulatory effects, as well as parent-of-origin effects. Second, I develop a new technology based on an existing massively parallel reporter assay, CRE-seq, to enable the functional study of long CREs in the CNS in vivo for the first time. 
I demonstrate the ability of this approach to measure tissue-specific cis-regulatory activity in the brain and to pinpoint DNA bases critical for activity. Finally, I conduct a detailed mechanistic study of a non-coding region containing variants associated with both human cognitive performance and bipolar disorder. This last study illustrates the complexities and challenges of establishing the causal role of non-coding variants in disease