35 research outputs found

    Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs

    BACKGROUND: Transcriptional regulation is a key mechanism in the functioning of the cell, and is mostly effected through transcription factors binding to specific recognition motifs located upstream of the coding region of the regulated gene. The computational identification of such motifs is made easier by the fact that they often appear several times in the upstream region of the regulated genes, so that the number of occurrences of relevant motifs is often significantly larger than expected by pure chance. RESULTS: To exploit this fact, we construct sets of genes characterized by the statistical overrepresentation of a certain motif in their upstream regions. Then we study the functional characterization of these sets by analyzing their annotation to Gene Ontology terms. For the sets showing a statistically significant specific functional characterization, we conjecture that the upstream motif characterizing the set is a binding site for a transcription factor involved in the regulation of the genes in the set. CONCLUSIONS: The method we propose is able to identify many known binding sites in S. cerevisiae and new candidate targets of regulation by known transcription factors. Its application to less well studied organisms is likely to be valuable in the exploration of their regulatory interaction network. Comment: 19 pages, 1 figure. Published version with several improvements. Supplementary material available from the author.
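The idea of selecting genes whose upstream region contains a motif more often than chance would predict can be illustrated with a small sketch. The abstract does not state which statistical test the authors used, so the binomial null model, the fixed per-position background frequency, and all function names below are assumptions made for illustration only.

```python
from math import comb


def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresented_genes(upstream, motif, background_freq, alpha=0.01):
    """Return {gene: p-value} for genes whose upstream sequence holds the
    motif more often than expected under the assumed binomial null."""
    selected = {}
    for gene, seq in upstream.items():
        n = len(seq) - len(motif) + 1  # number of candidate start positions
        if n <= 0:
            continue  # sequence shorter than the motif
        k = count_motif(seq, motif)
        pval = binomial_tail(k, n, background_freq)
        if pval < alpha:
            selected[gene] = pval
    return selected
```

In a setting like the one described, the resulting gene set would then be tested for enrichment of Gene Ontology annotations; that step is not shown here.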

    Genome-Wide Survey of MicroRNA - Transcription Factor Feed-Forward Regulatory Circuits in Human

    In this work, we describe a computational framework for the genome-wide identification and characterization of mixed transcriptional/post-transcriptional regulatory circuits in humans. We concentrated in particular on feed-forward loops (FFL), in which a master transcription factor regulates a microRNA, and together with it, a set of joint target protein coding genes. The circuits were assembled with a two-step procedure. We first constructed separately the transcriptional and post-transcriptional components of the human regulatory network by looking for conserved over-represented motifs in human and mouse promoters, and 3'-UTRs. Then, we combined the two subnetworks looking for mixed feed-forward regulatory interactions, finding a total of 638 putative (merged) FFLs. In order to investigate their biological relevance, we filtered these circuits using three selection criteria: (I) Gene Ontology enrichment among the joint targets of the FFL, (II) independent computational evidence for the regulatory interactions of the FFL, extracted from external databases, and (III) relevance of the FFL in cancer. Most of the selected FFLs seem to be involved in various aspects of organism development and differentiation. We finally discuss a few of the most interesting cases in detail. Comment: 51 pages, 5 figures, 4 tables. Supporting information included. Accepted for publication in Molecular BioSystems.
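The second step of the procedure, combining the transcriptional and post-transcriptional subnetworks into mixed feed-forward loops, can be sketched as a set intersection. The dictionary layout and the function name are hypothetical; the abstract does not describe the authors' data structures or how loops were merged.

```python
def find_ffls(tf_to_mirnas, tf_to_genes, mirna_to_genes):
    """List (TF, miRNA, joint targets) triples in which the TF regulates the
    miRNA and both regulate at least one common protein-coding gene."""
    ffls = []
    for tf, mirnas in tf_to_mirnas.items():
        tf_targets = tf_to_genes.get(tf, set())
        for mirna in sorted(mirnas):
            joint = tf_targets & mirna_to_genes.get(mirna, set())
            if joint:  # keep only loops with a non-empty joint target set
                ffls.append((tf, mirna, sorted(joint)))
    return ffls
```

Each input maps a regulator to the set of its predicted targets, so a loop is reported only when all three edge types are present.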

    Conserved transcription factor binding sites of cancer markers derived from primary lung adenocarcinoma microarrays

    Gene transcription in a set of 49 human primary lung adenocarcinomas and 9 normal lung tissue samples was examined using Affymetrix GeneChip technology. A total of 3442 genes, called the set M(AD), were found to be either up- or down-regulated by at least 2-fold between the two phenotypes. Genes assigned to a particular gene ontology term were found, in many cases, to be significantly unevenly distributed between the genes in and outside M(AD). Terms that were overrepresented in M(AD) included functions directly implicated in the cancer cell metabolism. Based on their functional roles and expression profiles, genes in M(AD) were grouped into likely co-regulated gene sets. Highly conserved sequences in the 5 kb region upstream of the genes in these sets were identified with the motif discovery tool, MoDEL. Potential oncogenic transcription factors and their corresponding binding sites were identified in these conserved regions using the TRANSFAC 8.3 database. Several of the transcription factors identified in this study have been shown elsewhere to be involved in oncogenic processes. This study searched beyond phenotypic gene expression profiles in cancer cells, in order to identify the more important regulatory transcription factors that caused these aberrations in gene expression.

    Nearby transposable elements impact plant stress gene regulatory networks : a meta-analysis in A. thaliana and S. lycopersicum

    BACKGROUND: Transposable elements (TEs) make up a large portion of many plant genomes and play innovative roles in genome evolution. Several TEs can contribute to gene regulation by influencing expression of nearby genes as stress-responsive regulatory motifs. To delineate TE-mediated plant stress regulatory networks, we took a two-step computational approach consisting of identifying TEs in the proximity of stress-responsive genes, followed by searching for cis-regulatory motifs in these TE sequences and linking them to known regulatory factors. Through a systematic meta-analysis of RNA-seq expression profiles and genome annotations, we investigated the relation between the presence of TE superfamilies upstream, downstream or within introns of nearby genes and the differential expression of these genes in various stress conditions in the TE-poor Arabidopsis thaliana and the TE-rich Solanum lycopersicum. RESULTS: We found that genes expressed under stress conditions frequently had members of various TE superfamilies in their genomic proximity, such as SINE upon proteotoxic stress and Copia and Gypsy upon heat stress in A. thaliana, and EPRV and hAT upon infection, and Harbinger, LINE and Retrotransposon upon light stress in S. lycopersicum. These stress-specific gene-proximal TEs were mostly located within introns and were detected more often near upregulated than downregulated genes. Similar stress conditions were often related to the same TE superfamily. Additionally, we detected both novel and known motifs in the sequences of those TEs, pointing to regulatory co-option of these TEs upon stress. Next, we constructed the regulatory network of TFs that act through binding these TEs to their target genes upon stress and discovered TE-mediated regulons targeted by TFs such as BRB/BPC, HD, HSF, GATA, NAC, DREB/CBF and MYB factors in Arabidopsis and AP2/ERF/B3, NAC, NF-Y, MYB, CXC and HD factors in tomato. CONCLUSION: Overall, we map TE-mediated plant stress regulatory networks using numerous stress expression profile studies for two contrasting plant species to study the regulatory role TEs play in the response to stress. As TE-mediated gene regulation allows plants to adapt more rapidly to new environmental conditions, this study contributes to the future development of climate-resilient plants. http://www.biomedcentral.com/bmcgenomics Biochemistry; Genetics; Microbiology and Plant Pathology

    Complex organizational structure of the genome revealed by genome-wide analysis of single and alternative promoters in Drosophila melanogaster

    BACKGROUND: The promoter is a critical, necessary transcriptional cis-regulatory element. In addition to its role as an assembly site for the basal transcriptional apparatus, the promoter plays a key part in mediating temporal and spatial aspects of gene expression through differential binding of transcription factors and selective interaction with distal enhancers. Although many genes have multiple promoters, little attention has been focused on how these relate to one another; nor has much study been directed at relationships between promoters of adjacent genes. RESULTS: We have undertaken a systematic investigation of Drosophila promoters. We divided promoters into three groups: unique promoters, first alternative promoters (the most 5' of a gene's multiple promoters), and downstream alternative promoters (the remaining alternative promoters 3' to the first). We observed distinct nucleotide distribution and sequence motif preferences among these three classes. We also investigated the promoters of neighboring genes and found that a greater than expected number of adjacent genes have similar sequence motif profiles, which may allow the genes to be regulated in a coordinated fashion. Consistent with this, there is a positive correlation between similar promoter motifs and related gene expression profiles for these genes. CONCLUSIONS: Our results suggest that different regulatory mechanisms may apply to each of the three promoter classes, and provide a mechanism for "gene expression neighborhoods," local clusters of co-expressed genes. As a whole, our data reveal an unexpected complexity of genomic organization at the promoter level with respect to both alternative and neighboring promoters.

    Molecular determinants of caste differentiation in the highly eusocial honeybee Apis mellifera

    BACKGROUND: In honeybees, differential feeding of female larvae promotes the occurrence of two different phenotypes, a queen and a worker, from identical genotypes, through incremental alterations, which affect general growth, and character state alterations that result in the presence or absence of specific structures. Although previous studies revealed a link between incremental alterations and differential expression of physiometabolic genes, the molecular changes accompanying character state alterations remain unknown. RESULTS: By using cDNA microarray analyses of >6,000 Apis mellifera ESTs, we found 240 differentially expressed genes (DEGs) between developing queens and workers. Many genes recorded as up-regulated in prospective workers appear to be unique to A. mellifera, suggesting that the workers' developmental pathway involves the participation of novel genes. Workers up-regulate more developmental genes than queens, whereas queens up-regulate a greater proportion of physiometabolic genes, including genes coding for metabolic enzymes and genes whose products are known to regulate the rate of mass-transforming processes and the general growth of the organism (e.g., tor). Many DEGs are likely to be involved in processes favoring the development of caste-biased structures, like brain, legs and ovaries, as well as genes that code for cytoskeleton constituents. Treatment of developing worker larvae with juvenile hormone (JH) revealed 52 JH responsive genes, specifically during the critical period of caste development. Using Gibbs sampling and Expectation Maximization algorithms, we discovered eight overrepresented cis-elements from four gene groups. Graph theory and complex networks concepts were adopted to attain powerful graphical representations of the interrelation between cis-elements and genes and objectively quantify the degree of relationship between these entities. CONCLUSION: We suggest that clusters of functionally related DEGs are co-regulated during caste development in honeybees. This network of interactions is activated by nutrition-driven stimuli in early larval stages. Our data are consistent with the hypothesis that JH is a key component of the developmental determination of queen-like characters. Finally, we propose a conceptual model of caste differentiation in A. mellifera based on gene-regulatory networks.

    Psoriasis drug development and GWAS interpretation through in silico analysis of transcription factor binding sites

    BACKGROUND: Psoriasis is a cytokine‐mediated skin disease that can be treated effectively with immunosuppressive biologic agents. These medications, however, are not equally effective in all patients and are poorly suited for treating mild psoriasis. To develop more targeted therapies, interfering with transcription factor (TF) activity is a promising strategy. METHODS: Meta‐analysis was used to identify differentially expressed genes (DEGs) in the lesional skin from psoriasis patients (n = 237). We compiled a dictionary of 2935 binding sites representing empirically‐determined binding affinities of TFs and unconventional DNA‐binding proteins (uDBPs). This dictionary was screened to identify “psoriasis response elements” (PREs) overrepresented in sequences upstream of psoriasis DEGs. RESULTS: PREs are recognized by IRF1, ISGF3, NF‐kappaB and multiple TFs with helix‐turn‐helix (homeo) or other all‐alpha‐helical (high‐mobility group) DNA‐binding domains. We identified a limited set of DEGs that encode proteins interacting with PRE motifs, including TFs (GATA3, EHF, FOXM1, SOX5) and uDBPs (AVEN, RBM8A, GPAM, WISP2). PREs were prominent within enhancer regions near cytokine‐encoding DEGs (IL17A, IL19 and IL1B), suggesting that PREs might be incorporated into complex decoy oligonucleotides (cdODNs). To illustrate this idea, we designed a cdODN to concomitantly target psoriasis‐activated TFs (i.e., FOXM1, ISGF3, IRF1 and NF‐kappaB). Finally, we screened psoriasis‐associated SNPs to identify risk alleles that disrupt or engender PRE motifs. This identified possible sites of allele‐specific TF/uDBP binding and showed that PREs are disproportionately disrupted by psoriasis risk alleles. CONCLUSIONS: We identified new TF/uDBP candidates and developed an approach that (i) connects transcriptome informatics to cdODN drug development and (ii) enhances our ability to interpret GWAS findings. Disruption of PRE motifs by psoriasis risk alleles may contribute to disease susceptibility. Peer Reviewed.
    https://deepblue.lib.umich.edu/bitstream/2027.42/155494/1/ctm2s4016901500545-sup-0001.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/155494/2/ctm2s4016901500545-sup-0018.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/155494/3/ctm2s4016901500545-sup-0002.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/155494/4/ctm2s4016901500545.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/155494/5/ctm2s4016901500545-sup-0009.pd
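Screening SNPs for alleles that disrupt or engender a motif can be pictured as comparing motif scores for the reference and alternative sequence. The sketch below assumes a simple additive position weight matrix (one dict of per-base scores per motif position) and a sequence at least as long as the motif; the abstract does not specify the scoring model, so the representation and both function names are hypothetical.

```python
def best_window_score(seq, pos, pwm):
    """Highest PWM score among motif-length windows that overlap index pos."""
    width = len(pwm)
    first = max(0, pos - width + 1)
    last = min(pos, len(seq) - width)
    best = float("-inf")
    for start in range(first, last + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


def allele_effect(seq, pos, alt, pwm):
    """Score change (alternative minus reference) from placing alt at pos.
    Negative values suggest disruption, positive values suggest creation."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_window_score(alt_seq, pos, pwm) - best_window_score(seq, pos, pwm)
```

A real screen would also need a score threshold for calling a site present and a background model for significance, neither of which is attempted here.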

    Computational and transcriptional evidence for microRNAs in the honey bee genome

    A total of 68 non-redundant candidate honey bee miRNAs were identified computationally; several of them appear to have previously unrecognized orthologs in the Drosophila genome. Several miRNAs showed caste- or age-related differences in transcript abundance and are likely to be involved in regulating honey bee development.

    Using large-scale data to identify relationships between genes (Suuremahuliste andmete kasutamine geenidevaheliste seoste leidmiseks)

    The electronic version of the dissertation does not include the publications. Genes determine which RNA and protein molecules a living organism consists of. Identifying genes alone is not enough to understand how an organism functions, when and how the different gene products are expressed, and what they do. To understand the nature of a living organism and to influence biological processes, it is necessary to understand the relationships between genes and proteins. High-throughput technologies make it easy to measure different facets of biological processes. This in turn has brought an ever-accelerating growth in data volumes and a need for new methods that help analyse raw data, combine datasets, and visualise results. There is also a growing need to test with computational methods whether existing data models describe the biological object of study accurately enough. This thesis presents various bioinformatics methods showing how combining large-scale experimental data of different types can be applied to find relationships between genes. Integrating large-scale datasets and making them mutually comparable can add value to them. In the course of the work, information published in the supplementary files of publications on embryonic stem cell regulation was gathered and made publicly accessible in the form of the ESCDb database. Using these data, researchers can find relationships between genes that cannot be identified by analysing separate datasets. Combining the information collected in the database with computational modelling led, within this work, to the discovery of a new regulator in embryonic stem cells, IL11. In addition, combining different data types made it possible to find alternative target-gene modules of OCT4, the central regulator of embryonic stem cells. Using DNA conservation information together with regulatory motif analysis, three new regulator proteins of adipose stem cell differentiation were found. The thesis also covers VisHiC, an automatic clustering and visualisation methodology that helps highlight interesting gene groups for further study with other methods. The thesis demonstrates various ways of integrating large-scale datasets that make it possible to find relationships between genes that could not be found by analysing one dataset at a time.

    To understand the basic principles of how organisms function, and to be able to influence biological processes, we need to understand the relationships between genes and proteins. Modern high-throughput technology makes it possible to study different sides of biological processes rapidly. This, however, has led to steady growth in the amount of data available. A need has emerged for more sophisticated methods for analysing raw data, combining different data sources, and visualising the results. Additionally, computational modelling is required to test whether our understanding of biological processes is supported by the available data. A variety of bioinformatics methods are used to demonstrate how to combine different types of high-throughput data to identify relationships between genes. Furthermore, combining various data types from different sources was shown to add value to already published data. In the thesis, data from publications about embryonic stem cell regulation were collected and made available through the Embryonic Stem Cell Database (ESCDb). Complementary data in the database allow researchers to find relationships between genes that could not be found by analysing only one dataset at a time. One of the main findings of this study illustrates how computational modelling on data from the ESCDb led to the discovery of a novel pluripotency regulator, IL11. Additionally, integration of different data types led to the identification of alternative gene regulatory modules of the core pluripotency regulator OCT4. Similarly, a combination of conservation data and regulatory motif analysis led to the identification of three new regulators of adipocyte differentiation. This thesis also covers an innovative methodology, VisHiC, for automatic identification and visualisation of functionally related gene sets. This methodology makes it possible to find relevant gene sets for further characterisation in large high-throughput datasets. This doctoral thesis demonstrates that integrating different high-throughput datasets makes it possible to establish gene-gene relationships that could not be found by looking at a single data type in isolation.