
    MEME-ChIP: motif analysis of large DNA datasets

    Motivation: Advances in high-throughput sequencing have resulted in rapid growth in large, high-quality datasets including those arising from transcription factor (TF) ChIP-seq experiments. While there are many existing tools for discovering TF binding site motifs in such datasets, most web-based tools cannot directly process such large datasets.

    JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles

    JASPAR (http://jaspar.genereg.net) is the leading open-access database of matrix profiles describing the DNA-binding patterns of transcription factors (TFs) and other proteins interacting with DNA in a sequence-specific manner. Its fourth major release is the largest expansion of the core database to date: the database now holds 457 non-redundant, curated profiles. The new entries include the first batch of profiles derived from ChIP-seq and ChIP-chip whole-genome binding experiments, and 177 yeast TF binding profiles. The introduction of a yeast division brings the convenience of JASPAR to an active research community. As binding models are refined by newer data, the JASPAR database now uses versioning of matrices: in this release, 12% of the older models were updated to improved versions. Classification of TF families has been improved by adopting a new DNA-binding domain nomenclature. A curated catalog of mammalian TFs is provided, extending the use of the JASPAR profiles to additional TFs belonging to the same structural family. These changes prepare the system for more rapid acquisition of new high-throughput data sources. Additionally, three new special collections provide matrix profile data produced by recent alternative high-throughput approaches.
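Matrix profiles of this kind are commonly stored as position frequency (count) matrices and converted to log-odds weights before scanning sequences. The sketch below shows one common conversion, assuming a uniform background and a pseudocount of 0.8 split across bases in proportion to the background; both choices, the function name and the toy counts are illustrative assumptions, not JASPAR's own procedure or data.

```python
import math

BASES = "ACGT"

def counts_to_pwm(counts, background=None, pseudocount=0.8):
    """Convert a count matrix (base -> list of per-position counts) to log2-odds weights.

    Hypothetical helper: pseudocount and uniform background are assumed defaults.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(counts["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        column_total = sum(counts[b][i] for b in BASES)
        for b in BASES:
            # Smoothed probability of base b at position i.
            prob = (counts[b][i] + pseudocount * background[b]) / (column_total + pseudocount)
            # Log-odds score relative to the background frequency.
            pwm[b].append(math.log2(prob / background[b]))
    return pwm

# Toy 4-column count matrix (made up, not a real JASPAR profile).
example_counts = {
    "A": [10, 0, 0, 2],
    "C": [0, 0, 10, 2],
    "G": [0, 10, 0, 3],
    "T": [0, 0, 0, 3],
}
pwm = counts_to_pwm(example_counts)
for b in BASES:
    print(b, [round(w, 2) for w in pwm[b]])
```

The pseudocount keeps unobserved bases from getting a weight of negative infinity, so a single mismatch lowers a site's score instead of excluding it outright.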

    The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences

    The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein–DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data ‘boutiques’ within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk

    The BRENDA Tissue Ontology (BTO): the first all-integrating ontology of all organisms for enzyme sources

    BTO, the BRENDA Tissue Ontology (http://www.BTO.brenda-enzymes.org) represents a comprehensive structured encyclopedia of tissue terms. The project started in 2003 to create a connection between the enzyme data collection of the BRENDA enzyme database and a structured network of source tissues and cell types. Currently, BTO contains more than 4600 different anatomical structures, tissues, cell types and cell lines, classified under generic categories corresponding to the rules and formats of the Gene Ontology Consortium and organized as a directed acyclic graph (DAG). Most of the terms are endowed with comments on their derivation or definitions. The content of the ontology is constantly curated with ∼1000 new terms each year. Four different types of relationships between the terms are implemented. A versatile web interface with several search and navigation functionalities allows convenient online access to the BTO and to the enzymes isolated from the tissues. Important areas of application of the BTO terms are the detection of enzymes in tissues and the provision of a solid basis for text-mining approaches in this field. It is widely used by lab scientists, curators of genomic and biochemical databases and bioinformaticians. The BTO is freely available at http://www.obofoundry.org
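Because an ontology organized as a DAG lets a term have several parents, a common operation is collecting every ancestor of a term, for example to count an enzyme found in "hepatocyte" under broader tissue terms as well. The sketch below does a breadth-first traversal over a term-to-parents map. The toy terms are made up, not actual BTO identifiers, and the four BTO relationship types are collapsed into a single "parent" edge for simplicity.

```python
from collections import deque

def ancestors(term, parents):
    """Collect all ancestors of `term` in a DAG given a term -> list-of-parents map."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # shared ancestors are visited only once
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen

# Hypothetical toy fragment (not real BTO terms or relations).
toy_parents = {
    "hepatocyte": ["liver cell", "epithelial cell"],
    "liver cell": ["cell"],
    "epithelial cell": ["cell"],
    "cell": [],
}
print(sorted(ancestors("hepatocyte", toy_parents)))
# → ['cell', 'epithelial cell', 'liver cell']
```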

    UniPROBE: an online database of protein binding microarray data on protein–DNA interactions

    The UniPROBE (Universal PBM Resource for Oligonucleotide Binding Evaluation) database hosts data generated by universal protein binding microarray (PBM) technology on the in vitro DNA-binding specificities of proteins. This initial release of the UniPROBE database provides a centralized resource for accessing comprehensive PBM data on the preferences of proteins for all possible sequence variants (‘words’) of length k (‘k-mers’), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In total, the database hosts DNA-binding data for over 175 nonredundant proteins from a diverse collection of organisms, including the prokaryote Vibrio harveyi, the eukaryotic malarial parasite Plasmodium falciparum, the parasitic Apicomplexan Cryptosporidium parvum, the yeast Saccharomyces cerevisiae, the worm Caenorhabditis elegans, mouse and human. Current web tools include a text-based search, a function for assessing motif similarity between user-entered data and database PWMs, and a function for locating putative binding sites along user-entered nucleotide sequences. The UniPROBE database is available at http://thebrain.bwh.harvard.edu/uniprobe/
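Locating putative binding sites with a PWM usually means sliding the matrix along the sequence, summing the per-position weights for each window on both strands, and keeping windows that score above a threshold. The sketch below illustrates that idea with a made-up 3-column log-odds matrix and an arbitrary threshold; it is not UniPROBE's own site-finding tool, and the function names are hypothetical.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def score_window(pwm, window):
    """Sum the PWM weight of each base at its position in the window."""
    return sum(pwm[base][i] for i, base in enumerate(window))

def scan(sequence, pwm, threshold):
    """Return (start, strand, score) for windows scoring at or above threshold."""
    width = len(pwm["A"])
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        forward = score_window(pwm, window)
        # Reverse strand: score the reverse complement of the same window.
        reverse = score_window(pwm, window.translate(COMPLEMENT)[::-1])
        if forward >= reverse:
            best, strand = forward, "+"
        else:
            best, strand = reverse, "-"
        if best >= threshold:
            hits.append((start, strand, round(best, 2)))
    return hits

# Toy log-odds matrix favouring "ACG" (hypothetical values).
toy_pwm = {"A": [2, -1, -1], "C": [-1, 2, -1], "G": [-1, -1, 2], "T": [-1, -1, -1]}
hits = scan("TTACGTT", toy_pwm, threshold=5)
print(hits)
# → [(2, '+', 6), (3, '-', 6)]
```

Here "ACGT" is its own reverse complement, so the same site is reported once on each strand, one position apart.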

    ConTra: a promoter alignment analysis tool for identification of transcription factor binding sites across species

    Transcription factors (TFs) are key components in signaling pathways, and the presence of their binding sites in the promoter regions of DNA is essential for their regulation of the expression of the corresponding genes. Orthologous promoter sequences are commonly used to increase the specificity with which potentially functional transcription factor binding sites (TFBSs) are recognized and to detect possibly important similarities or differences between the different species. The ConTra (conserved TFBSs) web server provides the biologist at the bench with a user-friendly tool to interactively visualize TFBSs predicted using either TransFac (1) or JASPAR (2) position weight matrix libraries, on a promoter alignment of choice. The visualization can be preceded by a simple scoring analysis to explore which TFs are the most likely to bind to the promoter of interest. The ConTra web server is available at http://bioit.dmbr.ugent.be/ConTra/index.php
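Displaying predicted sites on a promoter alignment requires mapping each hit from its position in the ungapped sequence to a column in the gapped alignment row. The sketch below does that mapping and then reports sites starting in the same column in every species. Treating "same start column" as conserved is a simplifying assumption made here for illustration, not ConTra's documented criterion; the alignment and hit positions are made up.

```python
def ungapped_to_column(aligned_seq):
    """Map each ungapped residue index to its column in the gapped alignment row."""
    return [col for col, char in enumerate(aligned_seq) if char != "-"]

def conserved_sites(alignment, hits_by_species):
    """Return alignment columns where a predicted site starts in every aligned sequence.

    alignment: dict species -> gapped sequence (all rows the same length)
    hits_by_species: dict species -> iterable of ungapped start positions
    """
    column_sets = []
    for species, row in alignment.items():
        mapping = ungapped_to_column(row)
        column_sets.append({mapping[pos] for pos in hits_by_species.get(species, [])})
    return sorted(set.intersection(*column_sets)) if column_sets else []

# Toy two-species alignment (hypothetical sequences).
alignment = {"human": "ACG-TACGT", "mouse": "ACGGTAC-T"}
# Ungapped start positions of "ACG" in each row: human 0 and 4, mouse 0.
print(conserved_sites(alignment, {"human": [0, 4], "mouse": [0]}))
# → [0]
```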

    Non-coding-regulatory regions of human brain genes delineated by bacterial artificial chromosome knock-in mice

    Background: The next big challenge in human genetics is understanding the 98% of the genome that comprises non-coding DNA. Hidden in this DNA are sequences critical for gene regulation, and new experimental strategies are needed to understand the functional role of gene-regulation sequences in health and disease. In this study, we build upon our HuGX (‘high-throughput human genes on the X chromosome’) strategy to expand our understanding of human gene regulation in vivo. Results: In all, ten human genes known to express in therapeutically important brain regions were chosen for study. For eight of these genes, human bacterial artificial chromosome clones were identified, retrofitted with a reporter, knocked single-copy into the Hprt locus in mouse embryonic stem cells, and mouse strains derived. Five of these human genes expressed in mouse, and all expressed in the adult brain region for which they were chosen. This defined the boundaries of the genomic DNA sufficient for brain expression, and refined our knowledge regarding the complexity of gene regulation. We also characterized for the first time the expression of human MAOA and NR2F2, two genes for which the mouse homologs have been extensively studied in the central nervous system (CNS), and AMOTL1 and NOV, for which roles in CNS have been unclear. Conclusions: We have demonstrated the use of the HuGX strategy to functionally delineate non-coding-regulatory regions of therapeutically important human brain genes. Our results also show that a careful investigation, using publicly available resources and bioinformatics, can lead to accurate predictions of gene expression.

    ConTra v2: a tool to identify transcription factor binding sites across species, update 2011

    Transcription factors are important gene regulators with distinctive roles in development, cell signaling and cell cycling, and they have been associated with many diseases. The ConTra v2 web server allows easy visualization and exploration of predicted transcription factor binding sites in any genomic region surrounding coding or non-coding genes. In this new version, users can choose from nine reference organisms ranging from human to yeast. ConTra v2 can analyze promoter regions, 5′-UTRs, 3′-UTRs and introns or any other genomic region of interest. Hundreds of position weight matrices are available to choose from, but the user can also upload any other matrices for detecting specific binding sites. A typical analysis is run in four simple steps of choosing the gene, the transcript, the region of interest and then selecting one or more transcription factor binding sites. The ConTra v2 web server is freely available at http://bioit.dmbr.ugent.be/contrav2/index.php

    ORegAnno: an open-access community-driven resource for regulatory annotation

    ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the ‘publication queue’ allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or ‘check out’ papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org

    Identification of molecular compartments and genetic circuitry in the developing mammalian kidney

    Lengthy developmental programs generate cell diversity within an organotypic framework, enabling the later physiological actions of each organ system. Cell identity, cell diversity and cell function are determined by cell type-specific transcriptional programs; consequently, transcriptional regulatory factors are useful markers of emerging cellular complexity, and their expression patterns provide insights into the regulatory mechanisms at play. We performed a comprehensive genome-scale in situ expression screen of 921 transcriptional regulators in the developing mammalian urogenital system. Focusing on the kidney, analysis of region-specific expression patterns identified novel markers and cell types associated with development and patterning of the urinary system. Furthermore, promoter analysis of synexpressed genes predicts transcriptional control mechanisms that regulate cell differentiation. The annotated informational resource (www.gudmap.org) will facilitate functional analysis of the mammalian kidney and provides useful information for the generation of novel genetic tools to manipulate emerging cell populations.