27 research outputs found

    ORegAnno: an open-access community-driven resource for regulatory annotation

    ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the ‘publication queue’ allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or ‘check out’ papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org

    PlantProm: a database of plant promoter sequences

    PlantProm DB, a plant promoter database, is an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s) (TSS) from various plant species. The first release (2002.01) of PlantProm DB contains 305 entries, including 71, 220 and 14 promoters from monocot, dicot and other plants, respectively. It provides the DNA sequence of the promoter regions (-200 : +51) with the TSS at the fixed position +201, a taxonomic/promoter type classification of promoters, and Nucleotide Frequency Matrices (NFM) for promoter elements: TATA-box, CCAAT-box and TSS-motif (Inr). Analysis of TSS-motifs revealed that their composition differs between dicots and monocots, as well as between TATA and TATA-less promoters. The database serves as a learning set in developing plant promoter prediction programs. One such program (TSSP), based on discriminant analysis, has been created by Softberry Inc., and the application of a support vector machine approach for promoter identification is under development.
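
    To illustrate what a Nucleotide Frequency Matrix records, here is a minimal sketch that computes per-position base frequencies from aligned motif instances. The function name and the example sequences are hypothetical and are not taken from PlantProm; the abstract does not say how its matrices were built, so this shows only the general idea.

```python
from collections import Counter


def nucleotide_frequency_matrix(sites):
    """Return per-position relative frequencies of A, C, G, T for aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        # Count the bases observed at this alignment column.
        counts = Counter(s[pos].upper() for s in sites)
        matrix.append({base: counts[base] / len(sites) for base in "ACGT"})
    return matrix


# Hypothetical TATA-box-like instances, for illustration only.
example_sites = ["TATAAA", "TATATA", "TATAAG", "CATAAA"]
nfm = nucleotide_frequency_matrix(example_sites)
```

    Each element of `nfm` is one alignment column, so a conserved position shows a single base with a frequency close to 1.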

    SwissRegulon: a database of genome-wide annotations of regulatory sites

    SwissRegulon is a database containing genome-wide annotations of regulatory sites in the intergenic regions of genomes. The regulatory site annotations are produced using a number of recently developed algorithms that operate on multiple alignments of orthologous intergenic regions from related genomes in combination with, whenever available, known sites from the literature, and ChIP-on-chip binding data. Currently SwissRegulon contains annotations for yeast and 17 prokaryotic genomes. The database provides information about the sequence, location, orientation, posterior probability and, whenever available, binding factor of each annotated site. To enable easy viewing of the regulatory site annotations in the context of other features annotated on the genomes, the sites are displayed using the GBrowse genome browser interface and can be queried based on any annotated genomic feature. The database can also be queried for regulons, i.e. sites bound by a common factor.

    The combination approach of SVM and ECOC for powerful identification and classification of transcription factor

    BACKGROUND: Transcription factors (TFs) are core functional proteins that play important roles in the control of gene expression, and they are key factors in the construction of gene regulation networks. Traditionally, they were identified and classified through experimental approaches. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Though these methods have facilitated the screening of TFs to some extent, low accuracy is still a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs among new proteins and classifying the resulting TFs are in high demand. RESULTS: The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to let users access our tools (see Availability and requirements section for URL). CONCLUSION: The SVM method was a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work implies that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
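
    The decoding step of error-correcting output coding can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the code matrix nor the classifier settings, so the 7-bit codewords and class names below are hypothetical, and plain bit tuples stand in for the outputs of the seven binary SVMs.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical code matrix: one 7-bit codeword per class, one bit per
# binary classifier. Every pair of codewords differs in 4 positions,
# so a single wrong classifier output can still be corrected.
CODEBOOK = {
    "class_1": (1, 1, 1, 1, 1, 1, 1),
    "class_2": (0, 0, 0, 0, 1, 1, 1),
    "class_3": (0, 0, 1, 1, 0, 0, 1),
    "class_4": (0, 1, 0, 1, 0, 1, 0),
}


def decode(bits):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


# Stand-in for seven binary classifier outputs, with the first bit wrong
# relative to the class_2 codeword.
noisy_prediction = (1, 0, 0, 0, 1, 1, 1)
predicted_class = decode(noisy_prediction)
```

    Because the nearest codeword wins, the multi-class decision tolerates errors from individual binary classifiers, which is the property that motivates pairing ECOC with binary learners such as SVMs.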

    Transcriptomic response to differentiation induction

    BACKGROUND: Microarrays used for gene expression studies yield large amounts of data. The processing of such data typically leads to lists of differentially regulated genes. A common terminal data analysis step is to map pathways of potentially interrelated genes. METHODS: We applied a transcriptomics analysis tool to elucidate the underlying pathways of leukocyte maturation at the genomic level in an established cellular model of leukemia by examining time-course data in two subclones of U-937 cells. Leukemias such as Acute Promyelocytic Leukemia (APL) are characterized by a block in the hematopoietic stem cell maturation program at a point where expanding clones, which should be destined to mature into terminally differentiated effector cells, become locked into endless proliferation, with few cells reaching maturation. Treatment with retinoic acid, depending on the precise genomic abnormality, often releases the responsible promyelocytes from this blockade, but clinically it can yield adverse sequelae in the form of potentially lethal side effects, referred to as retinoic acid syndrome. RESULTS: Briefly, the list of genes for temporal patterns of expression was pasted into the ABCC GRID Promoter TFSite Comparison Page website tool, and the outputs for each pattern were examined for possible coordinated regulation by shared regelems (regulatory elements). We found it informative to use this novel web tool for identifying, on a genomic scale, genes regulated by drug treatment. CONCLUSION: Improvement is needed in understanding the nature of the mutations responsible for controlling the maturation process and how these genes regulate downstream effects if there is to be better targeting of chemical interventions. Expanded implementation of the techniques and results reported here may better direct future efforts to improve treatment for diseases not restricted to APL.