1,322 research outputs found

    The Use of Functional Genomics in Synthetic Promoter Design

    A Computational Pipeline for High-Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes

    Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and can detect RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which significantly improves the quality of the RNA motifs. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair-level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Among the ncRNA candidates not in Rfam, we find compelling evidence that some are functional, and we analyze several potential ribosomal protein leaders in depth.
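Base-pair-level accuracy can be made concrete with a small comparison routine. The sketch below assumes both structures are given in dot-bracket notation over the same columns and treats "specificity" as the fraction of predicted pairs that are correct (positive predictive value), a common convention in RNA secondary-structure benchmarks. The function names are hypothetical; this is not the pipeline's own evaluation code.

```python
def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Parse a dot-bracket string into a set of 0-based (i, j) pairs with i < j."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def pair_accuracy(predicted: str, reference: str) -> tuple[float, float]:
    """Return (sensitivity, PPV) of predicted base pairs against reference pairs."""
    if len(predicted) != len(reference):
        raise ValueError("structures must cover the same columns")
    pred, ref = base_pairs(predicted), base_pairs(reference)
    tp = len(pred & ref)  # pairs present in both structures
    sensitivity = tp / len(ref) if ref else 1.0
    ppv = tp / len(pred) if pred else 1.0
    return sensitivity, ppv


# Toy example: the prediction misses the innermost pair of a 4-bp stem.
print(pair_accuracy("(((......)))", "((((....))))"))
# → (0.75, 1.0)
```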

    Genome Biol.

    With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' uses sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.

    BiologicalNetworks 2.0 - an integrative view of genome biology data

    BACKGROUND: A significant problem in studying the mechanisms of an organism's development is elucidating the interrelated factors that act on its different levels of organization, such as genes, biological molecules, cells, and cell systems. The numerous heterogeneous data sources that exist for these subsystems are still not integrated well enough for researchers to analyze them together in the same frame of study. Systematic application of data integration methods is further hampered by factors such as the orthogonal nature of the integrated data and naming problems. RESULTS: Here we report on a new version of BiologicalNetworks, a research environment for the integrated visualization and analysis of heterogeneous biological data. BiologicalNetworks can be queried for properties of thousands of different types of biological entities (genes/proteins, promoters, COGs, pathways, binding sites, and others) and their relations (interactions, co-expression, co-citations, and others). The system includes the build-pathways infrastructure for molecular interactions/relations and module discovery in high-throughput experiments. BiologicalNetworks also implements the Integrated Genome Viewer and Comparative Genomics Browser applications, which allow users to search and analyze gene regulatory regions and their conservation across multiple species in conjunction with molecular pathways/networks, experimental data, and functional annotations. CONCLUSIONS: The new release of BiologicalNetworks, together with its back-end database, introduces extensive functionality for more efficient integrated multi-level analysis of microarray, sequence, regulatory, and other data. BiologicalNetworks is freely available at http://www.biologicalnetworks.org

    Creating, Modeling, and Visualizing Metabolic Networks

    Metabolic networks combine metabolism and regulation. These complex networks are difficult to understand and create because of the diverse types of information that need to be represented. This chapter describes a suite of interlinked tools for developing, displaying, and modeling metabolic networks. The metabolic network interactions database, MetNetDB, contains information on regulatory and metabolic interactions derived from a combination of web databases and input from biologists in their areas of expertise. PathBinderA mines the biological “literaturome” by searching for new interactions, or for supporting evidence for existing interactions, in metabolic networks. Sentences from abstracts are ranked by the likelihood that they describe an interaction, and each sentence's evidence is combined with that provided by other sentences. FCModeler, a publicly available software package, enables biologists to visualize and model metabolic and regulatory network maps. FCModeler aids in developing and evaluating hypotheses, and provides a modeling framework for assessing the large amounts of data captured by high-throughput gene expression experiments.
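How PathBinderA scores and combines sentences is not described here. As a hypothetical illustration of ranking sentences and pooling their evidence, the sketch below assumes each sentence already carries a probability that it describes a given interaction, and combines sentences for the same interaction with a noisy-OR rule. Both the input scores and the noisy-OR choice are assumptions, not PathBinderA's method.

```python
from collections import defaultdict


def combine_evidence(scored_sentences):
    """Rank interactions by noisy-OR combination of per-sentence probabilities.

    scored_sentences: iterable of (interaction, probability) pairs, where the
    probability is an assumed, externally computed chance that the sentence
    describes that interaction.
    """
    p_none = defaultdict(lambda: 1.0)  # P(no sentence describes the interaction)
    for interaction, p in scored_sentences:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        p_none[interaction] *= 1.0 - p
    combined = {k: 1.0 - v for k, v in p_none.items()}
    # Highest combined probability first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


sentences = [
    ("geneA-activates-geneB", 0.5),  # hypothetical sentence scores
    ("geneA-activates-geneB", 0.5),
    ("geneC-inhibits-geneD", 0.6),
]
print(combine_evidence(sentences))
```

Under noisy-OR, two moderately confident sentences can outrank one stronger sentence, which matches the idea of supporting evidence accumulating across sentences.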

    Computational identification of transcriptional regulatory elements in DNA sequence

    Identification and annotation of all the functional elements in the genome, including genes and regulatory sequences, is a fundamental challenge in genomics and computational biology. Because regulatory elements are frequently short and variable, identifying and discovering them with computational algorithms is difficult. However, significant advances have been made in computational methods for modeling and detecting DNA regulatory elements. The availability of complete genome sequences from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, has contributed to the development of methods that use these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in identifying cis-regulatory modules and higher-order structures of regulatory sequences, which is essential to understanding transcription regulation in metazoan genomes. This article reviews computational approaches for modeling and identifying genomic regulatory elements, with an emphasis on recent developments and current challenges.
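A standard way to model short, variable binding sites is a position weight matrix (PWM). The sketch below is a generic illustration rather than any specific method from the review: it builds log2-odds scores from aligned example sites, assuming a pseudocount of 0.5 and a uniform background, then scans a sequence for windows scoring above a chosen threshold. The toy sites and threshold are hypothetical.

```python
import math

ALPHABET = "ACGT"


def pwm_from_sites(sites, pseudocount=0.5, background=None):
    """Build a log2-odds position weight matrix from aligned, equal-length sites."""
    background = background or {b: 0.25 for b in ALPHABET}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in ALPHABET
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window whose log-odds score >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in ALPHABET for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


sites = ["TATAAT", "TATAAT", "TATTAT", "TACAAT"]  # toy aligned sites
pwm = pwm_from_sites(sites)
print(scan("GGGTATAATGGG", pwm, threshold=3.0))
```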

    Quantitative proteogenomics of human pathogens using DIA-MS.

    The increasing number of bacterial genomes, combined with reproducible quantitative proteome measurements, provides new opportunities to explore how genetic differences modulate proteome composition and virulence. Combining genome and proteome data is challenging because the underlying genome influences the proteome. We present a strategy to facilitate the integration of genome data from several genetically similar bacterial strains with data-independent analysis mass spectrometry (DIA-MS) for rapid interrogation of the combined data sets. The strategy relies on constructing a composite genome that combines all genetic data in a compact format, which can accommodate fusion with quantitative peptide and protein information determined via DIA-MS. We demonstrate the method by combining data sets from whole genome sequencing, shotgun MS and DIA-MS of 34 clinical isolates of Streptococcus pyogenes. The data structure allows fast exploration of the data, showing that undetected proteins are on average more amenable to amino acid substitution than expressed proteins. We identified several proteins that are significantly differentially expressed between invasive and non-invasive strains. The work underlines how integrating whole genome sequencing with accurately quantified proteomes can further advance the interpretation of the relationship between genomes, proteomes and virulence.
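The abstract does not say which statistical test was used to call differential expression. With that caveat, the sketch below assumes replicate protein intensities per strain group, log2-transforms them, and ranks proteins by Welch's t statistic; the data layout, protein names and function names are hypothetical, and a real analysis would add p-values and multiple-testing correction.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two replicates")
    va, vb = variance(group_a) / na, variance(group_b) / nb
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def rank_proteins(abundances):
    """abundances: {protein: (invasive_intensities, noninvasive_intensities)}.

    Returns (protein, log2_fold_change, t, df) tuples sorted by |t|, largest first.
    """
    rows = []
    for protein, (invasive, noninvasive) in abundances.items():
        log_inv = [math.log2(x) for x in invasive]  # intensities must be > 0
        log_non = [math.log2(x) for x in noninvasive]
        t, df = welch_t(log_inv, log_non)
        rows.append((protein, mean(log_inv) - mean(log_non), t, df))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


example = {  # hypothetical intensities from three isolates per group
    "protein_A": ([2, 4, 8], [16, 32, 64]),
    "protein_B": ([1, 2, 4], [1, 2, 4.1]),
}
for row in rank_proteins(example):
    print(row)
```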

    Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo

    BACKGROUND: Regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein-coding sequence are used with remarkable success in genome annotation, computational methods for analyzing noncoding regions and delineating transcriptional control elements are still in their infancy. RESULTS: Here we present novel algorithms that detect cis-regulatory modules through genome-wide scans for clusters of transcription factor binding sites, using three levels of prior information. When binding sites for the factors are known, our statistical segmentation algorithm, Ahab, yields about 150 putative gap gene-regulated modules, with no adjustable parameters other than a window size. If one or more related modules are known but their binding sites are not, repeated motifs can be found by a customized Gibbs sampler and input to Ahab to predict genes with similar regulation. Finally, using only the genome, we developed a third algorithm, Argos, that counts and scores clusters of overrepresented motifs in a window of sequence. Argos recovers many of the known modules upstream of the segmentation genes with no training data. CONCLUSIONS: We have demonstrated, in the case of body patterning in the Drosophila embryo, that our algorithms allow genome-wide identification of regulatory modules. We believe that Ahab overcomes many problems of recent approaches, and we estimated its false positive rate to be about 50%. Argos is the first successful attempt to predict regulatory modules using only the genome without training data. Complete results and module predictions across the Drosophila genome are available at http://uqbar.rockefeller.edu/~siggia/
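Both Ahab and Argos look for clusters of sites within a window of sequence. The sketch below is neither algorithm; it is a simpler, generic sliding-window cluster finder, assuming binding-site start positions have already been called (for example by PWM scanning) and using a hypothetical window size and minimum site count.

```python
def find_clusters(site_positions, window, min_sites):
    """Merge runs where at least `min_sites` site starts fall within `window` bp.

    Returns (first_site_start, last_site_start) tuples; pad by the motif width
    to obtain module boundaries.
    """
    positions = sorted(site_positions)
    clusters = []
    left = 0
    for right, pos in enumerate(positions):
        # Slide the left edge so positions[left:right + 1] spans < window bp.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            start = positions[left]
            if clusters and start <= clusters[-1][1]:
                clusters[-1] = (clusters[-1][0], pos)  # extend the previous cluster
            else:
                clusters.append((start, pos))
    return clusters


sites = [10, 50, 90, 500, 2000, 2030, 2060, 2100]  # toy site starts
print(find_clusters(sites, window=200, min_sites=3))
# → [(10, 90), (2000, 2100)]
```

A fixed count threshold ignores site strength and background motif density, which is the kind of problem a statistical model such as Ahab's is meant to address.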

    Assessing Computational Methods of Cis-Regulatory Module Prediction

    Computational methods that attempt to identify instances of cis-regulatory modules (CRMs) in the genome face a challenging problem: they must search for potentially interacting transcription factor binding sites while knowledge of the specific interactions involved remains limited. Without a comprehensive comparison of their performance, the reliability and accuracy of these tools remain unclear. Faced with the large number of tools that address this problem, we summarized and categorized them by search strategy and input data requirements. Twelve representative methods were chosen and applied to predict CRMs from the Drosophila CRM database REDfly and across the human ENCODE regions. Our results show that the optimal choice of method depends on the species and on the composition of the sequences in question. When discriminating CRMs from non-coding regions, methods that consider evolutionary conservation have stronger predictive power than methods designed to run on a single genome. Different CRM representations and search strategies rely on different CRM properties, so different methods can complement one another; for example, some favour homotypic clusters of binding sites, while others perform best on short CRMs. Furthermore, most methods appear sensitive to the composition and structure of the genome to which they are applied. We analyze the principal features that distinguish the methods that performed well, identify weaknesses leading to poor performance, and provide a guide for users. We also propose key considerations for the development and evaluation of future CRM-prediction methods.
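Benchmarks like this one need an overlap measure between predicted and annotated CRMs. The sketch below assumes half-open genomic intervals on a single sequence and scores them at nucleotide level with sensitivity, positive predictive value and the performance coefficient TP/(TP+FN+FP). The assessment's actual metrics may differ, and the set-based expansion used here is only practical for short regions.

```python
def covered(intervals):
    """Expand half-open [start, end) intervals into a set of covered positions."""
    return {p for start, end in intervals for p in range(start, end)}


def nucleotide_accuracy(predicted, known):
    """Nucleotide-level sensitivity, PPV and performance coefficient."""
    pred, ref = covered(predicted), covered(known)
    tp = len(pred & ref)  # bases in both predicted and known CRMs
    fn = len(ref - pred)  # known CRM bases that were missed
    fp = len(pred - ref)  # predicted bases outside any known CRM
    sensitivity = tp / (tp + fn) if ref else 0.0
    ppv = tp / (tp + fp) if pred else 0.0
    union = tp + fn + fp
    performance = tp / union if union else 0.0
    return sensitivity, ppv, performance


known_crms = [(100, 200)]      # toy annotated CRM
predicted_crms = [(150, 250)]  # toy prediction, half overlapping
print(nucleotide_accuracy(predicted_crms, known_crms))
```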

    A new data mining approach for the detection of bacterial promoters combining stochastic and combinatorial methods

    We present a new data mining method based on stochastic analysis (a Hidden Markov Model, HMM) and combinatorial methods for discovering new transcriptional factors in bacterial genome sequences. Sigma factor binding sites (SFBSs) were described as box1-spacer-box2 patterns corresponding to the -35 and -10 DNA motifs of bacterial promoters. We used a high-order Hidden Markov Model in which the hidden process is a second-order Markov chain. Applied to the genome of the model bacterium Streptomyces coelicolor (2), the a posteriori state probabilities revealed local maxima, or peaks, whose distribution was enriched in intergenic sequences ("iPeaks", for intergenic peaks). Short DNA sequences underlying the iPeaks were extracted and clustered by a hierarchical classification algorithm based on Smith-Waterman local similarity. Selected motif consensuses were used as box1 (-35 motif) in a search for a potential neighbouring box2 (-10 motif) using a word enumeration algorithm. Applied to Streptomyces coelicolor, this new SFBS mining methodology successfully retrieved already known SFBSs and suggested new potential transcription factor binding sites (TFBSs). The well-defined SigR regulon (oxidative stress response) was also used as a test quorum to compare first- and second-order HMMs. Our approach also allowed the preliminary detection of known SFBSs in Bacillus subtilis.
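The box1-spacer-box2 search can be sketched as a mismatch-tolerant scan. The code below is not the authors' HMM or word enumeration pipeline; it assumes a known box1 consensus and looks downstream for a box2 consensus within a spacer range. The E. coli sigma-70 consensus TTGACA/TATAAT, the 16-18 bp spacer and the one-mismatch tolerance in the example are illustrative assumptions; S. coelicolor sigma factors recognize other motifs.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, box1, box2, spacer=(16, 18), max_mismatch=1):
    """Return (box1_start, spacer_len, box2_start) for each box1-spacer-box2 hit."""
    hits = []
    for i in range(len(seq) - len(box1) + 1):
        if hamming(seq[i:i + len(box1)], box1) > max_mismatch:
            continue
        for gap in range(spacer[0], spacer[1] + 1):
            j = i + len(box1) + gap  # candidate box2 start
            if j + len(box2) > len(seq):
                break  # larger spacers would run past the sequence end
            if hamming(seq[j:j + len(box2)], box2) <= max_mismatch:
                hits.append((i, gap, j))
    return hits


# Toy sequence with an exact -35 box, a 17 bp spacer and an exact -10 box.
seq = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GC"
print(find_promoters(seq, "TTGACA", "TATAAT"))
# → [(2, 17, 25)]
```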