168 research outputs found

    BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data

    Based on mixture models, we present a Bayesian method (called BClass) to classify biological entities (e.g. genes) when variables of quite heterogeneous nature are analyzed. Various statistical distributions are used to model the continuous/categorical data commonly produced by genetic experiments and large-scale genomic projects. We calculate the posterior probability of each entry to belong to each element (group) in the mixture. In this way, an original set of heterogeneous variables is transformed into a set of purely homogeneous characteristics represented by the probabilities of each entry to belong to the groups. The number of groups in the analysis is controlled dynamically by rendering the groups as 'alive' and 'dormant' depending upon the number of entities classified within them. Using standard Metropolis-Hastings and Gibbs sampling algorithms, we constructed a sampler to approximate posterior moments and grouping probabilities. Since this method does not require the definition of similarity measures, it is especially suitable for data mining and knowledge discovery in biological databases. We applied BClass to classify genes in RegulonDB, a database specialized in information about the transcriptional regulation of gene expression in the bacterium Escherichia coli. The classification obtained is consistent with current knowledge and allowed prediction of missing values for a number of genes. BClass is object-oriented and fully programmed in Lisp-Stat. The output grouping probabilities are analyzed and interpreted using graphical (dynamically linked plots) and query-based approaches. We discuss the advantages of using Lisp-Stat as a programming language as well as the problems we faced when the data volume increased exponentially due to the ever-growing number of genomic projects.
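    The central quantity in the abstract above, the posterior probability that an entry belongs to each group of the mixture, can be illustrated with a small sketch. This is not the BClass implementation (which is written in Lisp-Stat and estimates its parameters by Metropolis-Hastings and Gibbs sampling). The sketch assumes fixed, hypothetical group parameters, one Gaussian variable and one categorical variable per entry, and independence of the two variables within a group. All names and numbers are illustrative.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(entry, groups):
    """Posterior probability that one entry belongs to each mixture group.

    entry  -- (continuous_value, category_label)
    groups -- list of dicts with keys: weight, mean, sd, cat_probs
    Assumes the two variables are independent within a group.
    """
    x, label = entry
    # Unnormalized posterior: prior weight times likelihood of both variables.
    joint = [
        g["weight"] * gaussian_pdf(x, g["mean"], g["sd"]) * g["cat_probs"][label]
        for g in groups
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical parameters for two groups (not taken from the paper).
GROUPS = [
    {"weight": 0.5, "mean": 0.0, "sd": 1.0,
     "cat_probs": {"activator": 0.8, "repressor": 0.2}},
    {"weight": 0.5, "mean": 3.0, "sd": 1.0,
     "cat_probs": {"activator": 0.3, "repressor": 0.7}},
]

# A heterogeneous entry (one continuous value, one category) becomes a
# homogeneous vector of group-membership probabilities.
probs = membership_probabilities((0.4, "activator"), GROUPS)
print(probs)
```

    Whatever the mix of variable types in the input, each entry ends up described by the same kind of quantity: a probability vector over the groups. That is the transformation the abstract describes.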

    Programming gene expression with combinatorial promoters

    Promoters control the expression of genes in response to one or more transcription factors (TFs). The architecture of a promoter is the arrangement and type of binding sites within it. To understand natural genetic circuits and to design promoters for synthetic biology, it is essential to understand the relationship between promoter function and architecture. We constructed a combinatorial library of random promoter architectures. We characterized 288 promoters in Escherichia coli, each containing up to three inputs from four different TFs. The library design allowed for multiple −10 and −35 boxes, and we observed varied promoter strength over five decades. To further analyze the functional repertoire, we defined a representation of promoter function in terms of regulatory range, logic type, and symmetry. Using these results, we identified heuristic rules for programming gene expression with combinatorial promoters.

    Nonlinear software sensor for monitoring genetic regulation processes with noise and modeling errors

    Nonlinear control techniques by means of a software sensor that are commonly used in chemical engineering could also be applied to genetic regulation processes. We provide here a realistic formulation of this procedure by introducing an additive white Gaussian noise, which is usually found in experimental data. Besides, we include model errors, meaning that we assume we do not know the nonlinear regulation function of the process. In order to illustrate this procedure, we employ the Goodwin dynamics of the concentrations [B.C. Goodwin, Temporal Oscillations in Cells, (Academic Press, New York, 1963)] in the simple form recently applied to single gene systems and some operon cases [H. De Jong, J. Comp. Biol. 9, 67 (2002)], which involves the dynamics of the mRNA, given protein, and metabolite concentrations. Further, we present results for a three gene case in co-regulated sets of transcription units as they occur in prokaryotes. However, instead of considering their full dynamics, we use only the data of the metabolites and a designed software sensor. We also show, more generally, that it is possible to rebuild the complete set of nonmeasured concentrations despite the uncertainties in the regulation function or, even more, in the case of not knowing the mRNA dynamics. In addition, the rebuilding of concentrations is not affected by the perturbation due to the additive white Gaussian noise, and we also managed to filter the noisy output of the biological system. Comment: 21 pages, 7 figures; also selected in vjbio of August 2005; this version corrects a misorder in the last three references of the published version.

    Graph grammars with string-regulated rewriting

    Multicellular organisms undergo a complex developmental process, orchestrated by the genetic information in their cells, in order to form a newborn individual from a fertilized egg. This complex process, not completely understood yet, is believed to have a key role in generating the impressive biotic diversity of organisms found on earth. Inspired by mechanisms of Eukaryotic genetic expression, we propose and analyse graph grammars with string-regulated rewriting. In these grammatical systems a genome sequence is represented by a regulatory string, a graph corresponds to an organism, and a set of graph grammar rules represents different forms of implementing cell division. Accordingly, a graph derivation by the graph grammar resembles the developmental process of an organism. We give examples of the concept and compare its generative power to the power of the traditional context-free graph grammars. We demonstrate that the power of expression increases when genetic regulation is included in the model, as compared to non-regulated grammars. Additionally, we propose a hierarchy of string-regulated graph grammars, arranged by expressive power. These results highlight the key role that the transmission of regulatory information during development has in the emergence of biological diversity. D.L. was supported in part by a research stay fellowship at Otto-von-Guericke-Universität Magdeburg from the Spanish Ministerio de Educación

    COLOMBOS v3.0 : leveraging gene expression compendia for cross-species analyses

    COLOMBOS is a database that integrates publicly available transcriptomics data for several prokaryotic model organisms. Compared to the previous version it has more than doubled in size, both in terms of species and data available. The manually curated condition annotation has been overhauled as well, giving more complete information about samples' experimental conditions and their differences. Functionality-wise, cross-species analyses now enable users to analyse expression data for all species simultaneously, and identify candidate genes with evolutionarily conserved expression behaviour. All the expression-based query tools have undergone a substantial improvement, overcoming the limit of enforced co-expression data retrieval and instead enabling the return of more complex patterns of expression behaviour. COLOMBOS is freely available through a web application at http://colombos.net/. The complete database is also accessible via REST API or downloadable as tab-delimited text files.

    EcoCyc: A comprehensive view of Escherichia coli biology

    EcoCyc (http://EcoCyc.org) provides a comprehensive encyclopedia of Escherichia coli biology. EcoCyc integrates information about the genome, genes and gene products; the metabolic network; and the regulatory network of E. coli. Recent EcoCyc developments include a new initiative to represent and curate all types of E. coli regulatory processes such as attenuation and regulation by small RNAs. EcoCyc has started to curate Gene Ontology (GO) terms for E. coli and has made a dataset of E. coli GO terms available through the GO Web site. The curation and visualization of electron transfer processes have been significantly improved. Other software and Web site enhancements include the addition of tracks to the EcoCyc genome browser, in particular a type of track designed for the display of ChIP-chip datasets, and the development of a comparative genome browser. A new Genome Omics Viewer enables users to paint omics datasets onto the full E. coli genome for analysis. A new advanced query page guides users in interactively constructing complex database queries against EcoCyc. A Macintosh version of EcoCyc is now available. A series of Webinars is available to instruct users in the use of EcoCyc.

    EcoCyc: a comprehensive database of Escherichia coli biology

    EcoCyc (http://EcoCyc.org) is a comprehensive model organism database for Escherichia coli K-12 MG1655. From the scientific literature, EcoCyc captures the functions of individual E. coli gene products; their regulation at the transcriptional, post-transcriptional and protein level; and their organization into operons, complexes and pathways. EcoCyc users can search and browse the information in multiple ways. Recent improvements to the EcoCyc Web interface include combined gene/protein pages and a Regulation Summary Diagram displaying a graphical overview of all known regulatory inputs to gene expression and protein activity. The graphical representation of signal transduction pathways has been updated, and the cellular and regulatory overviews were enhanced with new functionality. A specialized undergraduate teaching resource using EcoCyc is being developed.

    Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

    Background: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion: Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages.

    Considering Intra-individual Genetic Heterogeneity to Understand Biodiversity

    In this chapter, I am concerned with the concept of Intra-individual Genetic Heterogeneity (IGH) and its potential influence on biodiversity estimates. Definitions of biological individuality are often indirectly dependent on genetic sampling, and vice versa. Genetic sampling typically focuses on a particular locus or set of loci, found in the mitochondrial, chloroplast or nuclear genome. If ecological function or evolutionary individuality can be defined on the level of multiple divergent genomes, as I shall argue is the case in IGH, our current genetic sampling strategies and analytic approaches may miss out on relevant biodiversity. Now that more and more examples of IGH are available, it is becoming possible to investigate the positive and negative effects of IGH on the functioning and evolution of multicellular individuals more systematically. I consider some examples and argue that studying diversity through the lens of IGH facilitates thinking not in terms of units, but in terms of interactions between biological entities. This, in turn, enables a fresh take on the ecological and evolutionary significance of biological diversity.