239 research outputs found

    BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data

    Based on mixture models, we present a Bayesian method (called BClass) to classify biological entities (e.g. genes) when variables of quite heterogeneous nature are analyzed. Various statistical distributions are used to model the continuous/categorical data commonly produced by genetic experiments and large-scale genomic projects. We calculate the posterior probability of each entry to belong to each element (group) in the mixture. In this way, an original set of heterogeneous variables is transformed into a set of purely homogeneous characteristics represented by the probabilities of each entry to belong to the groups. The number of groups in the analysis is controlled dynamically by rendering the groups as 'alive' and 'dormant' depending upon the number of entities classified within them. Using standard Metropolis-Hastings and Gibbs sampling algorithms, we constructed a sampler to approximate posterior moments and grouping probabilities. Since this method does not require the definition of similarity measures, it is especially suitable for data mining and knowledge discovery in biological databases. We applied BClass to classify genes in RegulonDB, a database specialized in information about the transcriptional regulation of gene expression in the bacterium Escherichia coli. The classification obtained is consistent with current knowledge and allowed prediction of missing values for a number of genes. BClass is object-oriented and fully programmed in Lisp-Stat. The output grouping probabilities are analyzed and interpreted using graphical (dynamically linked plots) and query-based approaches. We discuss the advantages of using Lisp-Stat as a programming language as well as the problems we faced when the data volume increased exponentially due to the ever-growing number of genomic projects.
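    The transformation described above, from raw variables to per-group membership probabilities, can be illustrated with a minimal sketch. It assumes a single continuous variable, Gaussian components, and fixed, known parameters; BClass itself handles mixed continuous/categorical data and estimates these quantities with a Metropolis-Hastings/Gibbs sampler, which is not reproduced here. The function names are hypothetical.

    ```python
    import math

    def normal_pdf(x, mean, sd):
        """Density of a normal distribution with the given mean and standard deviation."""
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

    def membership_probabilities(x, weights, means, sds):
        """Posterior probability that observation x belongs to each mixture group.

        Applies Bayes' rule: P(group k | x) is proportional to weight_k * density_k(x).
        """
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(joint)
        return [j / total for j in joint]

    # An observation near the second group's mean is assigned mostly to that group.
    probs = membership_probabilities(0.9, weights=[0.5, 0.5], means=[-1.0, 1.0], sds=[1.0, 1.0])
    ```

    Each entity is thus represented by a vector of probabilities that sums to one, regardless of the type of the original variables.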

    Nonlinear software sensor for monitoring genetic regulation processes with noise and modeling errors

    Nonlinear control techniques by means of a software sensor that are commonly used in chemical engineering could also be applied to genetic regulation processes. We provide here a realistic formulation of this procedure by introducing an additive white Gaussian noise, which is usually found in experimental data. Besides, we include model errors, meaning that we assume we do not know the nonlinear regulation function of the process. In order to illustrate this procedure, we employ the Goodwin dynamics of the concentrations [B.C. Goodwin, Temporal Oscillations in Cells, (Academic Press, New York, 1963)] in the simple form recently applied to single gene systems and some operon cases [H. De Jong, J. Comp. Biol. 9, 67 (2002)], which involves the dynamics of the mRNA, given protein, and metabolite concentrations. Further, we present results for a three gene case in co-regulated sets of transcription units as they occur in prokaryotes. However, instead of considering their full dynamics, we use only the data of the metabolites and a designed software sensor. We also show, more generally, that it is possible to rebuild the complete set of nonmeasured concentrations despite the uncertainties in the regulation function or, even more, in the case of not knowing the mRNA dynamics. In addition, the rebuilding of concentrations is not affected by the perturbation due to the additive white Gaussian noise, and we also managed to filter the noisy output of the biological system. Comment: 21 pages, 7 figures; also selected in vjbio of August 2005; this version corrects a misorder in the last three references of the published version
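    As a rough illustration of the three-variable dynamics mentioned above (mRNA, protein, metabolite), the following sketch integrates one common textbook form of the Goodwin model with a forward Euler step. The equations, the Hill-type repression term, and all parameter values are assumptions chosen for illustration and are not taken from the paper; the software sensor and the noise model are not implemented.

    ```python
    def simulate_goodwin(steps=5000, dt=0.01, k=1.0, n=2, decay=0.5):
        """Forward-Euler integration of an assumed Goodwin-type model.

        x: mRNA, y: protein, z: metabolite. The metabolite represses mRNA
        synthesis through the term k / (1 + z**n); each species decays linearly.
        """
        x = y = z = 0.1  # illustrative initial concentrations
        history = []
        for _ in range(steps):
            dx = k / (1.0 + z ** n) - decay * x  # repressed synthesis minus decay
            dy = x - decay * y                   # translation minus decay
            dz = y - decay * z                   # metabolite production minus decay
            x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
            history.append((x, y, z))
        return history

    trajectory = simulate_goodwin()
    metabolite_only = [z for _, _, z in trajectory]  # the only signal an observer would measure
    ```

    In the setting of the abstract, only a series like `metabolite_only` would be available, and the remaining concentrations would have to be reconstructed from it.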

    DETERMINATION OF ANISOTROPY HTI BY SEISMIC INVERSION USING GENETIC ALGORITHM IN THE VALLE MEDIO OF MAGDALENA BASIN

    A genetic algorithm was implemented in MATLAB to perform seismic inversion and determine HTI anisotropy, and was applied to a seismic project in the Middle Magdalena Valley basin in Colombia. The algorithm, which includes the concept of migration of populations (MP), was tested in an isotropic condition (parallel to the fractures) and in an anisotropic condition (perpendicular to the fractures), showing better performance in the testing phase with real data. Weak anisotropy was identified in the study area, associated with fractures oriented at an azimuth of 45°, perpendicular to the current stress regime. This result would explain the weak anisotropy as related to fractures associated with a paleo-stress regime.

    Programming gene expression with combinatorial promoters

    Promoters control the expression of genes in response to one or more transcription factors (TFs). The architecture of a promoter is the arrangement and type of binding sites within it. To understand natural genetic circuits and to design promoters for synthetic biology, it is essential to understand the relationship between promoter function and architecture. We constructed a combinatorial library of random promoter architectures. We characterized 288 promoters in Escherichia coli, each containing up to three inputs from four different TFs. The library design allowed for multiple −10 and −35 boxes, and we observed varied promoter strength over five decades. To further analyze the functional repertoire, we defined a representation of promoter function in terms of regulatory range, logic type, and symmetry. Using these results, we identified heuristic rules for programming gene expression with combinatorial promoters

    Graph grammars with string-regulated rewriting

    Multicellular organisms undergo a complex developmental process, orchestrated by the genetic information in their cells, in order to form a newborn individual from a fertilized egg. This complex process, not completely understood yet, is believed to have a key role in generating the impressive biotic diversity of organisms found on earth. Inspired by mechanisms of eukaryotic genetic expression, we propose and analyse graph grammars with string-regulated rewriting. In these grammatical systems a genome sequence is represented by a regulatory string, a graph corresponds to an organism, and a set of graph grammar rules represents different forms of implementing cell division. Accordingly, a graph derivation by the graph grammar resembles the developmental process of an organism. We give examples of the concept and compare its generative power to the power of the traditional context-free graph grammars. We demonstrate that the power of expression increases when genetic regulation is included in the model, as compared to non-regulated grammars. Additionally, we propose a hierarchy of string-regulated graph grammars, arranged by expressive power. These results highlight the key role that the transmission of regulatory information during development has in the emergence of biological diversity. D.L. was supported in part by a research stay fellowship at Otto-von-Guericke-Universität Magdeburg from the Spanish Ministerio de Educación

    Considering Intra-individual Genetic Heterogeneity to Understand Biodiversity

    In this chapter, I am concerned with the concept of Intra-individual Genetic Heterogeneity (IGH) and its potential influence on biodiversity estimates. Definitions of biological individuality are often indirectly dependent on genetic sampling, and vice versa. Genetic sampling typically focuses on a particular locus or set of loci, found in the mitochondrial, chloroplast or nuclear genome. If ecological function or evolutionary individuality can be defined on the level of multiple divergent genomes, as I shall argue is the case in IGH, our current genetic sampling strategies and analytic approaches may miss out on relevant biodiversity. Now that more and more examples of IGH are available, it is becoming possible to investigate the positive and negative effects of IGH on the functioning and evolution of multicellular individuals more systematically. I consider some examples and argue that studying diversity through the lens of IGH facilitates thinking not in terms of units, but in terms of interactions between biological entities. This, in turn, enables a fresh take on the ecological and evolutionary significance of biological diversity.