Reconstruction of novel transcription factor regulons through inference of their binding sites
Background
In most sequenced organisms the number of known regulatory genes (e.g., transcription factors (TFs)) vastly exceeds the number of experimentally-verified regulons that could be associated with them. At present, identification of TF regulons is mostly done through comparative genomics approaches. Such methods could miss organism-specific regulatory interactions and often require expensive and time-consuming experimental techniques to generate the underlying data.
Results
In this work, we present an efficient algorithm that aims to identify a given transcription factor's regulon through inference of its unknown binding sites, based on the discovery of its binding motif. The proposed approach relies on computational methods that utilize gene expression data sets and knockout fitness data sets which are available or may be straightforwardly obtained for many organisms. We computationally constructed the profiles of putative regulons for the TFs LexA, PurR and Fur in E. coli K12 and identified their binding motifs. Comparisons with an experimentally-verified database showed high recovery rates of the known regulon members, and indicated good predictions for the newly found genes with high biological significance. The proposed approach is also applicable to novel organisms for predicting unknown regulons of the transcriptional regulators. Results for the hypothetical protein Dde0289 in D. alaskensis include the discovery of a Fis-type TF binding motif.
Conclusions
The proposed motif-based regulon inference approach can discover the organism-specific regulatory interactions on a single genome, which may be missed by current comparative genomics techniques due to their limitations.
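As a rough illustration of how a discovered binding motif can be used to infer unknown binding sites, the sketch below builds a log-odds position weight matrix from aligned sites and scans a sequence for its best-scoring window. It is not the authors' algorithm: the function names, the pseudocount, the uniform background, and the example sites and promoter are all hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Return (offset, score) for every window of the sequence."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, pwm):
    """Return the (offset, score) pair with the highest log-odds score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


# Hypothetical aligned sites and promoter region, invented for illustration.
example_sites = ["CTGTAT", "CTGTAA", "CTGTAT", "CTGGAT"]
example_pwm = build_pwm(example_sites)
promoter = "AAGCTGTATGCCA"
offset, score = best_hit(promoter, example_pwm)
print(offset, round(score, 2))
```

In a regulon-inference setting, genes whose upstream regions contain windows scoring above some chosen threshold would be candidate regulon members; the threshold itself would have to be calibrated on real data.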
Inference of gene regulatory networks from genome-wide knockout fitness data
Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural but also the dynamical and non-linear nature of the biomolecular systems involved. A state-space model with a non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR-LiuR network in the bacterium Shewanella oneidensis.
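The building block of an unscented Kalman filter is the unscented transform, which pushes a small set of deterministically chosen sigma points through a non-linear function instead of linearizing it. The scalar sketch below shows only that step, not the filter or the model from the article. The sigmoid non-linearity and the value of `kappa` are assumptions made for illustration.

```python
import math


def unscented_transform(mean, var, func, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through func using sigma points."""
    n = 1  # state dimension of this scalar sketch
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    images = [func(p) for p in points]
    new_mean = sum(w * y for w, y in zip(weights, images))
    new_var = sum(w * (y - new_mean) ** 2 for w, y in zip(weights, images))
    return new_mean, new_var


def sigmoid(x):
    """Hypothetical saturating non-linearity standing in for a regulatory response."""
    return 1.0 / (1.0 + math.exp(-x))


# Propagate a standard normal state estimate through the assumed non-linearity.
m, v = unscented_transform(0.0, 1.0, sigmoid)
print(m, v)
```

For a linear function the transform reproduces the exact mean and variance, which gives a simple sanity check on the weights.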
Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites
Finding conserved motifs in genomic sequences is one of the essential problems in bioinformatics. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI, a sequential Monte Carlo motif-identification algorithm, which is based on a position weight matrix model that does not require additional constraints and is able to estimate such motif properties as length, logo, number of instances and their locations solely on the basis of primary nucleotide sequence data. Furthermore, should biologically meaningful information about motif attributes be available, BAMBI takes advantage of this knowledge to further refine the discovery results. In practical applications, we show that the proposed approach can be used to find sites of such diverse DNA-binding molecules as the cAMP receptor protein (CRP) and Din-family site-specific serine recombinases. Results obtained by BAMBI in these and other settings demonstrate better statistical performance than any of the four widely-used profile-based motif discovery methods (MEME, BioProspector with BioOptimizer, SeSiMCMC and Motif Sampler), as measured by the nucleotide-level correlation coefficient. Additionally, in the case of Din-family recombinase target site discovery, the BAMBI-inferred motif is found to be the only one functionally accurate from the underlying biochemical mechanism standpoint. C++ and Matlab code is available at http://www.ee.columbia.edu/~guido/BAMBI or http://genomics.lbl.gov/BAMBI/
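The nucleotide-level correlation coefficient mentioned above can be sketched as follows, assuming the commonly used definition in which each nucleotide position is counted as a true or false positive or negative and the counts are combined as in a Matthews correlation. The abstract does not spell out the formula, so this definition, the zero-denominator convention, and the example positions are assumptions.

```python
import math


def nucleotide_cc(known, predicted, sequence_length):
    """Nucleotide-level correlation coefficient between two sets of site positions."""
    known, predicted = set(known), set(predicted)
    tp = len(known & predicted)   # positions in both the known and predicted sites
    fp = len(predicted - known)   # predicted but not annotated
    fn = len(known - predicted)   # annotated but missed
    tn = sequence_length - tp - fp - fn
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention assumed here when the coefficient is undefined
    return (tp * tn - fn * fp) / denom


def site_positions(start, width):
    """Set of nucleotide positions covered by a site starting at `start`."""
    return set(range(start, start + width))


known = site_positions(10, 8)      # hypothetical annotated site
predicted = site_positions(12, 8)  # hypothetical prediction shifted by 2 nt
ncc = nucleotide_cc(known, predicted, sequence_length=100)
print(round(ncc, 3))
```

A perfect prediction scores 1, and a prediction that overlaps the true site only partially is penalized for both the missed and the spurious positions.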
Temperature Control of Fimbriation Circuit Switch in Uropathogenic Escherichia coli: Quantitative Analysis via Automated Model Abstraction
Uropathogenic Escherichia coli (UPEC) represent the predominant cause of urinary tract infections (UTIs). A key UPEC molecular virulence mechanism is type 1 fimbriae, whose expression is controlled by the orientation of an invertible chromosomal DNA element, the fim switch. Temperature has been shown to act as a major regulator of fim switching behavior and is overall an important indicator as well as functional feature of many urologic diseases, including UPEC host-pathogen interaction dynamics. Given this panoptic physiological role of temperature during UTI progression and notable empirical challenges to its direct in vivo studies, in silico modeling of corresponding biochemical and biophysical mechanisms essential to UPEC pathogenicity may significantly aid our understanding of the underlying disease processes. However, rigorous computational analysis of biological systems, such as the fim switch temperature control circuit, has hitherto presented a notoriously demanding problem due to both the substantial complexity of the gene regulatory networks involved and their often characteristically discrete and stochastic dynamics. To address these issues, we have developed an approach that enables automated multiscale abstraction of biological system descriptions based on reaction kinetics. Implemented as a computational tool, this method has allowed us to efficiently analyze the modular organization and behavior of the E. coli fimbriation switch circuit at different temperature settings, thus facilitating new insights into this mode of UPEC molecular virulence regulation. In particular, our results suggest that, with respect to its role in shutting down fimbriae expression, the primary function of FimB recombinase may be to effect a controlled down-regulation (rather than increase) of the ON-to-OFF fim switching rate via temperature-dependent suppression of competing dynamics mediated by recombinase FimE.
Our computational analysis further implies that this down-regulation mechanism could be particularly significant inside the host environment, thus potentially contributing further understanding toward the development of novel therapeutic approaches to UPEC-caused UTIs.
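To make the idea of an ON-to-OFF switching rate concrete, the sketch below treats the fim switch as a two-state system and computes the steady-state fraction of cells in the ON orientation. This is a deliberately crude caricature, not the abstracted model from the article: the additive treatment of the two recombinase contributions and every rate value are hypothetical.

```python
def on_to_off_rate(fimb_rate, fime_rate):
    """Total ON-to-OFF rate, assuming the two contributions simply add."""
    return fimb_rate + fime_rate


def fraction_on(k_off_to_on, k_on_to_off):
    """Steady-state fraction of cells in the ON orientation for a two-state switch."""
    return k_off_to_on / (k_off_to_on + k_on_to_off)


# Hypothetical per-cell, per-generation rates, chosen only to show the trend.
k_off_to_on = 0.001
baseline = fraction_on(k_off_to_on, on_to_off_rate(fimb_rate=0.001, fime_rate=0.3))
fime_suppressed = fraction_on(k_off_to_on, on_to_off_rate(fimb_rate=0.001, fime_rate=0.03))
print(baseline, fime_suppressed)
```

Under these assumptions, suppressing the FimE-mediated contribution lowers the total ON-to-OFF rate and raises the fraction of fimbriated cells, which is the qualitative direction of the effect described above.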
Nano-motion Dynamics are Determined by Surface-Tethered Selectin Mechanokinetics and Bond Formation
The interaction of proteins at cellular interfaces is critical for many biological processes, from intercellular signaling to cell adhesion. For example, the selectin family of adhesion receptors plays a critical role in trafficking during inflammation and immunosurveillance. Quantitative measurements of binding rates between surface-constrained proteins elicit insight into how molecular structural details and post-translational modifications contribute to function. However, nano-scale transport effects can obfuscate measurements in experimental assays. We constructed a biophysical simulation of the motion of a rigid microsphere coated with biomolecular adhesion receptors in shearing flow undergoing thermal motion. The simulation enabled in silico investigation of the effects of kinetic force dependence, molecular deformation, grouping adhesion receptors into clusters, surface-constrained bond formation, and nano-scale vertical transport on outputs that directly map to observable motions. Simulations recreated the jerky, discrete stop-and-go motions observed in P-selectin/PSGL-1 microbead assays with physiologic ligand densities. Motion statistics tied detailed simulated motion data to experimentally reported quantities. New deductions about biomolecular function for P-selectin/PSGL-1 interactions were made. Distributing adhesive forces among P-selectin/PSGL-1 molecules closely grouped in clusters was necessary to achieve bond lifetimes observed in microbead assays. Initial, capturing bond formation effectively occurred across the entire molecular contour length. However, subsequent rebinding events were enhanced by the reduced separation distance following the initial capture. The result demonstrates that vertical transport can contribute to an enhancement in the apparent bond formation rate. A detailed analysis of in silico motions prompted the proposition of wobble autocorrelation as an indicator of two-dimensional function. 
Insight into two-dimensional bond formation gained from flow cell assays might therefore be important to understand processes involving extended cellular interactions, such as immunological synapse formation. A biologically informative in silico system was created with minimal, high-confidence inputs. Incorporating random effects in surface separation through thermal motion enabled new deductions of the effects of surface-constrained biomolecular function. Important molecular information is embedded in the patterns and statistics of motion.
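The effect of sharing load among clustered bonds can be illustrated with a force-dependent off-rate. The sketch below uses the Bell model, a common choice that the abstract itself does not name, so it should be read as a stand-in for whatever kinetic force dependence the simulation used. The unstressed off-rate, the reactive compliance, and the applied force are hypothetical placeholders, not measured values.

```python
import math

KT_PN_NM = 4.11  # thermal energy k_B*T near room temperature, in pN*nm


def bell_off_rate(force_pn, k_off0, reactive_compliance_nm):
    """Bell-model dissociation rate (1/s) for a bond under a tensile force in pN."""
    return k_off0 * math.exp(force_pn * reactive_compliance_nm / KT_PN_NM)


def per_bond_lifetime(total_force_pn, n_bonds, k_off0, reactive_compliance_nm):
    """Mean lifetime (s) of one bond when the load is shared equally by n_bonds."""
    shared_force = total_force_pn / n_bonds
    return 1.0 / bell_off_rate(shared_force, k_off0, reactive_compliance_nm)


# Hypothetical parameters: 1 per second unstressed off-rate, 0.05 nm compliance,
# and a 200 pN total load on the cluster.
for n in (1, 2, 4):
    print(n, per_bond_lifetime(200.0, n, k_off0=1.0, reactive_compliance_nm=0.05))
```

Because the off-rate grows exponentially with force, splitting the same load over more bonds lengthens each bond's expected lifetime, which is consistent with the clustering requirement reported above.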
Inference of gene regulatory networks from genome-wide knockout fitness data.
Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural but also the dynamical and non-linear nature of the biomolecular systems involved. A state-space model with a non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR-LiuR network in the bacterium Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/~lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf
Stochastic bimodalities in deterministically monostable reversible chemical networks due to network topology reduction
Recently, stochastic simulations of networks of chemical reactions have shown distributions of steady states that are inconsistent with the steady state solutions of the corresponding deterministic ordinary differential equations. One such class of systems comprises networks that have irreversible reactions, and the origin of the anomalous behavior in these cases is understood to be due to the existence of absorbing states. More puzzling is the report of such anomalies in reaction networks without irreversible reactions. One such biologically important example is the futile cycle. Here we show that, in these systems, nonclassical behavior can originate from a stochastic elimination of all the molecules of a key species. This leads to a reduction in the topology of the network and the sampling of steady states corresponding to a truncated network. Surprisingly, we find that, in spite of the purely discrete character of the topology reduction mechanism revealed by "exact" numerical solutions of the master equations, this phenomenon is reproduced by the corresponding Fokker-Planck equations.
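The stochastic elimination of a species can be seen even in a toy system. The sketch below runs a Gillespie direct-method simulation of a reversible isomerization A <-> B with very few molecules and measures how much of the time species A is completely absent. It is not the futile cycle analyzed in the article; the reaction scheme, rate constants, molecule counts, and seed are hypothetical.

```python
import random


def gillespie_isomerization(n_a, n_b, k_forward, k_backward, t_end, seed=0):
    """Simulate A <-> B with the Gillespie direct method; return (t, n_a, n_b) tuples."""
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, n_a, n_b)]
    while True:
        rate_forward = k_forward * n_a    # propensity of A -> B
        rate_backward = k_backward * n_b  # propensity of B -> A
        total = rate_forward + rate_backward
        if total == 0.0:
            break
        t += rng.expovariate(total)       # exponentially distributed waiting time
        if t > t_end:
            break
        if rng.random() * total < rate_forward:
            n_a, n_b = n_a - 1, n_b + 1
        else:
            n_a, n_b = n_a + 1, n_b - 1
        trajectory.append((t, n_a, n_b))
    return trajectory


def time_fraction_a_extinct(trajectory, t_end):
    """Fraction of the simulated time during which no A molecules are present."""
    extinct = 0.0
    ends = trajectory[1:] + [(t_end, 0, 0)]  # each state lasts until the next event
    for (t0, n_a, _), (t1, _, _) in zip(trajectory, ends):
        if n_a == 0:
            extinct += t1 - t0
    return extinct / t_end


traj = gillespie_isomerization(3, 0, k_forward=1.0, k_backward=1.0, t_end=50.0, seed=1)
print(time_fraction_a_extinct(traj, 50.0))
```

The deterministic rate equations for this toy system predict a constant, non-zero amount of A at steady state, whereas the discrete simulation can spend part of its time with no A at all. In a larger network, such an interval is one in which every reaction consuming the missing species is switched off.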