Information content based model for the topological properties of the gene regulatory network of Escherichia coli
Gene regulatory networks (GRN) are being studied with increasingly precise
quantitative tools and can provide a testing ground for ideas regarding the
emergence and evolution of complex biological networks. We analyze the global
statistical properties of the transcriptional regulatory network of the
prokaryote Escherichia coli, identifying each operon with a node of the
network. We propose a null model for this network using the content-based
approach applied earlier to the eukaryote Saccharomyces cerevisiae (Balcan et
al., 2007). Random sequences that represent promoter regions and binding
sequences are associated with the nodes. The length distributions of these
sequences are extracted from the relevant databases. The network is constructed
by testing for the occurrence of binding sequences within the promoter regions.
The ensemble of emergent networks yields an exponentially decaying in-degree
distribution and a putative power law dependence for the out-degree
distribution with a flat tail, in agreement with the data. The clustering
coefficient, degree-degree correlation, rich club coefficient and k-core
visualization all agree qualitatively with the empirical network to an extent
not yet achieved by any other computational model, to our knowledge. The
remaining significant statistical differences can point the way to further
research into non-adaptive and adaptive processes in the evolution of the
E. coli GRN.
Comment: 58 pages, 3 tables, 22 figures. In press, Journal of Theoretical
Biology (2009)
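The content-based construction described above (random promoter and binding sequences attached to nodes, with an edge wherever a binding sequence occurs in a promoter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: sequences are uniform i.i.d. over ACGT, only one strand is searched, lengths are passed in explicitly rather than sampled from the database distributions, and the indexing where regulator j is node j is hypothetical.

```python
import random

ALPHABET = "ACGT"

def random_seq(length, rng):
    """Uniform i.i.d. random DNA sequence (a simplifying assumption)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def edges_from_sequences(promoters, binding_sites):
    """Directed edge j -> i whenever regulator j's binding sequence occurs
    in node i's promoter (single-strand substring search only)."""
    return {
        (j, i)
        for j, motif in binding_sites.items()
        for i, promoter in enumerate(promoters)
        if motif in promoter
    }

def build_null_network(promoter_lengths, binding_lengths, seed=0):
    """One random network from the ensemble.

    promoter_lengths: one length per node (operon).
    binding_lengths: one length per regulator; regulator j is node j
    (hypothetical indexing chosen for this sketch).
    """
    rng = random.Random(seed)
    promoters = [random_seq(n, rng) for n in promoter_lengths]
    binding_sites = {j: random_seq(n, rng) for j, n in enumerate(binding_lengths)}
    return edges_from_sequences(promoters, binding_sites)

def degrees(edges, n_nodes):
    """In- and out-degree of every node."""
    indeg, outdeg = [0] * n_nodes, [0] * n_nodes
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg
```

For example, `edges = build_null_network([300] * 500, [8] * 50)` followed by `degrees(edges, 500)` gives one sample whose in- and out-degree histograms can be compared with the empirical network; the lengths here are illustrative placeholders, and averaging over many seeds approximates the ensemble.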
Reconciliation between operational taxonomic units and species boundaries
The development of high-throughput sequencing technologies has revolutionised the field of microbial ecology via 16S rRNA gene amplicon sequencing approaches. Clustering those amplicon sequencing reads into operational taxonomic units (OTUs) using a fixed cut-off is a commonly used approach to estimate microbial diversity. A 97% threshold was chosen so that the resulting OTUs could be interpreted as a proxy for bacterial species. Our results show that the robustness of such a generalised cut-off is questionable when it is applied to short amplicons covering only one or two variable regions of the 16S rRNA gene. It leads to biases in diversity metrics and makes it hard to compare results obtained with amplicons derived from different primer sets. The method introduced in this work takes into account the differential evolutionary rates of taxonomic lineages in order to define a dynamic, taxonomy-dependent OTU clustering cut-off score. For a taxonomic family consisting of species showing high evolutionary conservation in the amplified variable regions, the cut-off will be more stringent than 97%. By taking into consideration the amplified variable regions and the taxonomic family when defining this cut-off, such a threshold will lead to more robust results and closer correspondence between OTUs and species. This approach has been implemented in a publicly available software package called DynamiC.
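To show how a taxonomy-dependent cut-off changes clustering, here is a minimal greedy centroid-clustering sketch. It is not DynamiC's algorithm: the per-(family, region) cut-off table, the family names, and the identity measure (fraction of matching positions in pre-aligned, equal-length reads) are all hypothetical assumptions made for illustration. Each read joins the first OTU whose centroid it matches at or above that centroid family's cut-off, falling back to the conventional 97%.

```python
DEFAULT_CUTOFF = 0.97

# Hypothetical cut-offs for illustration only; the real values would be
# derived from reference data for each family and amplified region.
CUTOFFS = {
    ("FamilyA", "V4"): 0.99,  # highly conserved family: stricter than 97%
    ("FamilyB", "V4"): 0.95,  # fast-evolving family: looser than 97%
}

def identity(a, b):
    """Fraction of identical positions between two pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_otus(reads, region):
    """Greedy centroid clustering with a family-dependent cut-off.

    reads: list of (sequence, family), assumed sorted by decreasing abundance.
    Returns a list of OTUs, each a list of read indices; index 0 is the centroid.
    """
    otus = []
    for idx, (seq, _family) in enumerate(reads):
        for otu in otus:
            centroid_seq, centroid_family = reads[otu[0]]
            cutoff = CUTOFFS.get((centroid_family, region), DEFAULT_CUTOFF)
            if identity(seq, centroid_seq) >= cutoff:
                otu.append(idx)
                break
        else:
            otus.append([idx])  # no centroid close enough: start a new OTU
    return otus
```

With a single fixed 97% threshold, two FamilyA reads at 98% identity would be merged; under the stricter hypothetical 99% cut-off they stay in separate OTUs, which is the kind of family-specific behaviour the abstract describes.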
Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions
Genetic regulatory networks (GRNs) have been widely studied, yet there is a
lack of understanding with regard to the final size and properties of these
networks, mainly because no network is currently complete. In this study, we
analyzed the distribution of GRN structural properties across a large set of
distinct prokaryotic organisms and found a set of constrained characteristics
such as network density and number of regulators. Our results allowed us to
estimate the number of interactions that complete networks would have, a
valuable insight that could aid in the daunting task of network curation,
prediction, and validation. Using state-of-the-art statistical approaches, we
also provided new evidence to settle a previously raised controversy over
whether complete biological networks might be random, in which case the
observed scale-free properties would be an artifact of the sampling process
during network discovery. Furthermore, we
identified a set of properties that enabled us to assess the consistency of the
connectivity distribution for various GRNs against different alternative
statistical distributions. Our results favor the hypothesis that highly
connected nodes (hubs) are not a consequence of network incompleteness.
Finally, an interaction coverage computed for the GRNs as a proxy for
completeness revealed that high-throughput based reconstructions of GRNs could
yield biased networks with a low average clustering coefficient, showing that
classical targeted discovery of interactions is still needed.
Comment: 28 pages, 5 figures, 12 pages supplementary information
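The abstract relies on network density, the average clustering coefficient, and an interaction coverage used as a completeness proxy, without defining them. The sketch below uses common textbook definitions as assumptions: directed density E / (N(N-1)) with self-loops excluded, the average local clustering coefficient of the undirected version of the graph (nodes with fewer than two neighbours count as 0), and coverage as observed interactions divided by an estimated total. The function names are hypothetical, and the paper's exact definitions may differ.

```python
from itertools import combinations

def density(edges, n_nodes):
    """Directed density E / (N(N-1)), ignoring self-loops and duplicates."""
    e = sum(1 for s, t in set(edges) if s != t)
    return e / (n_nodes * (n_nodes - 1))

def average_clustering(edges, n_nodes):
    """Mean local clustering coefficient of the undirected graph."""
    nbrs = [set() for _ in range(n_nodes)]
    for s, t in edges:
        if s != t:
            nbrs[s].add(t)
            nbrs[t].add(s)
    total = 0.0
    for v in range(n_nodes):
        k = len(nbrs[v])
        if k < 2:
            continue  # contributes 0 to the average
        links = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
        total += links / (k * (k - 1) / 2)
    return total / n_nodes

def estimate_interactions(expected_density, n_nodes):
    """Total interactions implied by a constrained density (assumed definition)."""
    return expected_density * n_nodes * (n_nodes - 1)

def interaction_coverage(observed_edges, estimated_total):
    """Fraction of the estimated complete network already observed."""
    return observed_edges / estimated_total
```

A reconstruction with many edges but a low `average_clustering` value relative to curated networks would be the kind of signal the abstract associates with high-throughput reconstructions.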
Clades of huge phages from across Earth's ecosystems.
Bacteriophages typically have small genomes [1] and depend on their bacterial hosts for replication [2]. Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems.
Cost effective, experimentally robust differential-expression analysis for human/mammalian, pathogen and dual-species transcriptomics.
As sequencing read length has increased, researchers have quickly adopted longer reads for their experiments. Here, we examine 14 pathogen or host-pathogen differential gene expression data sets to assess whether using longer reads is warranted. A variety of data sets was used to assess which genomic attributes might affect the outcome of differential gene expression analysis, including gene density, operons, gene length, number of introns/exons and intron length. No genome attribute was found to influence the data in principal components analysis, hierarchical clustering with bootstrap support, or regression analyses of pairwise comparisons, which were undertaken on the same reads, looking at all combinations of paired and unpaired reads trimmed to 36, 54, 72 and 101 bp. Read pairing had the greatest effect when there was little variation in the samples from different conditions or in their replicates (e.g. little differential gene expression). Overall, however, 54 and 72 bp reads were typically most similar. Given differences in costs and mapping percentages, we recommend 54 bp reads for organisms with no or few introns and 72 bp reads for all others. In a third of the data sets, read pairing had absolutely no effect, despite paired reads having twice as much data. Therefore, single-end reads seem robust for differential-expression analyses, but in eukaryotes paired-end reads are likely needed to analyse splice variants and should be preferred for data sets that are acquired with the intent to be community resources that might be used in secondary data analyses.
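The trimming design and the final recommendation can be expressed compactly. This sketch assumes reads are trimmed from the 3' end by keeping the first n bases, and it encodes the 54 bp versus 72 bp recommendation with a hypothetical `few_introns` threshold, since the abstract does not say how many introns count as "few". The helper names are illustrative, not from the study.

```python
TRIM_LENGTHS = (36, 54, 72, 101)  # read lengths compared in the study

def trim_read(seq, qual, length):
    """Keep the first `length` bases and quality scores (3' trimming)."""
    return seq[:length], qual[:length]

def trimmed_versions(seq, qual):
    """Trimmed copies at every study length the read is long enough for."""
    return {n: trim_read(seq, qual, n) for n in TRIM_LENGTHS if len(seq) >= n}

def recommended_read_length(mean_introns_per_gene, few_introns=1.0):
    """54 bp for organisms with no or few introns, 72 bp for all others.

    `few_introns` is a hypothetical cut-off, not a value from the study.
    """
    return 54 if mean_introns_per_gene <= few_introns else 72
```

Applying `trimmed_versions` to each read (and to both mates for the paired-end comparisons) produces the matched data sets needed to compare read lengths on otherwise identical data.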
Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering
Environmental shotgun sequencing (or metagenomics) is widely used to survey the communities of microbial organisms that live in many diverse ecosystems, such as the human body. Finding the protein-coding genes within the sequences is an important step for assessing the functional capacity of a metagenome. In this work, we developed a metagenomic gene prediction system, Glimmer-MG, that achieves significantly greater accuracy than previous systems via novel approaches to a number of important prediction subtasks. First, we introduce the use of phylogenetic classifications of the sequences for model parameterization. We also cluster the sequences, grouping together those that likely originated from the same organism. Analogous to iterative schemes that are useful for whole genomes, we retrain our models within each cluster on the initial gene predictions before making final predictions. Finally, we model both insertion/deletion and substitution sequencing errors using a different approach than previous software, allowing Glimmer-MG to change coding frame or pass through stop codons by predicting an error. In a comparison among multiple gene finding methods, Glimmer-MG makes the most sensitive and precise predictions on simulated and real metagenomes for all read lengths and error rates tested.
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost when applied to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
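The idea that covarying coverage across datasets helps binning can be illustrated with a toy binner: contigs from the same population should rise and fall together in abundance across samples. This is a minimal sketch, not any published binner. It assumes each contig has a mean-coverage value per sample (same sample order for all contigs), uses Pearson correlation as the similarity, and a hypothetical `min_r` threshold with greedy seed-based assignment. Practical binners usually also use sequence composition such as tetranucleotide frequencies, which is omitted here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def bin_by_coverage(coverage, min_r=0.9):
    """Greedy binning on coverage covariation.

    coverage: dict contig -> list of mean coverage per sample.
    Each contig joins the first bin whose seed contig's profile correlates
    with it at >= min_r (hypothetical threshold); otherwise it seeds a new bin.
    """
    bins = []
    for contig, profile in coverage.items():
        for b in bins:
            if pearson(profile, coverage[b[0]]) >= min_r:
                b.append(contig)
                break
        else:
            bins.append([contig])
    return bins
```

Correlation rather than raw distance is used so that two contigs from the same genome still group together when their absolute coverage differs (for example, because of GC bias) but their trend across samples matches.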
A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer
We introduce a Markov model for the evolution of a gene family along a
phylogeny. The model includes parameters for the rates of horizontal gene
transfer, gene duplication, and gene loss, in addition to branch lengths in the
phylogeny. The likelihood for the changes in the size of a gene family across
different organisms can be calculated in O(N+hM^2) time and O(N+M^2) space,
where N is the number of organisms, h is the height of the phylogeny, and M
is the sum of family sizes. We apply the model to the evolution of gene content
in Proteobacteria using the gene families in the COG (Clusters of Orthologous
Groups) database.
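To make the model concrete, the sketch below treats family size as a continuous-time Markov chain on 0..M in which each copy duplicates at rate `dup` and is lost at rate `loss`, and a new copy arrives by horizontal transfer at constant rate `transfer`. It then computes the likelihood of observed leaf sizes by Felsenstein pruning. These rate forms, the truncation at M, and the uniform root prior are assumptions of this sketch. It also does not reproduce the O(N+hM^2) algorithm from the abstract: it exponentiates a full rate matrix on every branch, which is considerably more expensive. A tree is written as nested lists of `(child, branch_length)` pairs, and each leaf is the observed family size as an int.

```python
def rate_matrix(M, dup, loss, transfer):
    """Generator Q of a truncated birth-death-immigration chain on sizes 0..M."""
    Q = [[0.0] * (M + 1) for _ in range(M + 1)]
    for n in range(M + 1):
        if n < M:
            Q[n][n + 1] = dup * n + transfer  # duplication of any copy, or transfer in
        if n > 0:
            Q[n][n - 1] = loss * n  # loss of any single copy
        Q[n][n] = -sum(Q[n])  # rows of a generator sum to zero
    return Q

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_exp(Q, t, terms=20):
    """exp(Q t) by scaling and squaring of a truncated Taylor series."""
    n = len(Q)
    norm = max(sum(abs(x) for x in row) for row in Q) * t
    s = 0
    while norm > 0.5:
        norm /= 2
        s += 1
    scale = t / 2 ** s
    A = [[x * scale for x in row] for row in Q]
    ident = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result, term = ident, ident
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)
    return result

def partial_likelihood(node, Q):
    """Felsenstein pruning: P(data below node | size at node) for each size."""
    size = len(Q)
    if isinstance(node, int):  # leaf: observed family size (must be <= M)
        return [1.0 if n == node else 0.0 for n in range(size)]
    L = [1.0] * size
    for child, t in node:
        P = mat_exp(Q, t)
        child_L = partial_likelihood(child, Q)
        for a in range(size):
            L[a] *= sum(P[a][b] * child_L[b] for b in range(size))
    return L

def family_likelihood(tree, M, dup, loss, transfer, root_prior=None):
    """Likelihood of one family's sizes at the leaves of `tree`."""
    Q = rate_matrix(M, dup, loss, transfer)
    prior = root_prior or [1.0 / (M + 1)] * (M + 1)  # uniform root prior: an assumption
    return sum(p * l for p, l in zip(prior, partial_likelihood(tree, Q)))
```

For example, `family_likelihood([([(3, 0.2), (2, 0.2)], 0.1), (1, 0.4)], 6, 0.5, 0.6, 0.1)` scores a three-leaf tree. Summing log-likelihoods over many families and maximizing over `dup`, `loss` and `transfer` would give rate estimates under these assumptions.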