Information content based model for the topological properties of the gene regulatory network of Escherichia coli

Author: Albert; Alberts; Almirantis; Avery; Ayşe Erzan; Babu; Balcan; Banzhaf; Barabasi; Benos; Berg; Bergmann; Berkin Malkoç; Bilu; Bollobás; Browning; Buldyrev; Colizza; Dawkins; Dobrin; Dodd; Dorogovtsev; Duygu Balcan; Erdös; Gama-Castro; Gerland; Gershenzon; Guelzim; Harbison; Jeong; Kashtan; Kauffman; Kim; Koralov; Kugiumtzis; Li; Lynch; Ma; Matsumoto; Milo; Mungan; Münch; Okuda; O’Flanagan; Pachkov; Reil; Rudd; Salgado; Samal; Sengun; Sengupta; Shannon; Shearwin; Sneppen; Spirin; Stormo; Teixeira; van Nimwegen; van Noort; Vazquez; Wagner; Warren; Watson; Wernicke; Zhou
Publication venue: 'Elsevier BV'
Publication date: 29/12/2009

Gene regulatory networks (GRNs) are being studied with increasingly precise quantitative tools and can provide a testing ground for ideas regarding the emergence and evolution of complex biological networks. We analyze the global statistical properties of the transcriptional regulatory network of the prokaryote Escherichia coli, identifying each operon with a node of the network. We propose a null model for this network using the content-based approach applied earlier to the eukaryote Saccharomyces cerevisiae (Balcan et al., 2007). Random sequences that represent promoter regions and binding sequences are associated with the nodes. The length distributions of these sequences are extracted from the relevant databases. The network is constructed by testing for the occurrence of binding sequences within the promoter regions. The ensemble of emergent networks yields an exponentially decaying in-degree distribution and a putative power-law dependence for the out-degree distribution with a flat tail, in agreement with the data. The clustering coefficient, degree-degree correlation, rich-club coefficient and k-core visualization all agree qualitatively with the empirical network, to an extent not yet achieved by any other computational model, to our knowledge. The significant statistical differences that remain can point the way to further research into non-adaptive and adaptive processes in the evolution of the E. coli GRN.
Comment: 58 pages, 3 tables, 22 figures. In press, Journal of Theoretical Biology (2009)

arXiv.org e-Print Archive

Crossref
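
A minimal sketch of the content-based construction described in the abstract above. It assumes uniformly random nucleotide sequences, a single fixed promoter length and a small hypothetical set of binding-sequence lengths (the paper instead draws both length distributions from E. coli databases), and it adds an edge i -> j whenever node i's binding sequence occurs verbatim in node j's promoter. Function names and parameters are illustrative, not taken from the paper.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def random_seq(length, rng):
    """Uniformly random nucleotide sequence (assumption: equal base frequencies)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def build_network(n_nodes, promoter_len, motif_lengths, seed=0):
    """Return directed edges (i, j): node i's binding sequence occurs in node j's promoter."""
    rng = random.Random(seed)
    promoters = [random_seq(promoter_len, rng) for _ in range(n_nodes)]
    # Each node gets one binding sequence whose length is drawn from motif_lengths.
    motifs = [random_seq(rng.choice(motif_lengths), rng) for _ in range(n_nodes)]
    return [
        (i, j)
        for i, motif in enumerate(motifs)
        for j, promoter in enumerate(promoters)
        if motif in promoter
    ]


def degree_histograms(edges):
    """Histograms {degree: number of nodes} for out- and in-degree (degree-0 nodes omitted)."""
    out_deg = Counter(i for i, _ in edges)
    in_deg = Counter(j for _, j in edges)
    return Counter(out_deg.values()), Counter(in_deg.values())


edges = build_network(n_nodes=500, promoter_len=300, motif_lengths=[6, 7, 8, 9, 10])
out_hist, in_hist = degree_histograms(edges)
print(sorted(out_hist.items()))
print(sorted(in_hist.items()))
```

For a uniformly random promoter of length L, the expected number of occurrences of a given binding sequence of length k is (L - k + 1)/4^k, so short binding sequences reach many targets while long ones reach almost none; in this toy setting, that spread of lengths is what broadens the out-degree distribution.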

Reconciliation between operational taxonomic units and species boundaries

Author: Boon Nico; Kerckhof Frederiek-Maarten; Leys Natalie; Monsieurs Pieter; Mysara Mohamed; Props Ruben; Raes Jeroen; Vandamme Peter
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

The development of high-throughput sequencing technologies has revolutionised the field of microbial ecology via 16S rRNA gene amplicon sequencing approaches. Clustering those amplicon sequencing reads into operational taxonomic units (OTUs) using a fixed cut-off is a commonly used approach to estimate microbial diversity. A 97% threshold was chosen with the intended purpose that resulting OTUs could be interpreted as a proxy for bacterial species. Our results show that the robustness of such a generalised cut-off is questionable when it is applied to short amplicons covering only one or two variable regions of the 16S rRNA gene. It leads to biases in diversity metrics and makes it hard to compare results obtained with amplicons derived from different primer sets. The method introduced in this work takes into account the differential evolutionary rates of taxonomic lineages in order to define a dynamic, taxonomy-dependent OTU clustering cut-off score. For a taxonomic family consisting of species showing high evolutionary conservation in the amplified variable regions, the cut-off will be more stringent than 97%. By taking into consideration the amplified variable regions and the taxonomic family when defining this cut-off, such a threshold will lead to more robust results and closer correspondence between OTUs and species. This approach has been implemented in a publicly available software package called DynamiC.

Ghent University Academic Bibliography
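
The idea of a taxonomy-dependent cut-off in the abstract above can be illustrated with greedy centroid clustering in which the identity threshold is looked up per taxonomic family. This is a hypothetical sketch, not DynamiC itself: it assumes pre-aligned, equal-length reads that have already been assigned to a family, uses simple per-position identity, and the values in FAMILY_CUTOFFS are invented placeholders rather than cut-offs derived from evolutionary rates.

```python
DEFAULT_CUTOFF = 0.97
# Hypothetical per-family cut-offs; these numbers are placeholders, not DynamiC output.
FAMILY_CUTOFFS = {"FamilyA": 0.99, "FamilyB": 0.95}


def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, cutoffs=FAMILY_CUTOFFS, default=DEFAULT_CUTOFF):
    """Greedy centroid clustering; reads is a list of (family, sequence).

    Each read joins the first existing OTU of the same family whose centroid
    meets that family's identity cut-off; otherwise it founds a new OTU.
    Returns a list of OTUs, each a list of read indices.
    """
    otus = []  # entries: (family, centroid sequence, member indices)
    for idx, (family, seq) in enumerate(reads):
        threshold = cutoffs.get(family, default)
        for fam, centroid, members in otus:
            if fam == family and identity(seq, centroid) >= threshold:
                members.append(idx)
                break
        else:  # no existing OTU accepted the read
            otus.append((family, seq, [idx]))
    return [members for _, _, members in otus]
```

Two 100 bp reads that differ at two positions (98% identity) end up in separate OTUs under the stricter 99% placeholder cut-off for FamilyA, but in a single OTU under the 95% placeholder for FamilyB or the 97% default.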

Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions

Author: Campos-González Adrian I.; Freyre-González Julio A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/01/2019

Genetic regulatory networks (GRNs) have been widely studied, yet there is a lack of understanding with regard to the final size and properties of these networks, mainly because no network is currently complete. In this study, we analyzed the distribution of GRN structural properties across a large set of distinct prokaryotic organisms and found a set of constrained characteristics, such as network density and number of regulators. Our results allowed us to estimate the number of interactions that complete networks would have, a valuable insight that could aid in the daunting task of network curation, prediction, and validation. Using state-of-the-art statistical approaches, we also provided new evidence to settle a previously stated controversy that raised the possibility of complete biological networks being random, which would make the observed scale-free properties an artifact of the sampling process during network discovery. Furthermore, we identified a set of properties that enabled us to assess the consistency of the connectivity distribution for various GRNs against different alternative statistical distributions. Our results favor the hypothesis that highly connected nodes (hubs) are not a consequence of network incompleteness. Finally, an interaction coverage computed for the GRNs as a proxy for completeness revealed that high-throughput-based reconstructions of GRNs could yield biased networks with a low average clustering coefficient, showing that classical targeted discovery of interactions is still needed.
Comment: 28 pages, 5 figures, 12 pages supplementary information

arXiv.org e-Print Archive

Directory of Open Access Journals

University of Queensland eSpace
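
The structural properties named in the abstract above are standard graph statistics. The sketch below computes network density and the average clustering coefficient for a small edge list, and shows the arithmetic of turning an assumed density into a total interaction count. The density convention (|E|/N^2, allowing autoregulatory self-loops), the treatment of edges as undirected when computing clustering, and the extrapolate_interactions helper are assumptions for illustration, not the authors' estimation procedure.

```python
from itertools import combinations


def density(n_nodes, edges):
    """Directed density |E| / N^2 (assumption: autoregulatory self-loops allowed)."""
    return len(set(edges)) / n_nodes ** 2


def average_clustering(n_nodes, edges):
    """Average clustering coefficient, ignoring edge direction and self-loops.

    Nodes with fewer than two neighbours contribute 0 to the average.
    """
    nbrs = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue
        # Count links among this node's neighbours.
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        total += 2 * links / (k * (k - 1))
    return total / n_nodes


def extrapolate_interactions(assumed_density, n_genes):
    """Hypothetical helper: interactions implied if a complete network had this density."""
    return round(assumed_density * n_genes ** 2)
```

With a placeholder density of 0.001 and a placeholder genome of 4,000 genes, extrapolate_interactions returns 16,000; the real estimate depends on how the density constraint is fitted across organisms, which this sketch does not attempt.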

Clades of huge phages from across Earth's ecosystems.

Bacteriophages typically have small genomes¹ and depend on their bacterial hosts for replication². Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems.

eScholarship - University of California

Online Research Database In Technology

Cost effective, experimentally robust differential-expression analysis for human/mammalian, pathogen and dual-species transcriptomics.

Author: Bruno Vincent M; Chung Matthew; Dunning Hotopp Julie C; Filler Scott G; Fraser Claire M; Mahurkar Anup; Mattick John; McCracken Carrie; Rasko David A; Shetty Amol C
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

As sequencing read length has increased, researchers have quickly adopted longer reads for their experiments. Here, we examine 14 pathogen or host-pathogen differential gene expression data sets to assess whether using longer reads is warranted. A variety of data sets was used to assess which genomic attributes might affect the outcome of differential gene expression analysis, including gene density, operons, gene length, number of introns/exons and intron length. No genome attribute was found to influence the data in principal components analysis, hierarchical clustering with bootstrap support, or regression analyses of pairwise comparisons that were undertaken on the same reads, looking at all combinations of paired and unpaired reads trimmed to 36, 54, 72 and 101 bp. Read pairing had the greatest effect when there was little variation in the samples from different conditions or in their replicates (e.g., little differential gene expression). But overall, 54 and 72 bp reads were typically most similar. Given differences in costs and mapping percentages, we recommend 54 bp reads for organisms with no or few introns and 72 bp reads for all others. In a third of the data sets, read pairing had absolutely no effect, despite paired reads having twice as much data. Therefore, single-end reads seem robust for differential-expression analyses, but in eukaryotes paired-end reads are likely desired to analyse splice variants and should be preferred for data sets that are acquired with the intent to be community resources that might be used in secondary data analyses.

eScholarship - University of California
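
The trimming design in the abstract above (paired and unpaired reads cut to 36, 54, 72 and 101 bp) can be sketched as follows. Representing each read as a (sequence, quality) pair and discarding reads shorter than the target length are assumptions of this sketch; the abstract does not say how shorter reads were handled, and trim_read and trim_library are hypothetical names.

```python
TRIM_LENGTHS = (36, 54, 72, 101)


def trim_read(seq, qual, length):
    """Truncate a read and its quality string to `length` bases.

    Returns None for reads shorter than `length` (an assumption of this sketch).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    if len(seq) < length:
        return None
    return seq[:length], qual[:length]


def trim_library(pairs, length, paired=True):
    """Trim a list of (read1, read2) pairs; keep only read 1 when paired is False.

    A pair is kept only if every retained mate reaches the target length.
    """
    out = []
    for r1, r2 in pairs:
        t1 = trim_read(*r1, length)
        if t1 is None:
            continue
        if not paired:
            out.append((t1,))
            continue
        t2 = trim_read(*r2, length)
        if t2 is not None:
            out.append((t1, t2))
    return out
```

Running trim_library once per entry in TRIM_LENGTHS, with paired set to True and then False, yields the eight read configurations that the comparison in the abstract spans.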

Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering

Author: Altschul; Angly; Arthur L. Delcher; Balzer; Benson; Bo Liu; Borodovsky; Brady; Brulc; Chatterji; Chen; Costello; Curtis; David R. Kelley; Delcher; Diaz; Dinsdale; Dohm; Fickett; Fleischmann; Handelsman; Hastie; Hoff; Hu; Kelley; Kislyuk; Kristiansson; Lozupone; Majoros; Margulies; Mavromatis; Mihai Pop; Monzoorul Haque; Noguchi; Patil; Pruitt; Rho; Rocha; Rusch; Schatz; Schloss; Sharon; Shendure; Steven L. Salzberg; Tringe; Turnbaugh; Tyson; Venter; Whitman; Yok; Yooseph; Zhu
Publication venue: Oxford University Press
Publication date: 01/11/2013

Environmental shotgun sequencing (or metagenomics) is widely used to survey the communities of microbial organisms that live in many diverse ecosystems, such as the human body. Finding the protein-coding genes within the sequences is an important step for assessing the functional capacity of a metagenome. In this work, we developed a metagenomics gene prediction system, Glimmer-MG, that achieves significantly greater accuracy than previous systems via novel approaches to a number of important prediction subtasks. First, we introduce the use of phylogenetic classifications of the sequences for model parameterization. We also cluster the sequences, grouping together those that likely originated from the same organism. Analogous to iterative schemes that are useful for whole genomes, we retrain our models within each cluster on the initial gene predictions before making final predictions. Finally, we model both insertion/deletion and substitution sequencing errors using an approach different from that of previous software, allowing Glimmer-MG to change coding frame or pass through stop codons by predicting an error. In a comparison among multiple gene-finding methods, Glimmer-MG makes the most sensitive and precise predictions on simulated and real metagenomes for all read lengths and error rates tested.

Crossref

Harvard University - DASH

PubMed Central
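
For orientation on the gene-finding task in the abstract above, its most basic form is a six-frame scan for open reading frames (ATG to the next in-frame stop codon). The sketch below is only that baseline: Glimmer-MG's scoring models, phylogenetic classification, clustering, retraining and error modeling are not reproduced, and the min_codons threshold is an arbitrary placeholder. Genes truncated at read ends, which are common in metagenomic fragments, are ignored here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Six-frame scan for ATG-initiated ORFs ending at the first in-frame stop codon.

    Returns (strand, start, end) tuples; coordinates are 0-based, half-open and
    refer to `seq` for '+' and to its reverse complement for '-'.
    ORF length (including the stop codon) must be at least `min_codons` codons.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF; later in-frame ATGs are ignored
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

A scan like this over-predicts heavily on short, error-prone reads, which is the gap that model-based tools such as Glimmer-MG are designed to close.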

Recovering complete and draft population genomes from metagenome datasets.

Author: Gilbert Jack A; Sangwan Naseer; Xia Fangfang
Publication venue: eScholarship, University of California
Publication date: 01/03/2016

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost when applied to deeply sequenced complex metagenomes (e.g., soil), exploiting covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.

Woods Hole Open Access Server

Springer - Publisher Connector

PubMed Central

eScholarship - University of California
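
The coverage-covariation signal in the abstract above can be illustrated by grouping contigs whose per-sample coverage profiles are strongly correlated. This is a simplified, hypothetical sketch: it uses Pearson correlation with an arbitrary threshold and single-linkage merging via union-find, whereas published binners typically combine coverage with sequence composition and use more robust statistical models.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def bin_by_coverage(coverage, threshold=0.9):
    """Single-linkage binning; coverage maps contig name -> per-sample mean coverage.

    Contigs whose profiles correlate at or above `threshold` end up in the same bin.
    """
    names = list(coverage)
    parent = {c: c for c in names}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(coverage[a], coverage[b]) >= threshold:
                parent[find(a)] = find(b)
    bins = {}
    for c in names:
        bins.setdefault(find(c), []).append(c)
    return sorted(sorted(b) for b in bins.values())
```

The all-pairs comparison is quadratic in the number of contigs, which hints at why the abstract notes a computational cost for deeply sequenced, complex metagenomes.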

A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer

Author: A.B. Simonson; B. Boussau; B. Snel; B.E. Dutilh; B.G. Mirkin; C. Pál; C.G. Kurland; D.H. Huson; E. Belda; E.A. Herniou; E.D. Green; E.J. Deeds; E.L.L. Sonnhammer; E.V. Koonin; F. Delsuc; F. Tekaia; G.D.P. Clarke; G.P. Karev; I.K. Jordan; J. Lin; J.A. Lake; J.O. Korbel; J.P. Gogarten; J.T. Herbeck; K.H. Wolfe; M. Csűrös; M. Pellegrini; M.G. Montague; M.W. Hahn; R.L. Tatusov; S. Karlin; S. Yang; S.T. Fitz-Gibbon; T. Pupko; V. Kunin; W. Feller; W.J. Reed; X. Gu; Y. Boucher; Y.I. Wolf
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/09/2005

We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in $O(N + hM^2)$ time and $O(N + M^2)$ space, where $N$ is the number of organisms, $h$ is the height of the phylogeny, and $M$ is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.

arXiv.org e-Print Archive

CiteSeerX

Crossref
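
The rates in the abstract above can be read as a continuous-time Markov chain on the size n of one gene family along a single branch. The sketch assumes a linear birth-death-immigration form: duplication adds a copy at rate dup*n, horizontal transfer adds one at a constant rate transfer, and loss removes one at rate loss*n, with a hypothetical size cap. It computes the branch transition probabilities P(t) = exp(Qt) by uniformization. It is not the paper's $O(N + hM^2)$ likelihood algorithm, which combines branch-level terms across the whole phylogeny.

```python
from math import exp


def rate_matrix(cap, dup, loss, transfer):
    """Generator Q for family size n = 0..cap (assumed linear birth-death-immigration).

    n -> n+1 at rate dup*n + transfer, n -> n-1 at rate loss*n.
    Growth beyond the hypothetical `cap` is blocked.
    """
    size = cap + 1
    q = [[0.0] * size for _ in range(size)]
    for n in range(size):
        if n < cap:
            q[n][n + 1] = dup * n + transfer
        if n > 0:
            q[n][n - 1] = loss * n
        q[n][n] = -sum(q[n])  # diagonal is still 0 here, so this sums the off-diagonal rates
    return q


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transition_matrix(q, t, tol=1e-12, max_terms=10_000):
    """P(t) = exp(Qt) by uniformization: sum_k Poisson(k; L*t) * R^k, with R = I + Q/L.

    Intended for small L*t; exp(-L*t) underflows once L*t exceeds roughly 745.
    """
    size = len(q)
    lam = max(-q[n][n] for n in range(size))
    ident = [[float(i == j) for j in range(size)] for i in range(size)]
    if lam == 0.0:
        return ident
    r = [[ident[i][j] + q[i][j] / lam for j in range(size)] for i in range(size)]
    weight = exp(-lam * t)  # Poisson probability of zero uniformized jumps
    term = ident            # R^0
    result = [[weight * x for x in row] for row in term]
    mass, k = weight, 0
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        term = matmul(term, r)
        weight *= lam * t / k
        mass += weight
        for i in range(size):
            for j in range(size):
                result[i][j] += weight * term[i][j]
    return result
```

With duplication and loss set to zero, the sketch reduces to a Poisson arrival of transferred copies, so P(t)[0][k] equals the Poisson probability of k arrivals; this gives a quick sanity check. The per-branch matrices would still need to be combined over the phylogeny to obtain a likelihood, and that step is not shown.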