STRING and STITCH: known and predicted interactions between proteins and chemicals
Information on protein-protein and protein-chemical interactions is essential for understanding cellular functions. The STRING and STITCH web resources integrate interaction evidence derived from pathways, automatic literature mining, primary experimental data, and genomic context. The resulting interaction networks cover 1.5 million proteins from 373 organisms and 68,000 chemicals.
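The abstract does not say how evidence from the different sources is merged into one interaction score. As an illustration only, the sketch below combines per-channel confidence scores under the assumption that the channels (pathways, literature mining, experiments, genomic context) are independent, so the combined score is one minus the probability that every channel is wrong. The function name, the prior-correction step and the default `prior` value are assumptions of this sketch.

```python
from math import prod


def combine_channel_scores(scores, prior=0.041):
    """Combine per-channel confidence scores in [0, 1] into a single score.

    Assumes independent evidence channels. `prior` is an assumed baseline
    probability that two random proteins interact; its default is illustrative.
    """
    # Remove the prior from each channel so it is not counted repeatedly.
    stripped = [max(0.0, (s - prior) / (1.0 - prior)) for s in scores]
    # Probability that at least one channel is correct.
    combined = 1.0 - prod(1.0 - s for s in stripped)
    # Add the prior back exactly once.
    return combined + prior * (1.0 - combined)
```

With `prior=0.0`, two channels at 0.5 each combine to 0.75; a single channel passes through unchanged, because removing and re-adding the prior cancel.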
Inferring Correlation Networks from Genomic Survey Data
High-throughput sequencing-based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities, be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results, since the observed data take the form of relative fractions of genes or species rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and we develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimate that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity.
Funding: United States. Dept. of Energy (Contract DE-AC02-05CH11231).
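The abstract does not spell out how SparCC works. The following is a simplified, single-pass sketch of the log-ratio-variance idea that, as far as we understand, underlies it: variances of log-ratios are unaffected by converting counts to fractions, and assuming most true correlations are near zero lets one solve for each component's log-variance. The published tool's iterative exclusion of strongly correlated pairs and its resampling of fractions from counts are deliberately omitted; `sparcc_basic` and the simulated data are hypothetical.

```python
import numpy as np


def sparcc_basic(fractions):
    """Single-pass, SparCC-style correlation estimate (simplified sketch).

    fractions: (n_samples, n_components) array of strictly positive relative
    abundances. Omits SparCC's iterative exclusion of strongly correlated
    pairs and its resampling of fractions from read counts.
    """
    x = np.asarray(fractions, dtype=float)
    n_comp = x.shape[1]
    logs = np.log(x)
    # Log-ratio variances t_ij = Var(log(x_i / x_j)); closure does not change them.
    cov = np.cov(logs, rowvar=False)
    var = np.diag(cov)
    t = var[:, None] + var[None, :] - 2.0 * cov
    # If most basis correlations are near zero, then
    # sum_j t_ij ~= (D - 2) * w_i^2 + sum_k w_k^2, which is linear in w^2.
    a = (n_comp - 2) * np.eye(n_comp) + np.ones((n_comp, n_comp))
    w2 = np.linalg.solve(a, t.sum(axis=1))
    w2 = np.clip(w2, 1e-12, None)  # guard against small negative estimates
    # Solve t_ij = w_i^2 + w_j^2 - 2 * rho_ij * w_i * w_j for rho_ij.
    w = np.sqrt(w2)
    rho = (w2[:, None] + w2[None, :] - t) / (2.0 * np.outer(w, w))
    return np.clip(rho, -1.0, 1.0)


# Demo on simulated, truly independent taxa (illustrative data, not from the study).
rng = np.random.default_rng(0)
abundances = rng.lognormal(mean=0.0, sigma=1.0, size=(2000, 20))
fractions = abundances / abundances.sum(axis=1, keepdims=True)  # relative fractions
rho = sparcc_basic(fractions)
naive = np.corrcoef(fractions, rowvar=False)  # Pearson on fractions, for comparison
```

Because fractions sum to one, independent taxa with similar variances receive naive Pearson correlations of about -1/(D-1): roughly -0.05 with 20 taxa, but about -0.5 if the demo is rerun with `size=(2000, 3)`. This is one way to see why low-diversity samples are hit hardest.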
Revision of the Mediterranean and southern African Triglochin bulbosa complex (Juncaginaceae)
The Triglochin bulbosa complex (Juncaginaceae) from the Mediterranean region and Africa is revised. One new species, Triglochin buchenaui Kocke, Mering & Kadereit, and two new subspecies, Triglochin bulbosa subsp. calcicola Mering, Kocke & Kadereit and Triglochin bulbosa subsp. quarcicola Mering, Kocke & Kadereit, are described from South Africa. The only two Mediterranean taxa in the complex (Triglochin barrelieri, T. laxiflora) are elevated to species rank. Altogether seven species and four subspecies are recognised: Triglochin barrelieri, T. buchenaui, T. bulbosa subsp. bulbosa, T. bulbosa subsp. calcicola, T. bulbosa subsp. quarcicola, T. bulbosa subsp. tenuifolia, T. compacta, T. elongata, T. laxiflora and T. milnei. An identification key, detailed descriptions and accounts of the ecology and distribution of the taxa are provided. An IUCN conservation status is proposed for each taxon.
STITCH: interaction networks of chemicals and proteins
Knowledge about interactions between proteins and small molecules is essential for understanding molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to these data, STITCH ('search tool for interactions of chemicals') integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows users to explore the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and their interactions contained in the STRING database. STITCH is available at http://stitch.embl.de.
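Chemical structure similarity is commonly quantified with the Tanimoto coefficient on binary fingerprints. The abstract does not say which similarity measure or fingerprint STITCH uses, so the sketch below only illustrates that common choice: fingerprints are represented as sets of "on" bit positions, and the names `tanimoto`, `similar_chemicals` and the 0.8 threshold are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / union


def similar_chemicals(query_fp, library, threshold=0.8):
    """Names of library chemicals whose similarity to the query meets the threshold."""
    return sorted(
        name for name, fp in library.items() if tanimoto(query_fp, fp) >= threshold
    )
```

For example, fingerprints `{1, 2, 3}` and `{2, 3, 4}` share two of four set bits, giving a similarity of 0.5.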
Reproducible Propagation of Species-Rich Soil Bacterial Communities Suggests Robust Underlying Deterministic Principles of Community Formation
Microbiomes are typically characterized by high species diversity, but it is poorly understood how such system-level complexity can be generated and propagated. Here, we used soil microcosms as a model to study the development of bacterial communities as a function of their starting complexity and environmental boundary conditions. Despite inherent stochastic variation in manipulating species-rich communities, both laboratory-mixed medium-complexity communities (21 soil bacterial isolates in equal proportions) and high-diversity natural top-soil communities followed highly reproducible succession paths, maintaining 16S rRNA gene amplicon signatures prominent in known soil communities in general. Development trajectories and compositional states differed between communities propagated in soil microcosms and those propagated in liquid suspension. Compositional states were maintained over multiple renewed growth cycles but could be made to diverge by short-term pollutant exposure. The different but robust trajectories demonstrated that deterministic, taxon-inherent characteristics underlie the reproducible development and self-organized complexity of soil microbiomes within their environmental boundary conditions. Our findings also have direct implications for potential strategies to achieve controlled restoration of desertified land.
IMPORTANCE There is now a great awareness of the high diversity of most environmental ("free-living") and host-associated microbiomes, but exactly how diverse microbial communities form and are maintained is still highly debated. A variety of theories have been put forward, but testing them has been problematic because most studies have been based on synthetic communities that fail to accurately mimic the natural composition (i.e., the species used are typically not found together in the same environment), the diversity (usually too low to be representative), or the environmental system itself (using designs with single carbon sources or solely mixed liquid cultures). In this study, we show how species-diverse soil bacterial communities can reproducibly be generated, propagated, and maintained, either from individual isolates (21 soil bacterial strains) or from natural microbial mixtures washed from top-soil. The high replicate consistency we achieve, both in species composition and in developmental trajectories, demonstrates the strong inherent deterministic factors that drive community formation from species composition. Generating complex soil microbiomes may provide ways to restore the damaged soils that are prevalent on our planet.
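Replicate consistency of community composition is often summarized by pairwise dissimilarity between replicate abundance profiles. The abstract does not name the metric used in the study; the sketch below assumes Bray-Curtis dissimilarity (0 for identical profiles, 1 for profiles sharing no taxa) averaged over all replicate pairs, with hypothetical function names.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def mean_replicate_dissimilarity(replicates):
    """Average pairwise Bray-Curtis dissimilarity among replicate profiles."""
    pairs = list(combinations(replicates, 2))
    if not pairs:
        raise ValueError("need at least two replicates")
    return sum(bray_curtis(u, v) for u, v in pairs) / len(pairs)
```

Lower values indicate more reproducible replicates; tracking this average over successive growth cycles is one way to express how consistent the trajectories are.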
Prediction of effective genome size in metagenomic samples
We introduce a novel computational approach to predict effective genome size (EGS; a measure that includes multiple plasmid copies, inserted sequences, and associated phages and viruses) from short sequencing reads of environmental genomics (or metagenomics) projects. We observe considerable EGS differences between environments and link these with ecological complexity as well as species composition (for instance, the presence of eukaryotes). For example, we estimate EGS in a complex, organism-dense farm soil sample at about 6.3 megabases (Mb), whereas that of the bacteria therein is only 4.7 Mb; for bacteria in a nutrient-poor, organism-sparse ocean surface water sample, EGS is as low as 1.6 Mb. The method also permits evaluation of completion status and assembly bias in single-genome sequencing projects.
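The abstract does not describe how EGS is computed. The intuition behind marker-based estimates can be sketched under strong simplifying assumptions: if every genome carries one copy of a marker gene of length L and reads are sampled uniformly, roughly a fraction L / G of reads hit that marker, so G is approximately L × (total reads) / (marker hits). This is not the published, calibrated method; read-length edge effects are ignored and the function name is hypothetical.

```python
def estimate_genome_size(total_reads, marker_hits, marker_lengths):
    """Rough effective-genome-size estimate from single-copy marker genes.

    Simplifying assumptions: one copy of each marker per genome, uniform read
    sampling, and no read-length edge effects, so hits / total_reads ~= L / G.
    Per-marker estimates of G are averaged.
    """
    estimates = [
        length * total_reads / hits
        for hits, length in zip(marker_hits, marker_lengths)
        if hits > 0
    ]
    if not estimates:
        raise ValueError("no marker received any reads")
    return sum(estimates) / len(estimates)
```

With one million reads, a 1,000 bp marker hit 200 times and a 1,250 bp marker hit 250 times both point to about 5 Mb. Organisms with larger genomes dilute marker hits, which is why EGS rises in dense, eukaryote-containing samples.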
eggNOG: automated construction and annotation of orthologous groups of genes
The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database ('evolutionary genealogy of genes: Non-supervised Orthologous Groups'), which contains orthologous groups constructed from Smith-Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 coarse-grained orthologous groups, of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at http://eggnog.embl.d
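Reciprocal best matches and triangular linkage can be illustrated with a small, simplified sketch in the spirit of COG-style clustering. Pairwise best hits are assumed to be precomputed (eggNOG derives them from Smith-Waterman alignments, which are not reproduced here). Genes in three different genomes that are pairwise reciprocal best hits form a triangle, and triangles sharing a side are merged. Thresholds, paralog handling and every name below are assumptions of this sketch.

```python
from collections import defaultdict


def reciprocal_best_hits(best_hit):
    """best_hit maps (gene, target_genome) -> best-scoring gene in that genome.
    Genes are (genome, name) tuples. Returns RBH edges as frozensets."""
    edges = set()
    for (gene, _target_genome), hit in best_hit.items():
        if best_hit.get((hit, gene[0])) == gene:
            edges.add(frozenset((gene, hit)))
    return edges


def triangular_linkage(edges):
    """Group genes from RBH triangles spanning three genomes; triangles that
    share a side are merged (union-find over edges)."""
    neighbors = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    parent = {}

    def find(e):
        parent.setdefault(e, e)
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    def union(e1, e2):
        parent[find(e1)] = find(e2)

    for edge in edges:
        a, b = tuple(edge)
        for c in neighbors[a] & neighbors[b]:
            if len({a[0], b[0], c[0]}) == 3:  # three distinct genomes
                union(edge, frozenset((a, c)))
                union(edge, frozenset((b, c)))
    groups = defaultdict(set)
    for edge in parent:
        groups[find(edge)].update(edge)
    return sorted(groups.values(), key=sorted)
```

Reciprocal pairs that never close a triangle (for example, a gene pair present in only two genomes) are left ungrouped, which is the conservative behavior triangular linkage is meant to provide.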
Toward automatic reconstruction of a highly resolved tree of life
We have developed an automatable procedure for reconstructing the tree of life with branch lengths comparable across all three domains. The tree has its basis in a concatenation of 31 orthologs occurring in 191 species with sequenced genomes. It revealed interdomain discrepancies in taxonomic classification. Systematic detection and subsequent exclusion of products of horizontal gene transfer increased phylogenetic resolution, allowing us to confirm accepted relationships and resolve disputed and preliminary classifications. For example, we place the phylum Acidobacteria as a sister group of delta-Proteobacteria, support a Gram-positive origin of Bacteria, and suggest a thermophilic last universal common ancestor.
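Building a tree from a concatenation of orthologs starts by joining per-gene alignments into one supermatrix, one row per species. The minimal sketch below shows only that step; alignment, horizontal-transfer filtering and tree inference are out of scope. Padding species absent from a gene with gap characters is an assumption of this sketch, and the function name is hypothetical.

```python
def concatenate_alignments(alignments, species):
    """Concatenate per-gene alignments into a supermatrix.

    alignments: list of dicts mapping species name -> aligned sequence; all
    sequences within one dict must share the same length. Species missing
    from a gene are padded with '-' (an assumption of this sketch).
    """
    supermatrix = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must share one length")
        width = lengths.pop()
        for sp in species:
            supermatrix[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in supermatrix.items()}
```

Because every row grows by the same width per gene, alignment columns stay homologous across the whole supermatrix, which is what allows a single tree with branch lengths comparable across genes.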
Duplication-divergence model of protein interaction network
We show that protein-protein interaction networks can be surprisingly well described by a very simple evolution model of duplication and divergence. The model exhibits remarkably rich behavior depending on a single parameter, the probability of retaining a duplicated link during divergence. When this parameter is large, network growth is not self-averaging and the average vertex degree increases algebraically. The lack of self-averaging results in a great diversity of networks grown from the same initial condition. For small values of the link retention probability, the growth is self-averaging, the average degree increases very slowly or tends to a constant, and the degree distribution has a power-law tail.
Comment: 8 pages, 13 figures
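The growth rule can be simulated directly. In the minimal sketch below, each step duplicates a randomly chosen node, and the duplicate keeps each of the original's links independently with the retention probability. Starting from a single linked pair, not linking the duplicate to its original, and discarding duplicates that keep no links are assumptions of this variant, not details confirmed by the abstract.

```python
import random


def duplication_divergence(n_nodes, retain_prob, seed=None):
    """Grow an undirected network by duplication and divergence.

    Sketch assumptions: start from one linked pair; the duplicate is not
    linked to its original; a duplicate that keeps no links is discarded.
    Returns an adjacency dict {node: set(neighbors)}.
    """
    if not 0.0 < retain_prob <= 1.0:
        raise ValueError("retain_prob must be in (0, 1]")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        original = rng.choice(list(adj))
        # Divergence: each inherited link survives with probability retain_prob.
        kept = {nb for nb in adj[original] if rng.random() < retain_prob}
        if not kept:
            continue  # isolated duplicate: discard it and try again
        new = len(adj)  # nodes are labeled 0..len(adj)-1
        adj[new] = kept
        for nb in kept:
            adj[nb].add(new)
    return adj


net = duplication_divergence(500, retain_prob=0.4, seed=1)
avg_degree = sum(len(nbrs) for nbrs in net.values()) / len(net)
```

Rerunning with several seeds at a high `retain_prob` versus a low one is a quick way to see the contrast the abstract describes: widely varying average degrees in the first case, consistent ones in the second.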