44 research outputs found

    Peptide markers of aminoacyl tRNA synthetases facilitate taxa counting in metagenomic data

    <p>Abstract</p> <p>Background</p> <p>Taxa counting is a major problem in the analysis of metagenomic data. The most popular method relies on analysis of 16S rRNA sequences, but some studies also employ protein-based analyses. It would be advantageous to have a method that is applicable directly to short sequences, of the kind extracted from samples in modern metagenomic research. This is achieved by the technique proposed here.</p> <p>Results</p> <p>We employ specific peptides, deduced from aminoacyl tRNA synthetases, as markers for the occurrence of single genes in data. Sequences carrying these markers are aligned and compared with each other to provide a lower limit for taxa counts in metagenomic data. The method is compared with 16S rRNA searches on a set of known genomes. The taxa counting problem is analyzed mathematically and a heuristic algorithm is proposed. When applied to genomic contigs of a recent human gut microbiome study, the taxa counting method provides information on the numbers of different species and strains. We then apply our method to short read data and demonstrate how it can be calibrated to cope with errors. Comparison to known databases leads to estimates of the percentage of novelties and of the types of phyla involved.</p> <p>Conclusions</p> <p>A major advantage of our method is its simplicity: it relies on searching sequences for the occurrence of just 4000 specific peptides belonging to the S61 subgroup of aaRS enzymes. When compared to other methods, it provides additional insight into the taxonomic contents of metagenomic data. Furthermore, it can be applied directly to short read data, avoiding the need for genomic contig reconstruction and taking into account short reads that are otherwise discarded as singletons. Hence it is well suited to fast analysis of next-generation sequencing data.</p>

    Deriving enzymatic and taxonomic signatures of metagenomes from short read data

    <p>Abstract</p> <p>Background</p> <p>We propose a method for deriving enzymatic signatures from short read metagenomic data of unknown species. Each short read is converted to six pseudo-peptide candidates, on which we search for occurrences of Specific Peptides (SPs). SPs are peptides that are indicative of enzymatic function as defined by the Enzyme Commission (EC) nomenclature. The number of SP hits on an ensemble of short reads is counted and then converted to estimates of the numbers of enzymatic genes associated with different EC categories in the studied metagenome. Relative amounts of different EC categories define the enzymatic spectrum, without the need to perform genomic assemblies of short reads.</p> <p>Results</p> <p>The method is developed and tested on 22 bacteria for which there exist many EC annotations in UniProt. Enzymatic signatures are derived for 3 metagenomes, and their functional profiles are explored.</p> <p>We extend the SP methodology to taxon-specific SPs (TSPs), allowing us to estimate taxonomic features of metagenomic data from short reads. Using recent Swiss-Prot data we obtain TSPs for different phyla of bacteria and different classes of proteobacteria. These allow us to analyze the major taxonomic content of 4 different metagenomic datasets.</p> <p>Conclusions</p> <p>The SP methodology can be successfully extended to applications on short read genomic and metagenomic data. This leads to direct derivation of enzymatic signatures from raw short reads. Furthermore, by employing TSPs, one obtains valuable taxonomic information.</p>
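The conversion of a read into six pseudo-peptides and the SP search can be illustrated with a minimal sketch. It assumes that the six candidates are the six reading-frame translations (three per strand) under the standard genetic code, which the abstract does not state explicitly. The read, the peptide strings and the function names are hypothetical and are not taken from the paper.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(read):
    """Translate a read in three frames on each strand (assumed meaning of
    'six pseudo-peptide candidates')."""
    read = read.upper()
    strands = (read, reverse_complement(read))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def sp_hits(read, specific_peptides):
    """Return the subset of peptides found in any of the six translations."""
    frames = six_frames(read)
    return {sp for sp in specific_peptides if any(sp in f for f in frames)}


# Hypothetical read and hypothetical peptides, for illustration only.
read = "ATGGCCATTGTAATGGGCCGC"
hypothetical_sps = {"AIVMG", "WWWWW"}
print(six_frames(read))
print(sp_hits(read, hypothetical_sps))
```

Summing such hits over all reads, grouped by the EC category of each SP, would give the raw counts that the abstract describes converting into gene-number estimates. That conversion step is not specified here and is left out of the sketch.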

    Proteomics-based metabolic modeling reveals that fatty acid oxidation (FAO) controls endothelial cell (EC) permeability.

    Endothelial cells (ECs) play a key role in maintaining the functionality of blood vessels. Altered EC permeability causes severe impairment of vessel stability and is a hallmark of pathologies such as cancer and thrombosis. By integrating label-free quantitative proteomics data into genome-wide metabolic modeling, we built a model that predicts the metabolic fluxes in ECs when they are cultured on a three-dimensional matrix and organize into a vascular-like network. We discovered that fatty acid oxidation increases when ECs are assembled into a fully formed network, which can be disrupted by inhibiting CPT1A, the rate-limiting enzyme of fatty acid oxidation. Acute CPT1A inhibition reduces cellular ATP levels and oxygen consumption, which are restored by replenishing the tricarboxylic acid cycle. Remarkably, global phosphoproteomic changes measured upon acute CPT1A inhibition pinpointed altered calcium signaling. Indeed, CPT1A inhibition increases intracellular calcium oscillations. Finally, inhibiting CPT1A induces hyperpermeability in vitro and leakage of blood vessels in vivo, both of which were restored by blocking calcium influx or replenishing the tricarboxylic acid cycle. Fatty acid oxidation emerges as a central regulator of endothelial functions and blood vessel stability, and as a druggable pathway to control pathological vascular permeability.

    Systematic Analysis of Compositional Order of Proteins Reveals New Characteristics of Biological Functions and a Universal Correlate of Macroevolution

    <div><p>We present a novel analysis of compositional order (CO) based on the occurrence of Frequent amino-acid Triplets (FTs) that appear far more often than expected at random in protein sequences. The method captures all types of proteomic compositional order, including single amino-acid runs, tandem repeats, periodic structure of motifs and otherwise low-complexity amino-acid regions. We introduce new order measures, distinguishing between ‘regularity’, ‘periodicity’ and ‘vocabulary’, to quantify these phenomena and to facilitate the identification of evolutionary effects. Detailed analysis of representative species across the tree-of-life demonstrates that CO proteins exhibit numerous functional enrichments, including a wide repertoire of particular patterns of dependencies on regularity and periodicity. Comparison between human and mouse proteomes further reveals the interplay of CO with evolutionary trends, such as a faster substitution rate in mouse leading to a decrease in periodicity, while innovation along the human lineage leads to larger regularity. Large-scale analysis of 94 proteomes leads to a systematic ordering of all major taxonomic groups according to FT-vocabulary size, measured by the count of Different Frequent Triplets (DFT) in proteomes. The DFT count provides a clear hierarchical delineation of vertebrates, invertebrates, plants, fungi and prokaryotes, with thermophiles showing the lowest level of FT-vocabulary. Among eukaryotes, this ordering correlates with phylogenetic proximity. Interestingly, in all kingdoms CO accumulation in the proteome has universal characteristics. We suggest that CO is a genomic-information correlate of both macroevolution and various protein functions. The results indicate a mechanism of genomic ‘innovation’ at the peptide level, involved in protein elongation, shaped in a universal manner by mutational and selective forces.</p></div>
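The idea of an FT vocabulary can be made concrete with a rough sketch. The abstract does not give the exact FT criterion, so the code assumes, purely for illustration, that a triplet is frequent in a protein when it occurs at least `n` times, counting overlapping occurrences, and that the DFT count is the number of distinct such triplets over a proteome. The threshold, the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def frequent_triplets(sequence, n=5):
    """Triplets occurring at least n times in one protein sequence.

    Assumption: overlapping occurrences are counted and a simple count
    threshold defines 'frequent'; the paper's criterion may differ.
    """
    counts = Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))
    return {triplet for triplet, count in counts.items() if count >= n}


def dft_count(proteome, n=5):
    """Number of distinct frequent triplets over an iterable of sequences."""
    distinct = set()
    for sequence in proteome:
        distinct |= frequent_triplets(sequence, n)
    return len(distinct)


# Toy sequences: a single amino-acid run, a tandem repeat, and a short
# sequence without any repeated triplet.
toy_proteome = ["AAAAAAA", "QPQPQPQPQPQP", "MKVLA"]
print(dft_count(toy_proteome))
```

Under this simplified criterion, both single amino-acid runs and tandem repeats contribute frequent triplets, which matches the abstract's claim that one measure captures several kinds of compositional order.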

    Functional enrichment in <i>A. thaliana</i> and <i>S. cerevisiae</i>.

    <p>Similarly to <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi-1003346-g002" target="_blank">figure 2</a>, functional enrichment in <i>A. thaliana</i> (A–C) and <i>S. cerevisiae</i> (D–F) is shown with respect to RC (black) or RP (red). Portions of cell wall genes (A, D) and extracellular related genes (C, F) become enriched as the RC threshold increases, while portions of response related genes (B, E) are enriched with RP in <i>A. thaliana</i> but with RC in yeast.</p>

    Frequent Triplets – Theory and simulation.

    <p>Expected values of Frequent Triplets (FTs) in random proteins as a function of sequence length. The length range is up to 35,000 amino-acids, approximately the length of the longest proteins found among the proteomes of the 94 species studied (TITIN in human, and beta-helical in <i>Chlorobium</i>). A) The blue curve is the theoretical expected value given by the Bernoulli probability, for <i>n = 5</i>. Dark circles are the corresponding results of a numerical search of triplets, showing a perfect match to the theoretical estimate. Red circles are the numerical results for restrictive FTs defined by <i>n = 5</i> and <i>M = 2000</i>. Inset: the same data are shown up to <i>L = 8000</i> for clarity. Additional black curves represent the theoretical estimates for <i>n = 4–6</i>. B) <i>P</i>-value for FT misidentification as a function of length, on a log scale. C) Length distribution of human proteins, showing log-normal characteristics. The length of CO proteins is right-shifted (see also <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi.1003346.s026" target="_blank">Text S1</a> - section 3, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi.1003346.s006" target="_blank">figure S6d</a>). Further analysis based on a human “unigram” reference model is provided in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi.1003346.s026" target="_blank">Text S1</a> - sections 1 and 2, where the few very long proteins are analyzed in detail.</p>
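One possible reading of the "Bernoulli probability" curve is sketched below. The caption does not spell out the formula, so the sketch makes several assumptions: a triplet counts as frequent when it occurs at least `n` times in a protein, amino acids are uniform and independent over a 20-letter alphabet, and the `L - 2` triplet positions are treated as independent Bernoulli trials (which ignores the correlation between overlapping triplets). The function name and defaults are hypothetical.

```python
from math import comb


def expected_frequent_triplets(length, n=5, alphabet_size=20):
    """Expected number of distinct triplets seen at least n times in a
    random sequence, under the independence assumptions stated above."""
    positions = max(length - 2, 0)       # number of triplet start positions
    n_triplets = alphabet_size ** 3      # 8000 possible triplets
    p = 1.0 / n_triplets                 # chance a position shows a given triplet
    # Binomial probability of fewer than n occurrences of one given triplet.
    p_below = sum(
        comb(positions, k) * p ** k * (1.0 - p) ** (positions - k)
        for k in range(n)
    )
    # Linearity of expectation over all triplets; clamp rounding noise at 0.
    return max(0.0, n_triplets * (1.0 - p_below))


for length in (500, 2000, 8000, 35000):
    print(length, expected_frequent_triplets(length))
```

Under these assumptions the expectation stays negligible for proteins of typical length and grows steeply only for very long sequences. That is in line with the caption's emphasis on the few very long proteins, though the paper's own model (including the restrictive parameter <i>M</i>) may differ from this sketch.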

    Comparison of CO orthologs in human and mouse.

    <p>Comparison of CO orthologs in human and mouse according to their RC values. Each point corresponds to a pair of such proteins (n = 3312). Low homologies are marked by circles. Usually, their CO sections are comparable, although higher harmonics appear in the mouse (<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi.1003346.s026" target="_blank">Text S1</a> - section 7, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi.1003346.s011" target="_blank">figure S11</a>). Off-diagonal pairs always display low homologies. In the upper-left diagonal, CO sections of human and mouse resemble each other, having high similarity of FTs and MFI, despite the low RC in human. In the lower-right diagonal, mouse CO sections do not resemble human CO sections, with few exceptions (see text). High homology is obtained for protein pairs with similar MFIs, such as zinc finger (MFI = 28), collagen (MFI = 3) and keratin (MFI = 5) proteins; these pairs lie along the diagonal.</p>

    DFT enrichment in eukaryotes.

    <p>DFT count and correlation <i>C<sub>IJ</sub></i> of the 39 studied eukaryotes. Species are indexed and ordered as in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi-1003346-t005" target="_blank">table 5</a>, according to the kingdoms Animalia, Plantae and Fungi and, within each kingdom, according to their phylogenetic distance. The upper panel shows the heat-map of the correlation <i>C<sub>IJ</sub></i>, the middle panel shows the DFT counts, and the lower panel shows the tree of hierarchical clustering based on the Euclidean average distance of <i>C<sub>IJ</sub></i>. Colors of the branches correspond to the taxonomic identity as indicated by the colored abbreviations in the middle panel. Abbreviations are the same as defined in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003346#pcbi-1003346-g005" target="_blank">figure 5</a>. The solid gray branch corresponds to two proximate end leaves belonging to different taxonomic groups. Dashed gray branches link groups.</p>

    DFT Box-plot by Kingdom.

    <p>Box plots of DFT counts across the tree-of-life. Each box delineates the lower quartile, median and upper quartile values. The most extreme values (whiskers) are within 1.5 times the inter-quartile range from the ends of the box. Outliers are also displayed. Prokaryotes are displayed twice: first divided into bacteria and archaea, and second into mesophiles and thermophiles. <i>P</i>-values according to the non-parametric two-sample Kolmogorov-Smirnov test are 2.5×10<sup>−2</sup> (V-IV), 6.5×10<sup>−3</sup> (IV-P), 9×10<sup>−3</sup> (P-F), 1.7×10<sup>−5</sup> (F-B), 2.3×10<sup>−2</sup> (B-A) and 1.4×10<sup>−4</sup> (M-T). Protista species show large variability and cannot be distinguished from Plantae or Fungi. Abbreviations: Vertebrates (V), Invertebrates (IV), Plantae (P), Fungi (F), Protista (PRT), Bacteria (B), Archaea (A), Mesophiles (M), Thermophiles (T).</p>
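The box-plot conventions in this caption and the statistic behind the two-sample Kolmogorov-Smirnov test can be sketched as follows. The quartile interpolation method is an assumption, since plotting tools differ on it. The numbers are made-up placeholders, not DFT counts from the study. Only the KS statistic is computed; turning it into a <i>P</i>-value requires the KS distribution and is omitted.

```python
from bisect import bisect_right
from statistics import quantiles


def box_plot_stats(values):
    """Quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    # 'inclusive' interpolation is an assumed choice of quartile method.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low <= v <= high]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers reach the most extreme data points still inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low or v > high],
    }


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Placeholder values for two hypothetical groups.
group_1 = [1, 2, 3, 4, 5, 6, 7, 8, 100]
group_2 = [10, 20, 30, 40, 50]
print(box_plot_stats(group_1))
print(ks_statistic(group_1, group_2))
```

Because the KS statistic depends only on the ordering of the pooled values, it makes no assumption about the shape of the DFT distributions, which is why the caption calls the test non-parametric.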