    An atlas of genetic scores to predict multi-omic traits

    The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. But multi-omic traits can be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omics. Here we examine a large cohort (the INTERVAL study; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK-STAT signalling and coronary atherosclerosis. Finally, we develop a portal ( https://www.omicspred.org/ ) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
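    As a rough illustration of what applying a genetic score involves, the sketch below computes a score as a weighted sum of effect-allele dosages. This assumes a linear additive score; the abstract only states that machine learning was used for training and does not specify the model form. The variant IDs, weights and the function name `genetic_score` are hypothetical.

```python
def genetic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping variant ID -> effect-allele dosage (0 to 2)
    weights: dict mapping variant ID -> per-allele weight
    Variants absent from `dosages` are skipped (an assumption of this sketch).
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical weights for one molecular trait (not taken from the study).
weights = {"rs0001": 0.5, "rs0002": -0.25, "rs0003": 0.125}

# Hypothetical genotype dosages for one individual; rs0003 is missing.
person = {"rs0001": 2, "rs0002": 1.0}

score = genetic_score(person, weights)
print(score)
```

    In practice the predicted trait value would be evaluated against measured levels in an independent cohort, as the abstract describes for external validation.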

    Functional classification of genes identified in members of the <i>Methylocystaceae</i>.

    <p>The gene content of strain SC2 (red), strain Rockwell (blue) and <i>Ms. trichosporium</i> OB3b (green) and that of the core genome shared by them (grey) was subjected to functional classification by the RAST server. CDS were classified into 27 functional categories using the SEED subsystem. Numbers in parentheses next to the strain names indicate the number of CDS assigned to the SEED subsystem out of the total number of CDS present in the particular genome. The proportion of CDS (x-axis) assigned to a particular subsystem was calculated by dividing the number of CDS assigned to this category by the total number of CDS assigned to the SEED subsystem database. The functional categories were arranged according to the number of CDS assigned for strain SC2 to each category. The number of CDS classified for the individual strains and their core genome into each SEED subsystem was subjected to statistical analysis using STAMP. A <i>p</i>-value cutoff of 0.05 was used to determine significant differences. Subsystems showing significant differences are marked by an asterisk.</p>
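    The proportion calculation described in the caption can be sketched as follows. The category names and counts are hypothetical placeholders, not values from the figure, and the function name `subsystem_proportions` is an assumption made for illustration.

```python
def subsystem_proportions(counts):
    """Divide each category's CDS count by the total number of CDS
    assigned to any category, as described in the caption."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no CDS were assigned to any category")
    return {category: n / total for category, n in counts.items()}


# Hypothetical CDS counts per functional category for one genome.
counts = {"Carbohydrates": 30, "Respiration": 10, "Nitrogen Metabolism": 10}

props = subsystem_proportions(counts)

# Order categories by proportion, largest first, as on the figure's axis.
ordered = sorted(props, key=props.get, reverse=True)
print(props, ordered)
```

    Note that the denominator is the number of CDS assigned to the classification, not the total number of CDS in the genome.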

    Conjugative Type 4 Secretion System of a Novel Large Plasmid from the Chemoautotroph Tetrathiobacter kashmirensis and Construction of Shuttle Vectors for Alcaligenaceae

    Tetrathiobacter spp. and other members of the Alcaligenaceae are metabolically versatile and environmentally significant. A novel, ∼60-kb conjugative plasmid, pBTK445, from the sulfur chemolithoautotroph Tetrathiobacter kashmirensis, was identified and characterized. This plasmid exists at a low copy number of 2 to 3 per host chromosome. The portion of pBTK445 sequenced so far (∼25 kb) harbors genes putatively involved in replication, transfer functions, partition, and UV damage repair. A 1,373-bp region was identified as the minimal replicon. This region contains a repA gene encoding a protein belonging to the RPA (replication protein A) superfamily and an upstream, iteron-based oriV. A contiguous 11-gene cluster homologous to various type 4 secretion systems (T4SSs) was identified. Insertional inactivation demonstrated that this cluster is involved in the conjugative transfer functions of pBTK445, and thus, it was named the tagB (transfer-associated gene homologous to virB) locus. The core and peripheral TagB components show different phylogenetic affinities, suggesting that this system has evolved by assembling components from evolutionarily divergent T4SSs. A virD4 homolog, putatively involved in nucleoprotein transfer, is also present downstream of the tagB locus. Although pBTK445 resembles IncP plasmids in terms of its genomic organization and the presence of an IncP-specific trbM homolog, it also shows several unique features. Unlike that of IncP, the oriT of pBTK445 is located in close proximity to the oriV, and a traL homolog, which is generally present in the TraI locus of IncP, is present in pBTK445 in isolation, upstream of the tagB locus. A significant outcome of this study is the construction of conjugative shuttle vectors for Tetrathiobacter and related members of the Alcaligenaceae.

    Prediction of the <i>oriC</i> region by Ori-Finder.

    <p>(A) 1,063-bp sequence (2,008,193 bp to 2,009,255 bp) of the predicted <i>oriC</i> site. Three <i>dnaA</i> box motifs identified using the <i>Escherichia coli</i>-specific <i>dnaA</i> boxes are bold-faced and highlighted. Palindromic repeats identified in this region are marked by arrows at the top. (B, C) The Z-curves measuring the disparity between the percent content of AT (red lines), GC (green lines), RY (blue lines) and MK (yellow lines) for the original sequence (B) and the rotated sequence (C). Note that the coordinate origin of the rotated sequence begins and ends at the maximum of the GC disparity curve. Short vertical red lines at the top show the locations of indicator genes, such as <i>dnaA</i>, <i>dnaN</i>, <i>gidA</i>, and <i>hemE</i>. The upward black arrow indicates the position of the predicted <i>oriC</i>. Purple peaks with diamonds indicate DnaA box clusters. (D) Pairwise alignment between the <i>dif</i> sites located in the genomes of <i>E. coli</i> and strain SC2. In strain SC2, the <i>dif</i>-like sequence is located from nucleotide position 276,895 to 276,922 (almost halfway around the chromosome from the deduced <i>oriC</i>) and matches at 20 nucleotide positions with the 28-bp <i>dif</i> sequence of <i>E. coli</i>.</p>
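    To make the GC disparity curve and the rotation in panels B and C more concrete, here is a minimal sketch that computes the cumulative G minus C count along a sequence and rotates a circular sequence so that it starts just after the curve's maximum. This is not Ori-Finder's implementation; the function names and the toy sequence are assumptions made for illustration.

```python
def gc_disparity(seq):
    """Cumulative (G count - C count) after each base of the sequence."""
    curve, g_minus_c = [], 0
    for base in seq.upper():
        if base == "G":
            g_minus_c += 1
        elif base == "C":
            g_minus_c -= 1
        curve.append(g_minus_c)
    return curve


def rotate_to_max(seq):
    """Rotate a circular sequence so it starts right after the first
    maximum of its GC disparity curve."""
    curve = gc_disparity(seq)
    cut = curve.index(max(curve)) + 1
    return seq[cut:] + seq[:cut]


toy = "CCGGGGACCT"  # toy circular sequence, not real genome data
curve = gc_disparity(toy)
rotated = rotate_to_max(toy)
rotated_curve = gc_disparity(rotated)
print(curve, rotated, rotated_curve)
```

    After rotation the curve starts from zero and returns to its maximum at the final base, which matches the caption's description of a rotated sequence that begins and ends at the maximum of the GC disparity curve.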

    Venn diagram showing the number of CDS unique to and shared by the <i>Methylocystaceae</i> members.

    <p>Data analysis was performed using the genomes of strain SC2 (red), strain Rockwell (blue) and <i>Ms. trichosporium</i> OB3b (green). Numbers in circles indicate the number of unique CDS, while those in intersections represent the number of orthologous CDS common to two or more strains. Orthologs were detected by reciprocal best BLASTP matches with the EDGAR software.</p>
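    The idea of reciprocal best matches can be sketched as below: two CDS are paired only when each is the other's top-scoring hit. This is a simplified illustration that ranks hits by a single score; EDGAR's actual scoring and cutoffs are not described in the caption, and the gene IDs and scores here are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> alignment score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical BLASTP-style scores between genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50,
          ("a2", "b2"): 120, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 190, ("b2", "a2"): 110, ("b2", "a1"): 40}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

    In this toy example `a3` has a best hit in genome B that is not reciprocated, so it would be counted among the CDS unique to genome A.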

    N<sub>2</sub> fixation by strain SC2.

    <p>(A) Growth dynamics (OD<sub>600</sub>) of strain SC2 in batch cultures on N-free medium (with atmospheric N<sub>2</sub> as sole nitrogen source). Oxygen concentrations of 1% (blue), 5% (green), 10% (red), 15% (brown) and 20% (black) were used to test their effect on N<sub>2</sub> fixation-mediated growth. Note that the x-axis is not to scale. (B) Effect of oxygen on the nitrogenase activity (acetylene reduction assay) in strain SC2. Ethylene production was measured after 24 hours of incubation under different concentrations of oxygen in the headspace. Data points are means ± SD of three separate experiments.</p>

    Denitrification-mediated N<sub>2</sub> production by strain SC2.

    <p><sup>30</sup>N<sub>2</sub> production was measured after fifteen days for cells incubated in NMS containing either K<sup>15</sup>NO<sub>3</sub> (blue) or KNO<sub>3</sub> (orange). The assays were performed under both anaerobic and aerobic conditions. Data points are means ± SD of three separate experiments.</p>

    Neighbor-joining tree constructed for the methanotrophic core genome.

    <p>The tree is based on the alignment of 154 CDS that are common to all eight methanotroph genomes used for comparative analysis. Non-matching parts of the alignments were eliminated prior to tree construction. The individual gene alignments were combined into one concatenated alignment. The neighbor-joining tree was constructed using EDGAR. All branches of the phylogenetic tree showed 100% bootstrap support based on 500 replications. See ‘<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0074767#s3" target="_blank">Materials and Methods</a>’ for further details.</p>

    N<sub>2</sub>O production by strain SC2.

    <p>Cells were incubated in NMS, either in the presence (filled symbol) or absence (open symbol) of 10% acetylene. Assays were performed both under anaerobic (orange) and aerobic (green) conditions. Data points are means ± SD of three separate experiments. The inset shows the same graph with a y-axis zoomed in for the range 0 to 0.8.</p>