11 research outputs found

    Building a global alliance of biofoundries (vol 10, 2040, 2019)

    The original version of this Comment contained errors in the legend of Figure 2, in which the locations of the fifteenth and sixteenth GBA members were incorrectly given as '(15) Australian Genome Foundry, Macquarie University; (16) Australian Foundry for Advanced Biomanufacturing, University of Queensland'. The correct version replaces this with '(15) Australian Foundry for Advanced Biomanufacturing (AusFAB), University of Queensland and (16) Australian Genome Foundry, Macquarie University'. This has been corrected in both the PDF and HTML versions of the Comment.

    Reactome modified for tracing ArangoDB version

    Reactome database download and customization

    The Reactome database [1,2] was downloaded as a Neo4j graph database (https://reactome.org/download-data, version 75), which is covered by the Creative Commons Attribution 4.0 International (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0/). A series of database queries was used to generate a database version suitable for graph data science; these can be followed in detail in the attached Jupyter notebook (or at https://github.com/SBRG/GDS-Public/blob/main/notebooks/reactome/Reactome%20GDS.ipynb).

    Nodes, labels and relationships not required for graph algorithmic analyses were removed. For instance, this included nodes like person, affiliation, and taxa as well as all nodes representing entities of organisms other than Homo sapiens. Subcellular locations (compartments) of biological entities were set as node properties. To allow for improved graph traversal, selected relationships were reversed or added. Because currency metabolites, e.g. ATP, NAD(P)H and H+, can artificially connect metabolic reactions and pathways in network analyses [3,4], we labelled such compounds plus the regulatory protein ubiquitin accordingly and thereby excluded them from all analyses. Finally, the database was transformed into an ArangoDB graph database consisting of 1,703,054 nodes and 3,368,926 edges.

    References

    1. Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Research 50, D687–D692 (2022).
    2. Fabregat, A. et al. Reactome graph database: Efficient access to complex pathway data. PLoS Computational Biology 14, (2018).
    3. Ma, H. & Zeng, A.-P. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics vol. 19, https://academic.oup.com/bioinformatics/article/19/2/270/372721 (2003).
    4. Martínez, V. S. et al. The topology of genome-scale metabolic reconstructions unravels independent modules and high network flexibility. PLoS Computational Biology 18, (2022).
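    The effect of excluding currency metabolites can be illustrated with a minimal sketch. This is not the authors' pipeline (their queries are in the linked notebook); the toy edge list, the node names, the `CURRENCY` set and both helper functions are hypothetical, and the graph is treated as undirected purely for simplicity.

```python
from collections import deque

# Hypothetical toy graph of metabolite/reaction links.
# Node names are illustrative only and are not taken from the Reactome database.
EDGES = [
    ("glucose", "R1"), ("ATP", "R1"), ("R1", "G6P"), ("R1", "ADP"),
    ("ATP", "R2"), ("palmitate", "R2"), ("R2", "palmitoyl-CoA"),
]
# Assumed set of nodes labelled as currency metabolites.
CURRENCY = {"ATP", "ADP", "NADH", "NADPH", "H+"}


def build_adjacency(edges, excluded=frozenset()):
    """Build an undirected adjacency map, skipping edges that touch an excluded node."""
    adj = {}
    for u, v in edges:
        if u in excluded or v in excluded:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def reachable(adj, start):
    """Breadth-first search returning every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# With ATP kept, glucose metabolism appears connected to fatty-acid activation.
with_currency = reachable(build_adjacency(EDGES), "glucose")
# With currency metabolites excluded, the two reactions fall into separate components.
without_currency = reachable(build_adjacency(EDGES, CURRENCY), "glucose")

print(sorted(with_currency))
print(sorted(without_currency))
```

    In this toy case the shared cofactor ATP is the only link between the two reactions, which is the kind of artificial connection the exclusion step is meant to avoid.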

    GASP: A pan-specific predictor of family 1 glycosyltransferase specificity enabled by a pipeline for substrate feature generation and large-scale experimental screening

    Glycosylation represents a major chemical challenge; while it is one of the most common reactions in Nature, conventional chemistry struggles with stereochemistry, regioselectivity and solubility issues. In contrast, family 1 glycosyltransferase (GT1) enzymes can glycosylate virtually any given nucleophilic group with perfect control over stereochemistry and regioselectivity. However, the appropriate catalyst for a given reaction needs to be identified among the tens of thousands of available sequences. Here, we present the Glycosyltransferase Acceptor Specificity Predictor (GASP) model, a data-driven approach to the identification of reactive GT1:acceptor pairs. We trained a random forest-based acceptor predictor on literature data and validated it on independent in-house generated data on 1001 GT1:acceptor pairs, obtaining an AUROC of 0.79 and a balanced accuracy of 72%. GASP is capable of parsing all known GT1 sequences, as well as all chemicals, the latter through a pipeline that generates 153 chemical features for a given molecule, taking the CID or SMILES as input (freely available at https://github.com/degnbol/GASP). GASP had an 83% hit rate in a comparative case study for the glycosylation of the anthelmintic drug niclosamide, significantly outperforming a hit rate of 53% from a random selection assay. However, it was unable to compete with a hit rate of 83% for the glycosylation of the plant defensive compound DIBOA using expert-selected enzymes, with GASP achieving a hit rate of 50%. The hierarchical importance of the generated chemical features was investigated by negative feature selection, revealing properties related to cyclization and atom hybridization status to be the most important characteristics for accurate prediction. Our study provides a ready-to-use GT1:acceptor predictor that can additionally be trained on other datasets, enabled by the automated feature generation pipelines.
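    For readers unfamiliar with the two validation metrics quoted above, the following sketch shows how balanced accuracy and AUROC are defined. It does not use GASP or its data: the labels, scores and the 0.5 threshold are made-up toy values, and both functions are hypothetical helpers written for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = reactive enzyme:acceptor pair, 0 = non-reactive.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

bal_acc = balanced_accuracy(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"balanced accuracy = {bal_acc:.3f}, AUROC = {auc:.3f}")
```

    Balanced accuracy weights the reactive and non-reactive classes equally, which matters when one class dominates the screening data; AUROC is independent of the chosen threshold.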