8 research outputs found

    A robust pathfinding algorithm using chemical composition

    Metabolic pathfinding is the task of finding preferred metabolic pathways in large metabolic reaction databases. Representing metabolism as a network enables quick enumeration of paths between two compounds. Automated pathfinding helps in working with ever-increasing databases of reactions and in finding novel pathways for metabolic engineering. However, the number of pathways between two compounds can be as large as 500,000 in some metabolic models, and larger still as the input database grows, which makes it imperative that the most relevant ones are ranked highly. While graph-theoretic representations of metabolic networks bring speed and ease to the enumeration of pathways, they also create the challenge of biochemically insensible shortcuts through pool or currency metabolites. In the past, strategies to circumvent such irrelevant pathways have included weighting networks by node degree or manually curating edges in the metabolic network. The former wrongly penalizes some primary metabolites central to metabolism, while the latter requires someone to perform the curation by hand. The KEGG RPAIR database annotates reactions in terms of reactant pairs and has been used for metabolic pathfinding. Here, I first study several centrality measures for identifying currency metabolites and find one that performs better than degree centrality. I then describe a method to augment KEGG RPAIR-based pathfinding with a chemical composition score and evaluate its ability to augment and replace the role of RPAIRs in pathfinding. The new algorithm is validated against a set of 30 biochemical pathways in E. coli. Since this method uses chemical composition as a fallback measure, it can be used in the absence of explicit RPAIR information, thus allowing the identification of putative paths not possible via methods using the RPAIR database alone.
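
The currency-metabolite shortcut and the degree-weighting strategy mentioned in the abstract above can be illustrated with a minimal sketch. It does not implement the author's chemical composition score. The reaction list, the compound names other than ATP and ADP, and the choice of "cost of entering a node = its degree" are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical toy reactions (substrates, products); not taken from KEGG.
REACTIONS = [
    (["A", "ATP"], ["B", "ADP"]),
    (["B"], ["C"]),
    (["C"], ["D"]),
    (["D", "ATP"], ["E", "ADP"]),
    (["F", "ATP"], ["G", "ADP"]),
    (["H", "ATP"], ["I", "ADP"]),
    (["J", "ATP"], ["K", "ADP"]),
]


def build_graph(reactions):
    """Undirected compound graph: every substrate is linked to every product."""
    graph = defaultdict(set)
    for substrates, products in reactions:
        for s in substrates:
            for p in products:
                graph[s].add(p)
                graph[p].add(s)
    return graph


def cheapest_path(graph, source, target, node_cost):
    """Dijkstra search where stepping onto a node costs node_cost(node)."""
    queue = [(0, [source])]
    settled = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour in sorted(graph[node]):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + node_cost(neighbour), path + [neighbour]))
    return None


graph = build_graph(REACTIONS)

# Unit costs: the fewest-hop route cuts through ATP or ADP.
hops, shortcut = cheapest_path(graph, "A", "E", lambda n: 1)

# Degree costs: highly connected hubs become expensive to pass through.
cost, weighted = cheapest_path(graph, "A", "E", lambda n: len(graph[n]))

print(hops, shortcut)
print(cost, weighted)
```

With unit costs the search returns a three-step route that passes through ATP or ADP, while degree costs recover the linear route A, B, C, D, E. The same weighting would also penalize any genuinely central metabolite with many neighbours, which is the drawback the abstract points out.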

    Evolution as a design tool to inform biomolecular engineering

    Enzyme biotechnology is a critical component of the technologies needed for increasingly sustainable materials processing. Along with the ability to rapidly synthesize proteins through fermentation, there is a need to alter enzyme functionality in specific ways to suit the desired application. For instance, industrial enzymes with increased stability at higher temperatures or altered pH optima can improve productivity in large-scale bioreactors through improved catalytic rates and lowered costs due to cooling and reduced contamination. This research aims to provide the scientific community with a suite of design tools and methodologies for protein engineering experiments and research. Specifically, these methods build on concepts from molecular evolution to provide insight into the innovation process in molecular engineering. The methods developed in this study aim to increase the thermal stability of an enzyme by engineering disulfide bonds and electrostatic salt bridges on the surface of the enzyme. Two methods to assist disulfide engineering were developed: (1) a neural network model to predict disulfide bonds within existing structures from mutual information and a continuous distributed representation of protein sequence; and (2) a methodology that combines statistics on the structural information of disulfide bonds with evolutionary patterns to rank-order specific design choices. A similar approach was developed for engineering salt bridges. The methodology for developing geometric constraints and utilizing evolutionary patterns to engineer salt bridges was validated with experiments on 1,4-α-glucan branching enzymes by collaborators. The neural network model achieves state-of-the-art accuracy (80%), and in addition, the impact of the protein sequence representation, mutual information and cysteine separation distance on the performance of the model was analysed in a particular disulfide engineering experiment. The trained long short-term memory (LSTM) neural network model also serves as a model of disulfide bond sequence motifs, so as to develop an understanding of the constraints on disulfide bond formation. The methodology using statistical constraints on disulfide bonds was prototyped as a PyMOL script that identifies potential pairs of residues on the surface of an enzyme for modification to disulfide-capable cysteine residues. This method suggests 85% more stabilising mutations out of 17% fewer suggestions, according to evaluations by short molecular dynamics simulations using FoldX.
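
The abstract names mutual information as one input to the disulfide predictor but does not say how it is computed. A minimal sketch, assuming it is the standard mutual information between two columns of a multiple sequence alignment, is shown below; the function name and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_x) != len(col_y) or not col_x:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) / (p(a) * p(b)) simplifies to c * n / (count_x[a] * count_y[b])
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Toy columns, one character per aligned sequence (made up for illustration).
covarying = mutual_information("CCAA", "CCGG")    # residues change together
independent = mutual_information("CCAA", "CACA")  # no association

print(covarying, independent)
```

Columns whose residues change together score high, which is why such a measure can hint at positions under a joint constraint, for example a cysteine pair. Real alignments would also need handling of gaps and small-sample bias, which this sketch omits.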

    Machine learning analysis of microbial flow cytometry data from nanoparticles, antibiotics and carbon sources perturbed anaerobic microbiomes

    Background: Flow cytometry, with its high-throughput nature and its ability to measure an increasing number of cell parameters at once, can surpass the throughput of prevalent genomic and metagenomic approaches in the study of microbiomes. Novel computational approaches to analyzing flow cytometry data will yield greater insight and actionability than the traditional tools used in the analysis of microbiomes. This paper demonstrates the fruitfulness of machine learning in analyzing microbial flow cytometry data generated in anaerobic microbiome perturbation experiments. Results: Autoencoders were found to be powerful in detecting anomalies in flow cytometry data from anaerobic microbiomes perturbed by nanoparticles and carbon sources, but were marginal in predicting perturbations due to antibiotics. A comparison of algorithms based on predictive capability suggested that gradient boosting (GB) and deep learning (DL), i.e. a feed-forward artificial neural network with three hidden layers, were marginally better under the tested conditions at predicting overall community structure, while distributed random forests (DRF) worked better for predicting the most important putative microbial group(s) in the anaerobic digesters, viz. methanogens, and it can be optimized with better parameter tuning. Predictive classification patterns with DL were found to be comparable to previously demonstrated multivariate analysis. The potential applications of this approach have been demonstrated for monitoring the syntrophic resilience of anaerobic microbiomes perturbed by synthetic nanoparticles as well as antibiotics. Conclusion: Machine learning can benefit the microbial flow cytometry research community by providing rapid screening and characterization tools to discover patterns in the dynamic response of microbiomes to several stimuli.

    Personalized Ovarian Cancer Disease Surveillance and Detection of Candidate Therapeutic Drug Target in Circulating Tumor DNA

    Retrospective studies have demonstrated that nearly 50% of patients with ovarian cancer with normal cancer antigen 125 (CA125) levels have persistent disease; however, prospectively distinguishing between patients is currently impossible. Here, we demonstrate that for one patient, with the first reported fibroblast growth factor receptor 2 (FGFR2) fusion transcript in ovarian cancer, circulating tumor DNA (ctDNA) is a more sensitive and specific biomarker than CA125, and it can also inform on a candidate therapeutic. For a 4-year period, during which the patient underwent primary debulking surgery and chemotherapy, tumor recurrences, and multiple chemotherapeutic regimens, blood samples were longitudinally collected and stored. Whereas postsurgical CA125 levels were elevated in only three of 28 measurements, the FGFR2 fusion ctDNA biomarker was readily detectable by quantitative real-time reverse transcription-polymerase chain reaction (PCR) in all of these same blood samples and in the tumor recurrences. Given the persistence of the FGFR2 fusion, we treated tumor cells derived from this patient and others with the FGFR2 inhibitor BGJ398. Only tumor cells derived from this patient were sensitive to FGFR2 inhibitor treatment. Using the same methodologic approach, we demonstrate in a second patient with a different fusion that PCR and agarose gel electrophoresis can also be used to identify tumor-specific DNA in the circulation. Taken together, we demonstrate that a relatively inexpensive, PCR-based ctDNA surveillance assay can outperform CA125 in identifying occult disease.