56 research outputs found

    Machine Learning Approaches to Modeling the Physiochemical Properties of Small Peptides

    Peptide and protein sequences are most commonly represented as strings: a series of letters selected from the twenty-character alphabet of abbreviations for the naturally occurring amino acids. Here, we experiment with representations of small peptide sequences that incorporate more physiochemical information. Specifically, we develop three different physiochemical representations for a set of roughly 700 HIV-I protease substrates. These representations are used as input to an array of six different machine learning models that predict whether or not a given peptide is likely to be an acceptable substrate for the protease. Our results show that, in general, higher-dimensional physiochemical representations tend to perform better than representations incorporating fewer dimensions selected on the basis of high information content. We contend that such representations are more biologically relevant than simple string-based representations and are likely to capture functionally important peptide characteristics more accurately. Singapore-MIT Alliance (SMA)
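    To make the contrast between a string-based and a physiochemical representation concrete, here is a minimal sketch. It is not the representation used in the paper, whose three encodings are not specified in this abstract. As an assumption for illustration, it maps each residue to a single property, the Kyte-Doolittle hydropathy index, and compares that with a one-hot encoding of the same peptide. The function names and the example octamer are hypothetical choices.

```python
# Kyte-Doolittle hydropathy index per amino acid (a published scale).
# Using it here is an assumption for illustration only.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """String-style encoding: 20 indicator values per residue."""
    vec = []
    for res in peptide:
        vec.extend(1.0 if res == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def physiochemical(peptide):
    """Property-based encoding: one hydropathy value per residue."""
    return [KD_HYDROPATHY[res] for res in peptide]


peptide = "SQNYPIVQ"  # illustrative 8-residue string (hypothetical input)
string_features = one_hot(peptide)            # 8 * 20 = 160 values
property_features = physiochemical(peptide)   # 8 values
```

    Either feature vector could then be passed to a classifier. A richer physiochemical encoding would stack several properties per residue, such as charge or volume, alongside hydropathy.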

    Applications of motif discovery in biological data

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2007. Includes bibliographical references (p. 437-458). Sequential motif discovery, the ability to identify conserved patterns in ordered datasets without a priori knowledge of exactly what those patterns will be, is a frequently encountered and difficult problem in computational biology and biochemical engineering. The most prevalent example of such a problem is finding conserved DNA sequences in the upstream regions of genes that are believed to be coregulated. Other examples are as diverse as identifying conserved secondary structure in proteins and interpreting time-series data. This thesis creates a unified, generic approach to addressing these (and other) problems in sequential motif discovery and demonstrates the utility of that approach on a number of applications. A generic motif discovery algorithm was created for the purpose of finding conserved patterns in arbitrary data types. This approach and implementation, named Gemoda, decouples three key steps in the motif discovery process: comparison, clustering, and convolution. Because it decouples these steps, Gemoda is a modular algorithm; that is, any comparison metric can be used with any clustering algorithm and any convolution scheme. The comparison metric is a data-specific function that transforms the motif discovery problem into a solvable graph-theoretic problem that still adequately represents the important similarities in the data. This thesis presents the development of Gemoda as well as applications of this approach in a number of different contexts. One application is an exhaustive solution of an abstraction of the transcription factor binding site discovery problem in DNA. A similar application is to the analysis of upstream regions of regulons in microbial DNA. Another application is the identification of protein sequence homologies in a set of related proteins in the presence of significant noise. A quite different application is the discovery of extended local secondary structure homology between a protein and a protein complex known to be in the same structural family. The final application is to the analysis of metabolomic datasets. The diversity of these sample applications, which range from the analysis of strings (like DNA and amino acid sequences) to real-valued data (like protein structures and metabolomic datasets), demonstrates that our generic approach is successful and useful for solving established and novel problems alike. The last application, analyzing metabolomic datasets, is of particular interest. Using Gemoda, an appropriate comparison function, and appropriate data handling, a novel and useful approach to the interpretation of metabolite profiling datasets obtained from gas chromatography coupled to mass spectrometry is developed. The use of a motif discovery approach allows for the expansion of the scope of metabolites that can be tracked and analyzed in an untargeted metabolite profiling (or metabolomic) experiment. This new approach, named SpectConnect, is presented herein along with examples that verify its efficacy and utility in some validation experiments. The beginning of a broader application of SpectConnect's potential is presented as well. The success of SpectConnect, a novel application of Gemoda, validates the utility of a truly generic approach to motif discovery. By not getting bogged down in the specifics of a type of data and a problem unique to that type of data, a broader class of problems can be addressed that otherwise would have been extremely difficult to handle. By Mark Philip-Walter Styczynski. Ph.D.
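    The decoupling of comparison and clustering described above can be illustrated with a small sketch. This is not the Gemoda implementation. As assumptions for illustration, the comparison metric is a Hamming-distance threshold on fixed-length windows, the clustering step is plain connected components, and the convolution step (extending motifs beyond the window length) is omitted. All function names and the example sequences are hypothetical.

```python
from itertools import combinations


def windows(sequences, length):
    """Yield (sequence index, offset, window) for every fixed-length window."""
    for i, seq in enumerate(sequences):
        for j in range(len(seq) - length + 1):
            yield (i, j, seq[j:j + length])


def hamming(a, b):
    """Number of mismatched positions between two equal-length windows."""
    return sum(x != y for x, y in zip(a, b))


def similarity_graph(sequences, length, max_mismatches):
    """Comparison step: link windows from different sequences that are similar."""
    nodes = list(windows(sequences, length))
    graph = {(i, j): set() for i, j, _ in nodes}
    for (i1, j1, w1), (i2, j2, w2) in combinations(nodes, 2):
        if i1 != i2 and hamming(w1, w2) <= max_mismatches:
            graph[(i1, j1)].add((i2, j2))
            graph[(i2, j2)].add((i1, j1))
    return graph


def connected_components(graph):
    """Clustering step (assumed choice): group linked windows together."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen or not graph[start]:
            continue  # skip visited nodes and windows with no similar partner
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


seqs = ["TTGACAGG", "CCTGACAT", "GTGACAAA"]  # hypothetical toy input
graph = similarity_graph(seqs, length=5, max_mismatches=0)
clusters = connected_components(graph)
```

    Because the graph is the only interface between the two steps, swapping `hamming` for a metric on real-valued data, or swapping `connected_components` for another clustering routine, leaves the rest of the sketch unchanged.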

    Applications of metabolomics in cancer research

    The first discovery of metabolic changes in cancer occurred almost a century ago. While the genetic underpinnings of cancer have dominated its study since then, altered metabolism has recently been acknowledged as a key hallmark of cancer and metabolism-focused research has received renewed attention. The emerging field of metabolomics, which attempts to profile all metabolites within a cell or biological system, is now being used to analyze cancer metabolism on a system-wide scale, painting a broad picture of the altered pathways and their interactions with each other. While a large fraction of cancer metabolomics research is focused on finding diagnostic biomarkers, metabolomics is also being used to obtain more fundamental mechanistic insight into cancer and carcinogenesis. Applications of metabolomics are also emerging in areas such as tumor staging and assessment of treatment efficacy. This review summarizes contributions that metabolomics has made in cancer research and presents the current challenges and potential future directions within the field.

    The Effect of the PPPLF on PPP Lending by Commercial Banks

    The Metabolomics Society-Current State of the Membership and Future Directions.

    Background: In 2017, the Metabolomics Society conducted a survey among its members to assess the degree of its current success, define opportunities for improving its service to the community, and plan the Society's future goals and direction. Methods: A 32-question online survey was sent via e-mail to all Metabolomics Society members as of 19 June 2017 (n = 644). In addition to the direct e-mails, the link to access the survey was made available through social media. The survey was open until 10 August 2017. Question-specific data were reported using the summary data generated by SurveyMonkey, and additional stratified analyses were performed using Stata 15. Results: The number of respondents was 394 (61%), with 348 (88%) completing the multiple-choice questions in the survey. Metabolomics Society annual meetings, networking, and the opportunity to join the global metabolomics community were among the most important benefits expressed by the Metabolomics Society members. Conclusions: The survey collected the first data focusing on membership issues from Society members. The Society should focus on collecting and monitoring demographic data during the membership registration process; continuing to support the early-career members of the Society; and developing initiatives that focus on member networking to retain and increase Society membership.

    Twin nucleation and variant selection in Mg alloys: An integrated crystal plasticity modelling and experimental approach

    Extension twin nucleation and variant selection in magnesium alloy WE43 are investigated in experimentally characterised and deformed microstructures replicated in crystal plasticity models. Total stored (dislocation) energy density is found to identify the experimentally observed locations of twins, which are not otherwise explained by global Schmid factors or local resolved shear stress criteria. A critical total stored energy of the order of 0.015 J m⁻² is determined, below which twin nucleation does not occur. The total stored energy density explains the locations of the observed twins and the absence of twins in parent grains anticipated to be favourable for twin nucleation. Twin variant selection has been shown to be driven by minimising locally stored shear energy density, while the geometric compatibility and strain compatibility factors only aid in partial prediction. All experimentally observed variants were correctly determined.

    Uncovering Metabolic Regulation and Dynamics

    Presented on July 10, 2012 from 8:30 a.m.-9:30 a.m. at the Parker H. Petit Institute for Bioengineering & Bioscience (IBB), room 1128, Georgia Tech. Runtime: 51:22 minutes. Understanding and controlling cellular metabolism (the process by which nutrients taken into a cell are turned into energy and the building blocks for more cells) is crucial to numerous applications, from enabling more efficient bioenergy production to unraveling the mechanisms of diseases like cancer. However, true understanding of (and control over) metabolism is hindered by a dearth of information available about the dynamics of metabolism and the molecular mechanisms that regulate those dynamics. A deeper understanding in these areas would enable much more efficient manipulation of existing metabolic networks to circumvent or exploit native metabolic regulation. In this seminar, we will discuss our work as we begin to unravel metabolic dynamics and regulation in two different (yet related) systems: yeast and cancer. Using mass spectrometry, we investigate the metabolic dynamics of cancer cells in response to environmental perturbations that we expect tumors to encounter in vivo. We also use complementary high-throughput analytical techniques to begin to enumerate the space of metabolite-protein interactions in the metabolic network of the yeast Saccharomyces cerevisiae.

    Systematic Applications of Metabolomics in Metabolic Engineering

    The goals of metabolic engineering are well served by the biological information provided by metabolomics: information on how the cell is currently using its biochemical resources is perhaps one of the best ways to inform strategies to engineer a cell to produce a target compound. Using the analysis of extracellular or intracellular levels of the target compound (or a few closely related molecules) to drive metabolic engineering is quite common. However, there is surprisingly little systematic use of metabolomics datasets, which simultaneously measure hundreds of metabolites rather than just a few, for that same purpose. Here, we review the most common systematic approaches to integrating metabolite data with metabolic engineering, with emphasis on existing efforts to use whole-metabolome datasets. We then review some of the most common approaches for computational modeling of cell-wide metabolism, including constraint-based models, and discuss current computational approaches that explicitly use metabolomics data. We conclude with a discussion of the broader potential of computational approaches that systematically use metabolomics data to drive metabolic engineering.