    Machine learning-guided directed evolution for protein engineering

    Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function. Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution.
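
The select-by-model step described in this abstract can be illustrated with a toy round of model-guided selection. This is a minimal sketch under loud assumptions, not the review's method: the four-letter alphabet, the parent sequence, the `assay` function (a stand-in for a wet-lab measurement), and the simple additive model are all hypothetical and chosen only so the example is self-contained.

```python
import itertools

ALPHABET = "ACDE"    # hypothetical reduced amino-acid alphabet
WILD_TYPE = "AAAA"   # hypothetical parent sequence

# Hypothetical ground truth replacing a real assay: additive per-residue effects.
TRUE_EFFECT = {"A": 0.0, "C": 0.5, "D": -0.3, "E": 1.0}


def assay(seq):
    """Stand-in for an experimental measurement (position-weighted additive fitness)."""
    return sum((pos + 1) * TRUE_EFFECT[aa] for pos, aa in enumerate(seq))


def fit_additive_model(measured):
    """Learn one effect per (position, residue) from the measured single mutants."""
    base = measured[WILD_TYPE]
    effects = {}
    for seq, fitness in measured.items():
        diffs = [(i, aa) for i, aa in enumerate(seq) if aa != WILD_TYPE[i]]
        if len(diffs) == 1:
            effects[diffs[0]] = fitness - base
    return base, effects


def predict(model, seq):
    """Predict fitness as wild-type fitness plus the learned effects of each mutation."""
    base, effects = model
    return base + sum(
        effects.get((i, aa), 0.0) for i, aa in enumerate(seq) if aa != WILD_TYPE[i]
    )


# Round 1: measure the wild type and every single mutant.
library = [WILD_TYPE] + [
    WILD_TYPE[:i] + aa + WILD_TYPE[i + 1:]
    for i in range(len(WILD_TYPE))
    for aa in ALPHABET
    if aa != WILD_TYPE[i]
]
measured = {seq: assay(seq) for seq in library}

# Fit the model on all measured variants, then rank sequences not yet measured.
model = fit_additive_model(measured)
candidates = ["".join(p) for p in itertools.product(ALPHABET, repeat=len(WILD_TYPE))]
unmeasured = [s for s in candidates if s not in measured]
top = sorted(unmeasured, key=lambda s: predict(model, s), reverse=True)[:5]
```

Here `top` holds the unmeasured sequences the model ranks highest, which would be the ones proposed for the next round of measurement. Because the toy ground truth is exactly additive, the additive model recovers it perfectly; real sequence-function landscapes generally are not, which is why richer regression models are used in practice.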

    Probabilistic structural mechanics research for parallel processing computers

    Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intense nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical
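
The "repeated analyses" that make PSM costly, and the reason they parallelize well, can be seen in a minimal Monte Carlo sketch. Everything specific here is an assumption for illustration: the limit state (resistance minus load), the normal distributions and their parameters, and the batching scheme. Each batch uses its own seed and shares no state with the others, so in principle each call to `failure_count` could run on a separate processor; the sketch runs them sequentially to stay simple.

```python
import math
import random


def failure_count(n_samples, seed, mu_r=300.0, sd_r=30.0, mu_s=200.0, sd_s=40.0):
    """Count failures (resistance < load) in one independent batch of samples."""
    rng = random.Random(seed)  # private generator: batches share no state
    fails = 0
    for _ in range(n_samples):
        resistance = rng.gauss(mu_r, sd_r)
        load = rng.gauss(mu_s, sd_s)
        if resistance - load < 0.0:
            fails += 1
    return fails


def estimate_failure_probability(n_batches=8, batch_size=25_000):
    """Combine independent batches; each batch could be sent to its own worker."""
    counts = [failure_count(batch_size, seed) for seed in range(n_batches)]
    return sum(counts) / (n_batches * batch_size)


def exact_failure_probability(mu_r=300.0, sd_r=30.0, mu_s=200.0, sd_s=40.0):
    """Closed form for this toy case: P(R - S < 0) with independent normal R and S."""
    beta = (mu_r - mu_s) / math.hypot(sd_r, sd_s)  # reliability index
    return 0.5 * math.erfc(beta / math.sqrt(2.0))
```

The closed-form check exists only because the toy limit state is linear in normal variables. For the large nonlinear or dynamic structural models the abstract describes, each sample would require a full structural analysis, which is where the computational cost comes from.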

    Lanczos eigensolution method for high-performance computers

    The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time of 181.6 seconds for the panel problem on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem, with 17,000 degrees of freedom, was 23 seconds on the Cray-YMP using an average of 3.63 processors.
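
For orientation, the core Lanczos recurrence can be sketched in a few lines. This is a plain textbook variant for a small dense symmetric matrix with full reorthogonalization, written under stated assumptions; it is not the optimized algorithm of the paper. In particular it omits the matrix factorization and forward/backward solves that a shift-and-invert formulation for structural eigenproblems would need, so the only heavy step shown is the matrix-vector product. The function names are illustrative.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos(A, v0, k):
    """Run at most k Lanczos steps on symmetric A starting from v0.

    Returns (alphas, betas, Q): the diagonal and off-diagonal entries of the
    tridiagonal matrix T = Q^T A Q, and the orthonormal Lanczos vectors.
    """
    norm0 = math.sqrt(dot(v0, v0))
    Q = [[x / norm0 for x in v0]]
    alphas, betas = [], []
    for j in range(k):
        w = matvec(A, Q[j])
        alphas.append(dot(w, Q[j]))
        # Full reorthogonalization against every stored vector, applied twice
        # to limit the loss of orthogonality in floating point.
        for _ in range(2):
            for q in Q:
                c = dot(w, q)
                w = [wi - c * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        if j == k - 1 or beta < 1e-12:  # last step, or invariant subspace found
            break
        betas.append(beta)
        Q.append([wi / beta for wi in w])
    return alphas, betas, Q
```

The eigenvalues of the small tridiagonal matrix defined by `alphas` and `betas` approximate the extreme eigenvalues of `A`. When `k` equals the matrix dimension and no early breakdown occurs, `T` is orthogonally similar to `A`, so the two have the same trace and the same Frobenius norm, which gives a cheap sanity check.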

    FROM THERMAL SPRINGS TO SUBWAY BENCHES: EXPLORING THE DIVERSITY OF CARBON MONOXIDE DEHYDROGENASES THROUGH METAGENOMES, PHYLOGENETICS, AND MACHINE LEARNING

    Carbon monoxide is well known as a toxic gas but can also be an important input and intermediary for microbial metabolisms. Carbon monoxide dehydrogenases (CODHs) serve as key enzyme complexes for a variety of microbial carbon monoxide (CO) utilization pathways. Such pathways include the Wood-Ljungdahl pathway, which is important in methanogenesis and acetogenesis, metal and sulfate reduction pathways, hydrogen production, and others. The CODH enzymes allow microbes to turn the traditionally toxic waste gas of CO into a useful carbon and energy source. Despite the flexibility of CODH enzymes, the use of carbon monoxide is still believed to be a fringe metabolism. Here we seek to expand the known diversity, distribution, and phylogeny of CODH catalytic subunit proteins by searching an expansive dataset of over 50,000 metagenome assembled genomes. Our work has shown that this dataset contains 5,426 putative CODH protein sequences found within 4,001 metagenome assembled genomes. Despite the considerable expansion of the known set of CODH sequences, our phylogenetic analysis has validated the protein's previously established phylogeny while showing a wider environmental and taxonomic distribution of CODHs. Often considered to be found primarily in areas with high levels of CO, CODHs are typically associated with thermal environments and extremophiles. In addition to the expected high temperature environments, CODHs were found in metagenomes from diverse environments from soils to subway benches, and in phyla ranging from archaeal Euryarchaeota to bacterial Actinobacterota. We have also constructed a machine learning model that uses a sequence-only method to predict gene ontologies (GO-terms) associated with CODH function. While our model can achieve accurate prediction of GO-terms, our work has shown some of the current limitations in the approach.
This study reveals CODHs to be a more diverse and ubiquitous enzyme than previously anticipated. Despite tripling the number of sequences in the phylogeny, we provide strong support for the previously established clades and report no new clades. This work has also identified some key areas for experimental follow up regarding the importance of carbon monoxide and CODH genes in many environments.

    Software Requirements Classification Using Word Embeddings and Convolutional Neural Networks

    Software requirements classification, the practice of categorizing requirements by their type or purpose, can improve organization and transparency in the requirements engineering process and thus promote requirement fulfillment and software project completion. Automating requirements classification is a prominent area of research, as automation can alleviate the tedium of manual labeling and reduce its reliance on domain expertise. This thesis explores the application of deep learning techniques to software requirements classification, specifically the use of word embeddings for document representation when training a convolutional neural network (CNN). Because past research has mainly used information retrieval and traditional machine learning techniques, we explore the potential of deep learning for this particular task. With the support of learning libraries such as TensorFlow and Scikit-Learn and word embedding models such as word2vec and fastText, we build a Python system that trains and validates configurations of Naïve Bayes and CNN requirements classifiers. Applying our system to a suite of experiments on two well-studied requirements datasets, we recreate or establish the Naïve Bayes baselines and evaluate the impact of CNNs equipped with word embeddings trained from scratch versus word embeddings pre-trained on Big Data.

    Sampling of stochastic operators

    We develop sampling methodology aimed at determining stochastic operators that satisfy a support size restriction on the autocorrelation of the operator's stochastic spreading function. The data that we use to reconstruct the operator (or, in some cases, only the autocorrelation of the spreading function) is based on the response of the unknown operator to a known, deterministic test signal.

    Guidance for benthic habitat mapping: an aerial photographic approach

    This document, Guidance for Benthic Habitat Mapping: An Aerial Photographic Approach, describes proven technology that can be applied in an operational manner by state-level scientists and resource managers. This information is based on the experience gained by NOAA Coastal Services Center staff and state-level cooperators in the production of a series of benthic habitat data sets in Delaware, Florida, Maine, Massachusetts, New York, Rhode Island, the Virgin Islands, and Washington, as well as during Center-sponsored workshops on coral remote sensing and seagrass and aquatic habitat assessment. (PDF contains 39 pages) The original benthic habitat document, NOAA Coastal Change Analysis Program (C-CAP): Guidance for Regional Implementation (Dobson et al.), was published by the Department of Commerce in 1995. That document summarized procedures that were to be used by scientists throughout the United States to develop consistent and reliable coastal land cover and benthic habitat information. Advances in technology and new methodologies for generating these data created the need for this updated report, which builds upon the foundation of its predecessor