Machine learning-guided directed evolution for protein engineering

Author: Arnold Frances H.
Wu Zachary
Yang Kevin K.
Publication date: 19/04/2019

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.
Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution

arXiv.org e-Print Archive

Caltech Authors

Probabilistic structural mechanics research for parallel processing computers

Author: Chen Heh-Chyun
Martin William R.
Sues Robert H.
Twisdale Lawrence A.

Aerospace structures and spacecraft are complex assemblages of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intensive nature. Solution of PSM problems requires repeated analyses of structures that are often large and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large-scale PSM problems practical.
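The abstract's point that repeated analyses are "inherently parallel" can be illustrated with a minimal Monte Carlo sketch. This is not the report's method: the load and resistance distributions, the function names `count_failures` and `failure_probability`, and the batch sizes are all assumptions chosen for illustration. Each batch draws its own samples from its own seed, so batches can run on separate workers and be combined afterwards.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_failures(seed, n_samples):
    """Count samples in which a hypothetical load exceeds a hypothetical resistance."""
    rng = random.Random(seed)  # independent stream per batch
    failures = 0
    for _ in range(n_samples):
        resistance = rng.gauss(10.0, 1.0)  # assumed strength distribution
        load = rng.gauss(6.0, 1.5)         # assumed load distribution
        if load > resistance:
            failures += 1
    return failures


def failure_probability(n_batches=8, samples_per_batch=20_000, workers=4):
    """Estimate the failure probability from independent batches.

    Threads are used only to keep the sketch portable; for CPU-bound work in
    CPython, a process pool or a cluster scheduler would be the natural swap.
    """
    seeds = range(n_batches)
    sizes = [samples_per_batch] * n_batches
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(count_failures, seeds, sizes))
    return sum(counts) / (n_batches * samples_per_batch)


if __name__ == "__main__":
    print(f"estimated failure probability: {failure_probability():.4f}")
```

Because the batches share no state, the estimate is the same regardless of how many workers execute them, which is the property that makes this class of problem a good fit for parallel hardware.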

NASA Technical Reports Server

Lanczos eigensolution method for high-performance computers

Author: Bostic Susan W.

The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time of 181.6 seconds for the panel problem on a CONVEX computer was reduced to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem, with 17,000 degrees of freedom, was 23 seconds on the Cray-YMP using an average of 3.63 processors.
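For readers unfamiliar with the method, the following is a minimal sketch of the basic Lanczos recurrence for a small dense symmetric matrix. It is an assumption-laden illustration, not the report's implementation: it uses plain matrix-vector products only (no factorization or forward/backward solves, no sparse storage, no reorthogonalization), and the helper names `matvec`, `dot`, and `lanczos` are hypothetical.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos(A, v0, k):
    """Run up to k Lanczos steps on symmetric A starting from v0.

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    tridiagonal matrix whose eigenvalues approximate those of A.
    """
    norm = math.sqrt(dot(v0, v0))
    q = [x / norm for x in v0]
    q_prev = [0.0] * len(v0)
    beta = 0.0
    alphas, betas = [], []
    for j in range(k):
        w = matvec(A, q)                     # the dominant cost per step
        alpha = dot(w, q)
        # Three-term recurrence: remove components along q and q_prev.
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if j == k - 1 or beta < 1e-12:       # finished, or invariant subspace found
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas


if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
    print(lanczos(A, [1.0, 1.0, 1.0], 3))
```

In the structural applications the abstract describes, the matrix-vector product would typically be replaced by a solve with a factored, shifted matrix, which is why factorization and forward/backward substitution appear among the costly steps.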

NASA Technical Reports Server

FROM THERMAL SPRINGS TO SUBWAY BENCHES: EXPLORING THE DIVERSITY OF CARBON MONOXIDE DEHYDROGENASES THROUGH METAGENOMES, PHYLOGENETICS, AND MACHINE LEARNING

Author: Bigcraft Isaac
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2022

Carbon monoxide is well known as a toxic gas but can also be an important input and intermediary for microbial metabolisms. Carbon monoxide dehydrogenases (CODHs) serve as key enzyme complexes for a variety of microbial carbon monoxide (CO) utilization pathways. Such pathways include the Wood-Ljungdahl pathway, which is important in methanogenesis and acetogenesis, metal and sulfate reduction pathways, hydrogen production, and others. The CODH enzymes allow microbes to turn the traditionally toxic waste gas of CO into a useful carbon and energy source. Despite the flexibility of CODH enzymes, the use of carbon monoxide is still believed to be a fringe metabolism. Here we seek to expand the known diversity, distribution, and phylogeny of CODH catalytic subunit proteins by searching an expansive dataset of over 50,000 metagenome assembled genomes. Our work has shown that this dataset contains 5,426 putative CODH protein sequences found within 4,001 metagenome assembled genomes. Despite the considerable expansion of the known set of CODH sequences, our phylogenetic analysis has validated the protein's previously established phylogeny while showing a wider environmental and taxonomic distribution of CODHs. Often considered to be found primarily in areas with high levels of CO, CODHs are typically associated with thermal environments and extremophiles. In addition to the expected high temperature environments, CODHs were found in metagenomes from diverse environments from soils to subway benches, and in phyla ranging from archaeal Euryarchaeota to bacterial Actinobacterota. We have also constructed a machine learning model to extract functional predictions and information using a sequence-only method to predict gene ontologies (GO-terms) associated with CODH function. While our model can achieve accurate prediction of GO-terms, our work has shown some of the current limitations in the approach.
This study reveals CODHs to be a more diverse and ubiquitous enzyme than previously anticipated. Despite tripling the number of sequences in the phylogeny, we provide strong support for the previously established clades and report no new clades. This work has also identified some key areas for experimental follow-up regarding the importance of carbon monoxide and CODH genes in many environments.

Michigan Technological University

Software Requirements Classification Using Word Embeddings and Convolutional Neural Networks

Author: Fong Vivian Lin
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2018

Software requirements classification, the practice of categorizing requirements by their type or purpose, can improve organization and transparency in the requirements engineering process and thus promote requirement fulfillment and software project completion. Requirements classification automation is a prominent area of research, as automation can alleviate the tedium of manual labeling and reduce the need for domain expertise. This thesis explores the application of deep learning techniques to software requirements classification, specifically the use of word embeddings for document representation when training a convolutional neural network (CNN). As past research endeavors mainly utilize information retrieval and traditional machine learning techniques, we entertain the potential of deep learning on this particular task. With the support of learning libraries such as TensorFlow and Scikit-Learn and word embedding models such as word2vec and fastText, we build a Python system that trains and validates configurations of Naïve Bayes and CNN requirements classifiers. Applying our system to a suite of experiments on two well-studied requirements datasets, we recreate or establish the Naïve Bayes baselines and evaluate the impact of CNNs equipped with word embeddings trained from scratch versus word embeddings pre-trained on Big Data.

DigitalCommons@CalPoly

Sampling of stochastic operators

Author: Pfander Götz E.
Zheltov Pavel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2014

We develop a sampling methodology aimed at determining stochastic operators that satisfy a support size restriction on the autocorrelation of the operator's stochastic spreading function. The data that we use to reconstruct the operator (or, in some cases, only the autocorrelation of the spreading function) is based on the response of the unknown operator to a known, deterministic test signal.

arXiv.org e-Print Archive

Publikationsserver der Katholischen Universität Eichstätt-Ingolstadt

Guidance for benthic habitat mapping: an aerial photographic approach

Author: Finkbeiner Mark
Seaman Renee
Stevenson Bill
Publication venue: NOAA/National Ocean Service/Coastal Services Center
Publication date: 01/01/2001

This document, Guidance for Benthic Habitat Mapping: An Aerial Photographic Approach, describes proven technology that can be applied in an operational manner by state-level scientists and resource managers. This information is based on the experience gained by NOAA Coastal Services Center staff and state-level cooperators in the production of a series of benthic habitat data sets in Delaware, Florida, Maine, Massachusetts, New York, Rhode Island, the Virgin Islands, and Washington, as well as during Center-sponsored workshops on coral remote sensing and seagrass and aquatic habitat assessment. (PDF contains 39 pages) The original benthic habitat document, NOAA Coastal Change Analysis Program (C-CAP): Guidance for Regional Implementation (Dobson et al.), was published by the Department of Commerce in 1995. That document summarized procedures that were to be used by scientists throughout the United States to develop consistent and reliable coastal land cover and benthic habitat information. Advances in technology and new methodologies for generating these data created the need for this updated report, which builds upon the foundation of its predecessor

Aquatic Commons

Applied Geoinformatics in Forestry and Landscape Research and Education

Author: Martin Klimanek
Vladimir Zidek
Publication venue: IntechOpen
Publication date: 01/03/2010

IntechOpen

Crossref