14,697 research outputs found
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolutio
Modeling Big Medical Survival Data Using Decision Tree Analysis with Apache Spark
In many medical studies, an outcome of interest is not only whether an event occurred, but when an event occurred; and an example of this is Alzheimer’s disease (AD). Identifying patients with Mild Cognitive Impairment (MCI) who are likely to develop Alzheimer’s disease (AD) is highly important for AD treatment. Previous studies suggest that not all MCI patients will convert to AD. Massive amounts of data from longitudinal and extensive studies on thousands of Alzheimer’s patients have been generated. Building a computational model that can predict conversion form MCI to AD can be highly beneficial for early intervention and treatment planning for AD. This work presents a big data model that contains machine-learning techniques to determine the level of AD in a participant and predict the time of conversion to AD. The proposed framework considers one of the widely used screening assessment for detecting cognitive impairment called Montreal Cognitive Assessment (MoCA). MoCA data set was collected from different centers and integrated into our large data framework storage using a Hadoop Data File System (HDFS); the data was then analyzed using an Apache Spark framework. The accuracy of the proposed framework was compared with a semi-parametric Cox survival analysis model
- …