Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.
Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolution
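One round of the loop described above (learn from all measured variants, then pick untested sequences predicted to improve) can be sketched as follows. This is a minimal illustration under stated assumptions, not the review's method: one-hot encoding, ridge regression with the intercept fixed at the mean fitness, the helper names (`one_hot`, `fit_ridge`, `propose`) and the toy fitness values are all hypothetical choices made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a sequence as a flat 20*L binary vector (one block per position)."""
    return [1.0 if res == aa else 0.0 for res in seq for aa in AMINO_ACIDS]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(seqs, fitness, lam=1.0):
    """Fit an additive one-hot ridge model; intercept fixed at the mean fitness."""
    X = [one_hot(s) for s in seqs]
    mean_y = sum(fitness) / len(fitness)
    y = [f - mean_y for f in fitness]  # center the targets
    d = len(X[0])
    # Normal equations with an L2 penalty: (X^T X + lam * I) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    w = solve(xtx, xty)
    return lambda s: mean_y + sum(wi * xi for wi, xi in zip(w, one_hot(s)))


def propose(model, parent, sites, alphabet, measured, k=3):
    """Score every untested combination of `alphabet` at `sites`; return the top k."""
    ranked = []
    for combo in product(alphabet, repeat=len(sites)):
        seq = list(parent)
        for pos, aa in zip(sites, combo):
            seq[pos] = aa
        seq = "".join(seq)
        if seq not in measured:
            ranked.append((seq, model(seq)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical toy data: parent "AAA" plus four measured single mutants.
measured = {"AAA": 0.0, "VAA": 1.0, "AVA": 1.0, "LAA": -1.0, "ALA": -1.0}
model = fit_ridge(list(measured), list(measured.values()), lam=1.0)
top = propose(model, "AAA", sites=[0, 1], alphabet="AVL", measured=measured, k=2)
print(top[0][0])  # VVA: the untested double mutant combining both beneficial V's
```

The additive model never saw `VVA`, yet it ranks it first because it combines two substitutions that each helped on their own. Real campaigns would swap in richer encodings or models and repeat the measure, fit and propose cycle.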
Synthetic biology—putting engineering into biology
Synthetic biology is interpreted as the engineering-driven building of increasingly complex biological entities for novel applications. Encouraged by progress in the design of artificial gene networks, de novo DNA synthesis and protein engineering, we review the case for this emerging discipline. Key aspects of an engineering approach are purpose orientation, deep insight into the underlying scientific principles, a hierarchy of abstraction with suitable interfaces between and within its levels, standardization, and the separation of design from fabrication. Synthetic biology investigates how these requirements can be implemented in the engineering of biological systems. At the DNA level, this is illustrated by engineering-inspired artificial operations such as toggle switching, oscillation and the formation of spatial patterns. At the protein level, the functionally self-contained domain structure of a number of proteins suggests essentially Lego-like recombination, which can be exploited to reprogram DNA-binding domain specificities or signaling pathways. Alternatively, computational design is emerging as a way to rationally reprogram enzyme function. Finally, the increasing ease of de novo DNA synthesis, synthetic biology's system fabrication process, makes it possible to implement novel designs for ever more complex systems. Some of these elements have merged to realize the first tangible synthetic biology applications, in the manufacturing of pharmaceutical compounds.
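The toggle-switch operation mentioned above can be illustrated with a minimal sketch of the classic two-repressor model, in which each gene's product represses the other's synthesis. The equations' dimensionless form, the parameter values (`alpha1`, `alpha2`, `beta`, `gamma`) and the forward-Euler integration are illustrative assumptions, not details taken from the review.

```python
def simulate_toggle(u0, v0, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0,
                    dt=0.01, t_end=20.0):
    """Forward-Euler integration of two mutually repressing genes.

    u, v: concentrations of repressors 1 and 2 (dimensionless).
    alpha1, alpha2: effective synthesis rates; beta, gamma: cooperativity.
    """
    u, v = u0, v0
    for _ in range(round(t_end / dt)):
        du = alpha1 / (1.0 + v ** beta) - u  # u is made unless v represses it; first-order decay
        dv = alpha2 / (1.0 + u ** gamma) - v  # v is made unless u represses it; first-order decay
        u, v = u + dt * du, v + dt * dv
    return u, v


# Same parameters, different starting states: the switch remembers which
# repressor started ahead (bistability).
print(simulate_toggle(5.0, 0.0))  # settles with u high, v low
print(simulate_toggle(0.0, 5.0))  # settles with v high, u low
```

With strong, cooperative repression the system has two stable states, which is what makes it usable as a one-bit memory element.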
Engineering novel complement activity into a pulmonary surfactant protein
Complement neutralizes invading pathogens, stimulates inflammatory and adaptive immune responses, and targets non-self or altered-self structures for clearance. In the classical and lectin activation pathways, it is initiated when complexes composed of separate recognition and activation subcomponents bind to a pathogen surface. Despite its apparent complexity, recognition-mediated activation has evolved independently in three separate protein families: C1q, mannose-binding lectins (MBLs) and serum ficolins. Although unrelated, all have bouquet-like architectures and associate with complement-specific serine proteases: MBLs and ficolins with MBL-associated serine protease-2 (MASP-2), and C1q with C1r and C1s. To examine the structural requirements for complement activation, we have created a number of novel recombinant rat MBLs in which the position and orientation of the MASP-binding sites have been changed. We have also engineered MASP binding into a pulmonary surfactant protein (SP-A), which has the same domain structure and architecture as MBL but lacks any intrinsic complement activity. The data reveal that complement activity is remarkably tolerant to changes in the size and orientation of the collagenous stalks of MBL, implying considerable rotational and conformational flexibility in unbound MBL. Furthermore, novel complement activity is introduced concurrently with MASP binding in SP-A but is uncontrolled and occurs even in the absence of a carbohydrate target. Thus, the active rather than the zymogen state is the default in lectin·MASP complexes and must be inhibited through additional regions in circulating MBLs until triggered by pathogen recognition.
An extra dimension in protein tagging by quantifying universal proteotypic peptides using targeted proteomics
The use of protein tagging to facilitate detailed characterization of target proteins has not only revolutionized cell biology but also enabled biochemical analysis through efficient recovery of the protein complexes in which the tagged proteins reside. Endogenous tagging for detailed protein characterization is widespread in lower organisms that allow efficient homologous recombination. With recent advances in genome engineering, tagging of endogenous proteins is now within reach for most experimental systems, including mammalian cell line cultures. In this work, we describe the selection of peptides with ideal mass spectrometry characteristics for quantifying tagged proteins by targeted proteomics. We mined the proteome of the hyperthermophile Pyrococcus furiosus to obtain two peptides that are unique in the proteomes of all known model organisms (proteotypic) and allow sensitive quantification of target proteins in a complex background. By combining these 'Proteotypic peptides for Quantification by SRM' (PQS peptides) with epitope tags, we demonstrate their use in co-immunoprecipitation experiments after transfection of protein pairs, or after introduction of these tags into endogenous proteins through genome engineering. Endogenous protein tagging for absolute quantification adds a powerful extra dimension to protein analysis, allowing detailed characterization of endogenous proteins.
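The uniqueness part of the peptide mining described above can be sketched as an in silico tryptic digest followed by a filter against background proteomes. This is a minimal sketch under stated assumptions: it applies only the common trypsin rule (cleave after K or R, not before P), a hypothetical 6 to 25 residue length window, and I/L equivalence; the function names and toy sequences are hypothetical, and ranking by mass spectrometry characteristics is omitted.

```python
def tryptic_digest(protein, min_len=6, max_len=25):
    """Cleave after K/R unless the next residue is P; keep peptides in a length window."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return [p for p in peptides if min_len <= len(p) <= max_len]


def normalize(seq):
    """Map I to L: the two residues are isobaric, so MS cannot tell them apart."""
    return seq.replace("I", "L")


def proteotypic_candidates(source_proteins, background_proteins, min_len=6, max_len=25):
    """Return source peptides that occur nowhere (as substrings) in any background protein."""
    background = [normalize(p) for p in background_proteins]
    hits = []
    for protein in source_proteins:
        for pep in tryptic_digest(protein, min_len, max_len):
            if pep not in hits and not any(normalize(pep) in bg for bg in background):
                hits.append(pep)
    return hits


source = ["MSAGDTFYKPEWNGLTRDVHEASYK"]  # hypothetical source protein
background = ["GGDVHEASYKLL"]             # hypothetical background proteome
print(tryptic_digest(source[0]))          # ['MSAGDTFYKPEWNGLTR', 'DVHEASYK']
print(proteotypic_candidates(source, background))  # ['MSAGDTFYKPEWNGLTR']
```

Checking substrings of whole background proteins, rather than only their tryptic peptides, is the stricter choice: a candidate is rejected if its sequence appears anywhere in the background.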
Algorithm engineering for optimal alignment of protein structure distance matrices
Protein structural alignment is an important problem in computational
biology. In this paper, we present the first successful results on provably optimal
pairwise alignment of protein inter-residue distance matrices, using the popular Dali
scoring function. We introduce the structural alignment problem formally, which
enables us to express a variety of scoring functions used in previous work as
special cases in a unified framework. Further, we propose the first
mathematical model for computing optimal structural alignments based on dense
inter-residue distance matrices. To this end, we reformulate the problem as a
special graph problem and give a tight integer linear programming model. We
then present algorithm engineering techniques to handle the huge integer linear
programs of real-life distance matrix alignment problems. Applying these
techniques, we can compute provably optimal Dali alignments for the very first
time.
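To make the objective concrete, the sketch below evaluates a Dali-style elastic score for a given alignment of two distance matrices. It assumes the elastic similarity term as commonly quoted for Dali, with threshold `THETA = 0.2` and envelope width `ALPHA = 20.0` Å; treat both constants and the helper names as assumptions. It only scores one fixed alignment; finding the alignment that maximizes this score is the hard problem the paper attacks with integer linear programming.

```python
import math

THETA = 0.2   # assumed elastic similarity threshold
ALPHA = 20.0  # assumed envelope width in angstroms


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def dali_pair_score(da, db):
    """Elastic similarity of one distance in A versus the matching distance in B."""
    d_mean = (da + db) / 2.0
    # Relative deviation is penalized; long distances are down-weighted by the envelope.
    return (THETA - abs(da - db) / d_mean) * math.exp(-(d_mean / ALPHA) ** 2)


def dali_score(dist_a, dist_b, alignment):
    """Sum pair scores over all pairs of aligned residues.

    alignment: list of (i, k) pairs meaning residue i of A is aligned to residue k of B.
    """
    total = 0.0
    for i, k in alignment:
        for j, l in alignment:
            if (i, k) == (j, l):
                total += THETA  # diagonal term for an aligned residue with itself
            else:
                total += dali_pair_score(dist_a[i][j], dist_b[k][l])
    return total


a = distance_matrix([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)])
b = distance_matrix([(0.0, 0.0, 0.0), (4.8, 0.0, 0.0)])
aln = [(0, 0), (1, 1)]
print(dali_score(a, a, aln), dali_score(a, b, aln))  # self-alignment scores higher
```

Because every pair of aligned residues contributes, the score is quadratic in the alignment, which is why the optimization problem is hard and why an exact method needs the graph and integer programming reformulation.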
Protein-based materials, toward a new level of structural control
Through billions of years of evolution, nature has created and refined structural proteins for a wide variety of specific purposes. Amino acid sequences and their associated folding patterns combine to create elastic, rigid or tough materials. In many respects, nature's intricately designed products provide challenging examples for materials scientists, but translating natural structural concepts into bio-inspired materials requires a level of control over macromolecular architecture far higher than that afforded by conventional polymerization processes. An increasingly important approach to this problem has been to use biological systems to produce materials. Through protein engineering, artificial genes can be developed that encode protein-based materials with desired features. Structural elements found in nature, such as β-sheets and α-helices, can be combined with great flexibility and outfitted with functional elements such as cell-binding sites or enzymatic domains. The possibility of incorporating non-natural amino acids increases the versatility of protein engineering still further. Such methods are expected to have a large impact on materials science, and especially on biomedical materials science.
Improved split fluorescent proteins for endogenous protein labeling.
Self-complementing split fluorescent proteins (FPs) have been widely used for protein labeling, visualization of subcellular protein localization, and detection of cell-cell contact. To expand this toolset, we have developed a screening strategy for the direct engineering of self-complementing split FPs. Via this strategy, we have generated a yellow-green split-mNeonGreen2₁₋₁₀/₁₁ that improves the ratio of complemented signal to the background of FP₁₋₁₀-expressing cells compared with the commonly used split GFP₁₋₁₀/₁₁, as well as a 10-fold brighter red split-sfCherry2₁₋₁₀/₁₁. Based on split sfCherry2, we have engineered a photoactivatable variant that enables single-molecule localization-based super-resolution microscopy. We have demonstrated dual-color endogenous protein tagging with sfCherry2₁₁ and GFP₁₁, revealing that the endoplasmic reticulum translocon complex Sec61B has reduced abundance in certain peripheral tubules. These new split FPs not only offer multiple colors for imaging interaction networks of endogenous proteins, but also hold the potential to provide orthogonal handles for biochemical isolation of native protein complexes.
Split fluorescent proteins (FPs) have been widely used to visualise proteins in cells. Here the authors develop a screen for engineering new split FPs, and report a yellow-green split-mNeonGreen2 with reduced background, a red split-sfCherry2 for multicolour labeling, and its photoactivatable variant for super-resolution use.
