Machine Learning in Enzyme Engineering
Enzyme engineering plays a central role in developing efficient biocatalysts for biotechnology, biomedicine, and life sciences. Apart from classical rational design and directed evolution approaches, machine learning methods have been increasingly applied to find patterns in data that help predict protein structures, improve enzyme stability, solubility, and function, predict substrate specificity, and guide rational protein design. In this Perspective, we analyze the state of the art in databases and methods used for training and validating predictors in enzyme engineering. We discuss current limitations and challenges facing the community, and recent advancements in experimental and theoretical methods that have the potential to address those challenges. We also present our view on possible future directions for developing applications for the design of efficient biocatalysts.
Predicting protein stability and solubility changes upon mutations: data perspective
Understanding mutational effects on protein stability and solubility is of particular importance for creating industrially relevant biocatalysts, resolving mechanisms of many human diseases, and producing efficient biopharmaceuticals, to name a few. For in silico predictions, the complexity of the underlying processes and increasing computational capabilities favor the use of machine learning. However, this approach requires sufficient training data of reasonable quality for making precise predictions. This minireview aims to summarize and scrutinize available mutational datasets commonly used for training predictors. We analyze their structure and discuss the possible directions of improvement in terms of data size, quality, and availability. We also present perspectives on the development of mutational data for accelerating the design of efficient predictors, introducing two new manually curated databases, FireProtDB and SoluProtMutDB, for protein stability and solubility, respectively.
FireProtDB: database of manually curated protein stability data
The majority of naturally occurring proteins have evolved to function under mild conditions inside living organisms. One of the critical obstacles to the use of proteins in biotechnological applications is their insufficient stability at elevated temperatures or in the presence of salts. Since experimental screening for stabilizing mutations is typically laborious and expensive, in silico predictors are often used for narrowing down the mutational landscape. Recent advances in machine learning and artificial intelligence further facilitate the development of such computational tools. However, the accuracy of these predictors strongly depends on the quality and amount of data used for training and testing, which have often been reported as the current bottleneck of the approach. To address this problem, we present FireProtDB, a novel database of experimental thermostability data for single-point mutants. The database combines published datasets, data extracted manually from the recent literature, and data collected in our laboratory. Its user interface is designed to facilitate both types of expected use: (i) interactive exploration of individual entries at the level of a protein or mutation and (ii) the construction of highly customized and machine learning-friendly datasets using advanced searching and filtering. The database is freely available at https://loschmidt.chemi.muni.cz/fireprotdb.
Advanced database mining integrating sequence and structure bioinformatics with microfluidics challenges enzyme engineering
Primal-dual block-proximal splitting for a class of non-convex problems
We develop block structure-adapted primal-dual algorithms for non-convex non-smooth optimisation problems, whose objectives can be written as compositions G(x) + F(K(x)) of non-smooth block-separable convex functions G and F with a nonlinear Lipschitz-differentiable operator K. Our methods are refinements of the nonlinear primal-dual proximal splitting method for such problems without the block structure, which itself is based on the primal-dual proximal splitting method of Chambolle and Pock for convex problems. We propose individual step length parameters and acceleration rules for each of the primal and dual blocks of the problem. This allows them to converge faster by adapting to the structure of the problem. For the squared distance of the iterates to a critical point, we show local O(1/N), O(1/N^2), and linear rates under varying conditions and choices of the step length parameters. Finally, we demonstrate the performance of the methods for the practical inverse problems of diffusion tensor imaging and electrical impedance tomography.
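For orientation, the problem class and the underlying Chambolle-Pock iteration can be written out as below. This is only a sketch of the standard convex method with a linear operator K; the symbols \tau, \sigma (primal and dual step lengths), \bar{x} (over-relaxed iterate), and F^* (convex conjugate of F) are notation assumed here and do not appear in the abstract. The block-wise step lengths and acceleration rules of the paper are not reproduced.

```latex
% Problem class described in the abstract
\min_{x} \; G(x) + F(K(x))

% Chambolle-Pock primal-dual iteration, stated for linear K
\begin{aligned}
x^{i+1}       &= \operatorname{prox}_{\tau G}\bigl(x^{i} - \tau K^{*} y^{i}\bigr), \\
\bar{x}^{i+1} &= 2x^{i+1} - x^{i}, \\
y^{i+1}       &= \operatorname{prox}_{\sigma F^{*}}\bigl(y^{i} + \sigma K \bar{x}^{i+1}\bigr),
\end{aligned}
\qquad \text{with } \tau\sigma\|K\|^{2} < 1 .
```

For a nonlinear K, a natural modification (stated here as an assumption about the non-block method the abstract refers to) replaces K^{*} in the primal step by the adjoint of the derivative, [\nabla K(x^{i})]^{*}, and evaluates K(\bar{x}^{i+1}) in the dual step.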
Sensitive operation of enzyme-based biodevices by advanced signal processing
Analytical devices that combine a sensitive biological component with a physicochemical detector hold great potential for various applications, e.g., environmental monitoring, food analysis or medical diagnostics. Continuous efforts to develop inexpensive sensitive biodevices for detecting target substances typically focus on the design of biorecognition elements and their physical implementation, while the methods for processing signals generated by such devices have received far less attention. Here, we present fundamental considerations related to signal processing in biosensor design and investigate how undemanding signal treatment facilitates calibration and operation of enzyme-based biodevices. Our signal treatment approach was thoroughly validated with two model systems: (i) a biodevice for detecting chemical warfare agents and environmental pollutants based on the activity of haloalkane dehalogenase, with a sensitive range for bis(2-chloroethyl) ether of 0.01-0.8 mM, and (ii) a biodevice for detecting hazardous pesticides based on the activity of γ-hexachlorocyclohexane dehydrochlorinase, with a sensitive range for γ-hexachlorocyclohexane of 0.01-0.3 mM. We demonstrate that advanced signal processing based on curve fitting enables precise quantification of parameters important for sensitive operation of enzyme-based biodevices, including (i) automated exclusion of signal regions with substantial noise, (ii) derivation of calibration curves with significantly reduced error, (iii) shortening of the detection time, and (iv) reliable extrapolation of the signal to the initial conditions. The presented simple signal curve fitting supports rational design of optimal system setup by explicit and flexible quantification of its properties and will find broad use in the development of sensitive and robust biodevices.
Exploration of Protein Unfolding by Modelling Calorimetry Data from Reheating
Studies of protein unfolding mechanisms are critical for understanding protein functions inside cells, de novo protein design, and defining the role of protein misfolding in neurodegenerative disorders. Calorimetry has proven indispensable in this regard for recording full energetic profiles of protein unfolding and permitting data fitting based on unfolding pathway models. While both kinetic and thermodynamic protein stability are analysed by varying scan rates and reheating, the latter is rarely used in curve fitting, leading to a significant loss of information from experiments. To extract this information, we propose fitting both first and second scans simultaneously. The four most common single-peak transition models are considered: (i) fully reversible, (ii) fully irreversible, (iii) partially reversible transitions, and (iv) general three-state models. The method is validated using calorimetry data for chicken egg lysozyme, mutated Protein A, three wild-type haloalkane dehalogenases, and a mutant stabilized by protein engineering. We show that modelling of reheating increases the precision of determination of unfolding mechanisms, free energies, temperatures, and heat capacity differences. Moreover, this modelling indicates whether alternative refolding pathways might occur upon cooling. The Matlab-based data fitting software tool and its user guide are provided as a supplement.