Deep learning for reconstructing protein structures from cryo-EM density maps: recent advances and future directions
Cryo-electron microscopy (cryo-EM) has emerged in recent years as a key
technology for determining the structure of proteins, particularly large protein
complexes and assemblies. A key challenge in cryo-EM data analysis is to
automatically reconstruct accurate protein structures from cryo-EM density
maps. In this review, we briefly overview various deep learning methods for
building protein structures from cryo-EM density maps, analyze their impact,
and discuss the challenges of preparing high-quality data sets for training
deep learning models. Looking ahead, more advanced deep learning
models that effectively integrate cryo-EM data with other sources of
complementary data, such as protein sequences and AlphaFold-predicted structures,
need to be developed to further advance the field.
Impact of AlphaFold on Structure Prediction of Protein Complexes: The CASP15-CAPRI Experiment
We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homo-dimers, 3 homo-trimers, 13 hetero-dimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their 5 best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier, a remarkable improvement resulting from the wide use of the AlphaFold2 and AlphaFold-Multimer software. Creative use was made of the deep learning inference engines, affording the sampling of a much larger number of models and enriching the multiple sequence alignments with sequences from various sources. Wide use was also made of the AlphaFold confidence metrics to rank models, permitting top performing groups to exceed the results of the public AlphaFold-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment
We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo-trimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enrichment of multiple sequence alignments, or integration of advanced modeling tools enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
Improving Protein–Ligand Interaction Modeling with cryo-EM Data, Templates, and Deep Learning in 2021 Ligand Model Challenge
Elucidating protein–ligand interactions is crucial for studying the function of proteins and compounds in an organism and critical for drug discovery and design. The problem of protein–ligand interaction is traditionally tackled by molecular docking and simulation, which are based on physical forces and statistical potentials and cannot effectively leverage cryo-EM data and existing protein structural information in the protein–ligand modeling process. In this work, we developed a deep learning bioinformatics pipeline (DeepProLigand) to predict protein–ligand interactions from cryo-EM density maps of proteins and ligands. DeepProLigand first uses a deep learning method to predict the structure of proteins from cryo-EM maps, which is averaged with a reference (template) structure of the proteins to produce a combined structure to which ligands are added. The ligands are then identified and added into the structure to generate a protein–ligand complex structure, which is further refined. The method, based on deep learning prediction and template-based modeling, was blindly tested in the 2021 EMDataResource Ligand Challenge and was ranked first in fitting ligands to cryo-EM density maps. These results demonstrate that the deep learning bioinformatics approach is a promising direction for modeling protein–ligand interactions from cryo-EM data using prior structural information.
Consignment stock policy in an integrated vendor-buyer model for deteriorating item with stock dependent demand under buyer’s space limitation
In this paper, a single-vendor single-buyer integrated inventory model for a deteriorating item with consignment stock policy is developed, assuming that the market demand is stock dependent and that the buyer's storage capacity is limited. Both equal and unequal shipments from the vendor to the buyer are considered. The effects of the buyer's space capacity on the average cost, shipment size, and production batch are studied through a numerical example. It is deduced that the production rate is the key factor in determining whether to use an equal or unequal shipment strategy. Sensitivity analysis is carried out to establish the robustness of the solutions of the models developed.
DRLComplex: Reconstruction of protein quaternary structures using deep reinforcement learning
Predicted inter-chain residue-residue contacts can be used to build the
quaternary structure of protein complexes from scratch. However, only a small
number of methods have been developed to reconstruct protein quaternary
structures using predicted inter-chain contacts. Here, we present an
agent-based self-learning method based on deep reinforcement learning
(DRLComplex) to build protein complex structures using inter-chain contacts as
distance constraints. We rigorously tested DRLComplex on two standard datasets
of homodimeric and heterodimeric protein complexes (i.e., the CASP-CAPRI
homodimer and Std_32 heterodimer datasets) using both true and predicted
interchain contacts as inputs. Utilizing true contacts as input, DRLComplex
achieved high average TM-scores of 0.9895 and 0.9881 and a low average
interface RMSD (I_RMSD) of 0.2197 and 0.92 on the two datasets, respectively.
When predicted contacts are used, the method achieves TM-scores of 0.73 and
0.76 for homodimers and heterodimers, respectively. Our experiments find that
the accuracy of reconstructed quaternary structures depends on the accuracy of
the contact predictions. Compared to other optimization methods for
reconstructing quaternary structures from inter-chain contacts, DRLComplex
performs similarly to an advanced gradient descent method and better than a
Markov Chain Monte Carlo simulation method and a simulated annealing-based
method, validating the effectiveness of DRLComplex for quaternary
reconstruction of protein complexes.
Comment: 20 pages, 8 figures, 12 tables. Under revie
Distribution of Microplastic Contamination in Sapta-Gandaki River System, Nepal
Microplastic (MP) contamination has been reported in many rivers worldwide. However, there is increasing concern regarding data quality, particularly in studies that do not account for positive and negative controls. Additionally, the spatiotemporal distribution of MP in transboundary Himalayan rivers is underexplored. Here, we report the spatiotemporal distribution of MP in the second largest river of Nepal, the Sapta-Gandaki River system, which is 810 km long from its Himalayan headstream to the Ganges, with a catchment area of 46,300 km^2. A total of 120 integrated water samples were collected in the pre- and post-monsoon seasons from 30 sites (2850-140 masl) along three tributaries of the Sapta-Gandaki River. The MP data were corrected for procedural blanks (n=23) and positive controls (n=18). We found that the MP count (cut-off size ≥30 µm) in the pre-monsoon (dry) season was significantly higher (61.2±27.8 MP/L, p<0.01) than in the post-monsoon (winter) season (24.7±10.8 MP/L). High counts were observed at sites near major cities and highways. A gradual increase in MP count was observed as the river flows from upstream to downstream (r=-0.6). The dominance orders for shape, size, and color were fragments>pellets>fibers; 30-100>100-250>250-500>500-5000 µm; and blue>black>transparent, respectively. Most MP particles consisted of polyethylene terephthalate, cellophane, polyethylene, and polyvinyl chloride. The annual flux calculation showed that the Sapta-Gandaki River discharges 0.7×10^8 MP/s. The findings of this study provide baseline data for MP contamination in one of the major Himalayan river systems of Nepal, and the data could be useful for identifying potential control measures.