25 research outputs found

    Protein-DNA binding sites prediction based on pre-trained protein language model and contrastive learning

    Protein-DNA interaction is critical for life activities such as replication, transcription, and splicing. Identifying protein-DNA binding residues is essential for modeling their interaction and downstream studies. However, developing accurate and efficient computational methods for this task remains challenging. Improvements in this area have the potential to drive novel applications in biotechnology and drug design. In this study, we propose a novel approach called CLAPE, which combines a pre-trained protein language model and the contrastive learning method to predict DNA binding residues. We trained the CLAPE-DB model on the protein-DNA binding sites dataset and evaluated the model performance and generalization ability through various experiments. The results showed that the AUC values of the CLAPE-DB model on the two benchmark datasets reached 0.871 and 0.881, respectively, indicating superior performance compared to other existing models. CLAPE-DB showed better generalization ability and was specific to DNA-binding sites. In addition, we trained CLAPE on different protein-ligand binding sites datasets, demonstrating that CLAPE is a general framework for binding sites prediction. To facilitate the scientific community, the benchmark datasets and codes are freely available at https://github.com/YAndrewL/clape

    Enzymes in the World of Computational Chemistry - Computational Studies on Structures, Dynamics, Mechanisms, and Specificity of Five Biologically Important Enzymes

    The thesis demonstrates computational chemical methodology applied to some biologically important enzymes. Five enzymes have been studied, i.e. sortase A (SrtA), 2-Methyl-3-hydroxypyridine-5-carboxylic acid oxygenase (MHPCO), porphobilinogen synthase (PBGS), D-glucarate dehydratase (GlucD) and oxidosqualene-lanosterol cyclase (OSC). Different levels of theory have been used to address questions related to structure, function, dynamics, mechanism, specificity, and inhibitor design of the above enzymes. The sortase A (SrtA) enzyme, which catalyzes the peptidoglycan cell wall anchoring reaction of LPXTG surface proteins, has been proposed to be a universal target for therapeutic agents against Gram-positive bacteria. The structure of the L. monocytogenes SrtA enzyme-substrate complex was obtained using homology modeling, molecular docking and molecular dynamics simulations. The active site arginine (Arg197) was found to be able to change its hydrogen donor interactions from the LP backbone carbonyl groups of the LPXTG substrate in the inactive form, to the TG backbone carbonyls in the active form. Similar motion of Arg197 was also observed in the S. aureus SrtA system. The catalytic mechanism of S. aureus SrtA was then systematically studied using MD simulations, ONIOM(DFT:MM) calculations, and QM/MM charge deletion analysis. The catalytic roles of Arg197 and Thr183 were analyzed. Our calculations show that Arg197 has several important roles in the mechanism. It is crucial for substrate binding, and is capable of reversibly shifting its hydrogen bonds between the LP and TG carbonyls of the LPXTG substrate motif, depending on the protonation state of the catalytic Cys184-His120 dyad. Arg197 stabilizes the catalytic dyad in the active ion pair form but at the same time raises the barrier to acylation by approximately 8 kcal/mol. 
Thr183 is also essential for the catalytic reaction in that it correspondingly lowers the barrier by the same amount via electrostatic interactions. The catalytic mechanism proceeds via proton transfer from His120, followed by nucleophilic attack from the thiolate anion of Cys184. The data thus support the proposed reverse protonation mechanism (RPM), and disprove the hypothesis that Arg197 generates an oxyanion hole to stabilize the tetrahedral intermediate of the reaction. MHPCO catalyzes the hydroxylation and subsequent ring-opening of the aromatic substrate MHPC to give the aliphatic product R-(N-acetylaminomethylene)succinic acid (AAMS), which is the essential ring-opening step in the bacterial degradation of vitamin B6. MHPCO belongs to the flavin-containing aromatic hydroxylase family. However, MHPCO is capable of catalyzing a subsequent aromatic ring-cleavage reaction to give acyclic products rather than hydroxylated aromatic ones. The catalytic mechanism of MHPCO has been systematically studied using DFT (MPWB1K and B3LYP) and ONIOM(DFT:MM) methods. Our DFT calculations show that the rearomatization of the hydroxylated intermediate occurs spontaneously in aqueous solution; this implies that the ring-opening process occurs inside the active site, in which limited water is available. The instability of the hydroxylated intermediate of MHPCO is the main reason why acyclic products are formed. Previously proposed mechanisms for the ring-opening step were studied, and were shown to be less likely to occur. Two new pathways (pathway A and B) with reasonable barrier heights are reported herein. Both DFT and ONIOM calculations show that the ring-opening pathway B, in which an epoxy transition state is formed, is more favored than the direct C2-C3 cleavage pathway A. 
Our calculations show that the active-site residues Arg211 and Tyr223 have a minor effect on the reaction, while the peptide bond of Pro295-Ala296, the side chain of Tyr82 and several crystal water molecules affect the reaction energy profile considerably. Different QM/MM partitioning schemes have been used to study the enzymatic reaction, and the results show that both the reaction barriers for the hydroxylation and the ring-opening pathways are sensitive to the QM/MM partitioning. Porphobilinogen synthase (PBGS) catalyzes the asymmetric condensation and cyclization of two 5-aminolevulinic acid (5-ALA) substrate molecules to give porphobilinogen (PBG), and is known as the first common step in the biosynthesis of the tetrapyrroles. The chemical step of PBGS is herein revisited using QM/MM (ONIOM) calculations. Two different protonation states and several different mechanisms are considered. Previous mechanisms based on DFT-only calculations are shown to be unlikely to occur. According to these new calculations, the deprotonation step rather than ring closure is rate-limiting. Both the C-C bond formation first mechanism and the C-N bond formation first mechanism are possible, depending on how the A-site ALA binds to the enzyme. We furthermore propose that future work should focus on the substrate binding step rather than the enzymatic mechanism. D-glucarate dehydratase (GlucD) catalyzes the dehydration of D-glucarate or L-idarate to give 5-keto-4-deoxy-D-glucarate (5-KDG). The stereo-specificity of GlucD is explored by combined docking and QM/MM calculations. According to our calculations, both the substrate binding and the chemical steps of GlucD contribute to substrate specificity. The current approach will be used for assisting enzyme function assignment. Oxidosqualene-lanosterol cyclase (OSC) is a key enzyme in the biosynthesis of cholesterol. The catalytic mechanism and the product specificity of OSC have been studied by using QM/MM calculations. 
According to our calculations, the protonation of the epoxide ring of oxidosqualene is rate-limiting. The wild-type OSC (which generates lanosterol) and the H232S (which generates parkeol) and H232T (which generates protosta-12,24-dien-3-beta-ol) mutants were modeled to explain their product specificity. We show that the product specificity of OSC at the hydride/methyl-shifting stage is unlikely to be achieved by the stabilization of the cationic intermediates, because the precursor of lanosterol is not the most stable cationic intermediate for the wild-type OSC. The energy barriers for the product-determining conversions are related to the product specificity of different OSC mutants. We thus suggest that the product specificity of OSC is likely to be controlled by kinetics, rather than thermodynamics.

    Leveraging structure for enzyme function prediction: methods, opportunities, and challenges

    The rapid growth of the number of protein sequences that can be inferred from sequenced genomes presents challenges for function assignment, because only a small fraction (currently <1%) has been experimentally characterized. Bioinformatics tools are commonly used to predict functions of uncharacterized proteins. Recently, there has been significant progress in using protein structures as an additional source of information to infer aspects of enzyme function, which is the focus of this review. Successful application of these approaches has led to the identification of novel metabolites, enzyme activities, and biochemical pathways. We discuss opportunities to systematically elucidate protein domains of unknown function, orphan enzyme activities, dead-end metabolites, and pathways in secondary metabolism.

    Barringtonia racemosa Blume

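    Original Japanese name: Sagaribana (サガリバナ). Family: サガリバナ科 = Lecythidaceae. Collection site: Hengchun Tropical Botanical Garden, Taiwan (台湾省 恒春熱帯植物園). Collection date: 1968/8/9. Collector: 萩庭丈壽. Reference number: JH007629. National Museum of Nature and Science reference number: TNS-VS-95762.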

    The iGen algorithm.

    (a) Schematic overview. Red modules can be run in parallel on multiple computer cores. (b) Reaction types applied to the carbocations, obtained from mechanistic studies of terpenoid synthases.

    All monoterpene skeletons identified by iGen.

    Red skeletons have products associated with EC numbers.

    Example monoterpene compounds, their carbocation precursors, and skeletons.

    Product precursor carbocations are quenched by phosphorylation, deprotonation, or hydration to yield products.