82 research outputs found

    SIRT6 protein deacetylase interacts with MYH DNA glycosylase, APE1 endonuclease, and Rad9-Rad1-Hus1 checkpoint clamp

    Background: SIRT6, a member of the NAD+-dependent histone/protein deacetylase family, regulates genomic stability, metabolism, and lifespan. MYH glycosylase and APE1 are two base excision repair (BER) enzymes involved in mutation avoidance from oxidative DNA damage. Rad9-Rad1-Hus1 (9-1-1) checkpoint clamp promotes cell cycle checkpoint signaling and DNA repair. BER is coordinated with the checkpoint machinery and requires chromatin remodeling for efficient repair. SIRT6 is involved in DNA double-strand break repair and has been implicated in BER. Here we investigate the direct physical and functional interactions between SIRT6 and BER enzymes. Results: We show that SIRT6 interacts with and stimulates MYH glycosylase and APE1. In addition, SIRT6 interacts with the 9-1-1 checkpoint clamp. These interactions are enhanced following oxidative stress. The interdomain connector of MYH is important for interactions with SIRT6, APE1, and 9-1-1. Mutagenesis studies indicate that SIRT6, APE1, and Hus1 bind overlapping but different sequence motifs on MYH. However, there is no competition of APE1, Hus1, or SIRT6 binding to MYH. Rather, one MYH partner enhances the association of the other two partners to MYH. Moreover, APE1 and Hus1 act together to stabilize the MYH/SIRT6 complex. Within human cells, MYH and SIRT6 are efficiently recruited to confined oxidative DNA damage sites within transcriptionally active chromatin, but not within repressive chromatin. In addition, Myh foci induced by oxidative stress and Sirt6 depletion are frequently localized on mouse telomeres. Conclusions: Although SIRT6, APE1, and 9-1-1 bind to the interdomain connector of MYH, they do not compete for MYH association. Our findings indicate that SIRT6 forms a complex with MYH, APE1, and 9-1-1 to maintain genomic and telomeric integrity in mammalian cells

    Joint Evolutionary Trees: A Large-Scale Method To Predict Protein Interfaces Based on Sequence Sampling

    The Joint Evolutionary Trees (JET) method detects protein interfaces, the core residues involved in the folding process, and residues susceptible to site-directed mutagenesis and relevant to molecular recognition. The approach, based on the Evolutionary Trace (ET) method, introduces a novel way to treat evolutionary information. Families of homologous sequences are analyzed through a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple alignment and impacts of weakly homologous sequences on distance tree construction. The sampling method makes sequence analysis more sensitive to functional and structural importance of individual residues by avoiding effects of the overrepresentation of highly homologous sequences and improves computational efficiency. A carefully designed clustering method is parametrized on the target structure to detect and extend patches on protein surfaces into predicted interaction sites. Clustering takes into account residues' physical-chemical properties as well as conservation. Large-scale application of JET requires the system to be adjustable for different datasets and to guarantee predictions even if the signal is low. Flexibility was achieved by a careful treatment of the number of retrieved sequences, the amino acid distance between sequences, and the selective thresholds for cluster identification. An iterative version of JET (iJET) that guarantees finding the most likely interface residues is proposed as the appropriate tool for large-scale predictions. Tests are carried out on the Huang database of 62 heterodimer, homodimer, and transient complexes and on 265 interfaces belonging to signal transduction proteins, enzymes, inhibitors, antibodies, antigens, and others. A specific set of proteins chosen for their special functional and structural properties illustrate JET behavior on a large variety of interactions covering proteins, ligands, DNA, and RNA. 
    JET is compared at a large scale to ET and to Consurf, Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant improvement in performance and computational efficiency is shown.

    How accurate and statistically robust are catalytic site predictions based on closeness centrality?

    Background: We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices. Results: We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are in line with previous investigations of smaller datasets, indicate closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true to false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio is increased to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply eliminate from consideration residues with physicochemical properties unlikely to be compatible with catalytic requirements. Residue identity filtering improves the average true to false positive rate ratio to 26.3 to 1. Combining the two filters has little effect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and slight perturbations is examined. Conclusion: Our results resolutely confirm that closeness centrality is a viable prediction scheme whose predictions are statistically significant. Simple filtering schemes substantially improve the method's predictive power. Moreover, no clear effect on performance is observed when comparing ligated and unligated structures. Similarly, the CC prediction results are robust to slight structural perturbations from molecular dynamics simulation.
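As a rough illustration of the closeness centrality definition quoted in the abstract above (the reciprocal of the average distance between vertex i and all other vertices), here is a minimal Python sketch. It is not the authors' implementation. The 8.5 Å contact cutoff, the use of hop counts as path lengths, the zero score for disconnected graphs, and the function names `contact_graph` and `closeness_centrality` are all assumptions made for this example.

```python
from collections import deque
from math import dist


def contact_graph(coords, cutoff=8.5):
    """Build adjacency lists from C-alpha coordinates.

    Two residues are linked when their C-alpha atoms lie within `cutoff`
    (the value 8.5 is an assumed threshold, not taken from the abstract).
    """
    n = len(coords)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness_centrality(adj, i):
    """Reciprocal of the average shortest-path length from vertex i to all others."""
    hops = {i: 0}
    queue = deque([i])
    while queue:  # breadth-first search gives hop-count distances
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    if len(adj) < 2 or len(hops) < len(adj):
        return 0.0  # assumed convention: undefined cases score zero
    return (len(adj) - 1) / sum(hops.values())


# Toy "structure": five residues on a straight line, 5 units apart.
toy_coords = [(5.0 * k, 0.0, 0.0) for k in range(5)]
toy_adj = contact_graph(toy_coords)
scores = [closeness_centrality(toy_adj, i) for i in range(len(toy_adj))]
```

In this toy chain the middle residue receives the highest score, which matches the intuition that centrally placed positions rank first.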

    Scalable and accurate deep learning for electronic health records

    Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient's chart. Comment: Published version from https://www.nature.com/articles/s41746-018-0029-

    Accurate Protein Structure Annotation through Competitive Diffusion of Enzymatic Functions over a Network of Local Evolutionary Similarities

    High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks

    ResBoost: characterizing and predicting catalytic residues in enzymes

    Background: Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed. Results: We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA). Conclusion: ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.
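The sensitivity and false positive rate figures reported in the abstract above follow the standard confusion-matrix definitions. The sketch below shows how such per-residue metrics could be computed. It is illustrative only: the function name `sensitivity_and_fpr` and the toy labels are assumptions, and no ResBoost data or code is reproduced here.

```python
def sensitivity_and_fpr(truth, predicted):
    """Return (sensitivity, false positive rate) for binary per-residue labels.

    truth[k] is True when residue k is annotated as catalytic;
    predicted[k] is True when the method flags residue k.
    """
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    fp = sum(p and not t for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return sensitivity, fpr


# Hypothetical six-residue example: two true catalytic residues,
# one of which is found, plus one false alarm.
toy_truth = [True, True, False, False, False, False]
toy_pred = [True, False, True, False, False, False]
toy_sens, toy_fpr = sensitivity_and_fpr(toy_truth, toy_pred)
```

Raising a predictor's decision threshold generally lowers both numbers at once, which is the trade-off the abstract refers to.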

    Identifying allosteric fluctuation transitions between different protein conformational states as applied to Cyclin Dependent Kinase 2

    BACKGROUND: The mechanisms underlying protein function and associated conformational change are dominated by a series of local entropy fluctuations affecting the global structure yet are mediated by only a few key residues. Transitional Dynamic Analysis (TDA) is a new method to detect these changes in local protein flexibility between different conformations arising from, for example, ligand binding. Additionally, Positional Impact Vertex for Entropy Transfer (PIVET) uses TDA to identify important residue contact changes that have a large impact on global fluctuation. We demonstrate the utility of these methods for Cyclin-dependent kinase 2 (CDK2), a system for which crystal structures are available in multiple functionally relevant conformations, along with experimental data revealing the importance of local fluctuation changes for protein function. RESULTS: TDA and PIVET successfully identified select residues that are responsible for conformation-specific regional fluctuation in the activation cycle of CDK2. The detected local changes in protein flexibility have been experimentally confirmed to be essential for the regulation and function of the kinase. The methodologies also highlighted possible errors in previous molecular dynamics simulations that need to be resolved in order to understand this key player in cell cycle regulation. Finally, the use of entropy compensation as a possible allosteric mechanism for protein function is reported for CDK2. CONCLUSION: The methodologies embodied in TDA and PIVET provide a quick approach to identify local fluctuation changes important for protein function and the residue contacts that contribute to these changes. Further, these approaches can be used to check for possible errors in protein dynamics simulations and have the potential to facilitate a better understanding of the contribution of entropy to protein allostery and function.

    Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?

    Background: A wealth of unannotated and functionally unknown protein sequences has accumulated in recent years with rapid progress in sequence genomics, giving rise to ever-increasing demands for developing methods to efficiently assess functional sites. Sequence and structure conservation have traditionally been the major criteria adopted in various algorithms to identify functional sites. Here, we focus on the distributions of the 20^3 different types of 3-grams (or triplets of sequentially contiguous amino acids) in the entire space of sequences accumulated to date in the UniProt database, and focus in particular on the rare 3-grams distinguished by their high entropy-based information content. Results: Comparison of the UniProt distributions with those observed near/at the active sites on a non-redundant dataset of 59 enzyme/ligand complexes shows that the active sites preferentially recruit 3-grams distinguished by their low frequency in UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in detail to illustrate the biological significance of the results. Conclusion: The results suggest that recruitment of rare 3-grams may be an efficient mechanism for increasing specificity at functional sites. Rareness/scarcity emerges as a feature that may assist in identifying key sites for protein function, providing information complementary to that derived from sequence alignments. In addition, it provides us (for the first time) with a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.
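To make the idea of 3-gram frequencies concrete, the following Python sketch counts overlapping triplets in a set of sequences and scores each one by its self-information, so that rarer triplets score higher. This is a sketch under stated assumptions, not the authors' procedure: the abstract does not specify the exact entropy-based measure, so the use of -log2(frequency) is an assumption, as are the function names and the two made-up toy sequences.

```python
from collections import Counter
from math import log2


def count_3grams(sequences):
    """Count every overlapping triplet of contiguous residues."""
    counts = Counter()
    for seq in sequences:
        for k in range(len(seq) - 2):
            counts[seq[k:k + 3]] += 1
    return counts


def information_content(counts, gram):
    """Self-information -log2(p) of a 3-gram (assumed rarity measure).

    A 3-gram never observed in the background gets infinite information.
    """
    total = sum(counts.values())
    if counts[gram] == 0:
        return float("inf")
    return -log2(counts[gram] / total)


# Hypothetical background set; a real analysis would use UniProt sequences.
toy_sequences = ["MKVLAAG", "MKVLSAG"]
toy_counts = count_3grams(toy_sequences)
common_score = information_content(toy_counts, "MKV")  # seen twice
rare_score = information_content(toy_counts, "LAA")    # seen once
```

With a real background distribution, the same scoring could be applied to the triplets surrounding annotated active-site residues to check whether they skew toward high values.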