
    ProDy: Protein Dynamics Inferred from Theory and Experiments

    Summary: We developed a Python package, ProDy, for structure-based analysis of protein dynamics. ProDy allows for quantitative characterization of structural variations in heterogeneous datasets of structures experimentally resolved for a given biomolecular system, and for comparison of these variations with the theoretically predicted equilibrium dynamics. Datasets include structural ensembles for a given family or subfamily of proteins, their mutants and sequence homologues, in the presence or absence of their substrates, ligands or inhibitors. Numerous helper functions enable comparative analysis of experimental and theoretical data, and visualization of the principal changes in conformations that are accessible in different functional states. The ProDy application programming interface (API) has been designed so that users can easily extend the software and implement new methods.
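
    The theoretical side of such comparisons typically relies on elastic network models such as the Gaussian network model (GNM), for which ProDy provides GNM and ANM classes. The sketch below is not ProDy's API: it is a minimal pure-Python illustration of how a GNM Kirchhoff (connectivity) matrix can be built from C-alpha coordinates. The 7.3 Å cutoff and uniform spring constant are illustrative assumptions; the eigenvectors of this matrix would give the GNM modes.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def kirchhoff_matrix(coords: Sequence[Coord], cutoff: float = 7.3,
                     gamma: float = 1.0) -> List[List[float]]:
    """Build a GNM Kirchhoff matrix from C-alpha coordinates (in angstroms).

    Off-diagonal (i, j) is -gamma when residues i and j lie within `cutoff`
    of each other, else 0. Each diagonal entry is minus the sum of its row's
    off-diagonal entries, i.e. the residue's contact count times gamma.
    The default cutoff is an assumption for illustration only.
    """
    n = len(coords)
    kirchhoff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                kirchhoff[i][j] = kirchhoff[j][i] = -gamma
    for i in range(n):
        kirchhoff[i][i] = -sum(kirchhoff[i][j] for j in range(n) if j != i)
    return kirchhoff
```

    Every row sums to zero by construction, which is why the matrix has one zero eigenvalue (rigid-body translation) that is discarded when computing modes.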

    A series of PDB related databases for everyday needs

    The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy-to-parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray crystallography. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics to cancer biology and protein design.
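
    The abstract does not describe how DSSP assigns secondary structure. For context, DSSP is built on the Kabsch and Sander electrostatic hydrogen-bond energy between a backbone C=O and N-H pair, E = 0.084 × 332 × (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN) kcal/mol, with a bond counted when E < -0.5 kcal/mol. The sketch below implements only that criterion, assuming the four atom positions (in ångströms, including the amide hydrogen) are already known; the pattern analysis that turns bonds into helices and strands is not shown.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]

# Partial charges q1 = 0.42e (C=O) and q2 = 0.20e (N-H) and the factor
# f = 332, as in the Kabsch and Sander DSSP energy model.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol


def hbond_energy(c: Vec, o: Vec, n: Vec, h: Vec) -> float:
    """Electrostatic energy (kcal/mol) between acceptor C=O and donor N-H."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c: Vec, o: Vec, n: Vec, h: Vec) -> bool:
    """True when the pair passes the DSSP energy threshold."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF
```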

    SARS-CoV-2 Infects Human Engineered Heart Tissues and Models COVID-19 Myocarditis.

    There is ongoing debate as to whether cardiac complications of coronavirus disease-2019 (COVID-19) result from myocardial viral infection or are secondary to systemic inflammation and/or thrombosis. We provide evidence that cardiomyocytes are infected in patients with COVID-19 myocarditis and are susceptible to severe acute respiratory syndrome coronavirus 2. We establish an engineered heart tissue model of COVID-19 myocardial pathology, define mechanisms of viral pathogenesis, and demonstrate that cardiomyocyte severe acute respiratory syndrome coronavirus 2 infection results in contractile deficits, cytokine production, sarcomere disassembly, and cell death. These findings implicate direct infection of cardiomyocytes in the pathogenesis of COVID-19 myocardial pathology and provide a model system to study this emerging disease.

    Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

    BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches to gene and genome annotation. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. The distributions of the large clusters by organism composition, functional class, and sample location are illustrated. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of the original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project.
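
    CD-HIT is based on greedy incremental clustering: sequences are sorted by decreasing length, the longest becomes the first representative, and each subsequent sequence either joins the first representative it matches above an identity threshold or founds a new cluster. The sketch below shows only that control flow. The real CD-HIT uses short-word filtering and alignment to compute identity; `identity` here is a hypothetical stand-in (ungapped identity over the shorter sequence), and none of the modifications described in the paper are reproduced.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """HYPOTHETICAL stand-in for CD-HIT's alignment-based identity:
    fraction of matching positions in an ungapped, left-anchored
    comparison, measured over the shorter sequence."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Greedy incremental clustering in the style of CD-HIT.

    Returns a mapping from each representative's ID to its members' IDs,
    with the representative listed first."""
    # Longest sequences first; sorted() is stable, so ties keep input order.
    order = sorted(seqs, key=lambda k: len(seqs[k]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for sid in order:
        for rep in clusters:  # representatives, in creation order
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:  # no representative matched: start a new cluster
            clusters[sid] = [sid]
    return clusters
```

    A hierarchical analysis could then re-cluster the representatives of one pass at a lower threshold, so that each level groups the clusters of the level below.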

    Joint Effects of Febrile Acute Infection and an Interferon-γ Polymorphism on Breast Cancer Risk

    BACKGROUND: There is an inverse relationship between febrile infection and the risk of malignancies. Interferon gamma (IFN-γ) plays an important role in fever induction and its expression increases with incubation at fever-range temperatures. Therefore, the genetic polymorphism of IFN-γ may modify the association of febrile infection with breast cancer risk. METHODOLOGY AND PRINCIPAL FINDINGS: Information on potential breast cancer risk factors, history of fever during the last 10 years, and blood specimens were collected from 839 incident breast cancer cases and 863 age-matched controls between October 2008 and June 2010 in Guangzhou, China. IFN-γ (rs2069705) was genotyped using a matrix-assisted laser desorption/ionization time-of-flight mass spectrometry platform. Odds ratios (OR) and 95% confidence intervals (CIs) were calculated using multivariate logistic regression. We found that women who had experienced ≥1 fever per year had a decreased risk of breast cancer [OR and 95% CI: 0.77 (0.61-0.99)] compared to those with less than one fever a year. This association only occurred in women with CT/TT genotypes [0.54 (0.37-0.77)] but not in those with the CC genotype [1.09 (0.77-1.55)]. The association of IFN-γ rs2069705 with the risk of breast cancer was not significant among all participants, while the CT/TT genotypes were significantly related to an elevated risk of breast cancer [1.32 (1.03-1.70)] among the women with <1 fever per year and to a reduced risk of breast cancer [0.63 (0.40-0.99)] among women with ≥1 fever per year compared to the CC genotype. A marked interaction between fever frequencies and the IFN-γ genotypes was observed (P values for multiplicative and additive interaction were 0.005 and 0.058, respectively). CONCLUSIONS: Our findings indicate a possible link between febrile acute infection and a decreased risk of breast cancer, and this association was modified by IFN-γ rs2069705.
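
    The reported ORs come from multivariate logistic regression, which adjusts for covariates. To show what such a figure means, the sketch below computes the simpler crude (unadjusted) OR from a 2×2 table with a Woolf (log-scale Wald) 95% CI. This is a swapped-in, simpler technique, not the study's analysis, and any counts passed to it are the reader's own.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Woolf 95% CI for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls (all counts must be > 0).
    Not covariate-adjusted, unlike a logistic-regression OR.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi
```

    Because the interval is symmetric on the log scale, the OR is the geometric mean of its two CI limits.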

    Automatic structure classification of small proteins using random forest

    BACKGROUND: Random forest, an ensemble-based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs. RESULTS: Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of the Matthews correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to different structural levels decreases. CONCLUSIONS: The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as the introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification of those protein structures from the PDB that are yet to be assigned to the SCOP classification hierarchy.
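
    The Matthews correlation coefficient summarizes a binary confusion matrix as a single value between -1 and 1, where 0 means no better than chance. For a multi-class task such as SCOP classification it is usually computed per class (one-vs-rest) or with a multi-class generalization; the abstract does not say which variant was used, so the sketch below shows only the binary formula.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```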

    Random Amino Acid Mutations and Protein Misfolding Lead to Shannon Limit in Sequence-Structure Communication

    The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol, at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
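
    Channel capacity is the maximum, over input distributions, of the mutual information I(X;Y) between transmitted and received symbols. The sketch below computes I(X;Y) in bits from a joint count table, for example counts of amino acid types (rows) against hypothetical structural-state classes (columns). It does not reproduce the paper's channel model or the maximization over input distributions.

```python
import math
from typing import Sequence


def mutual_information_bits(counts: Sequence[Sequence[float]]) -> float:
    """Mutual information I(X;Y) in bits from a joint count table.

    counts[i][j] is how often input symbol i co-occurred with output
    symbol j. Zero cells contribute nothing (0 * log 0 is taken as 0)."""
    total = sum(sum(row) for row in counts)
    row_p = [sum(row) / total for row in counts]
    col_p = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:
                p = n / total
                mi += p * math.log2(p / (row_p[i] * col_p[j]))
    return mi
```

    A noiseless two-symbol channel gives 1 bit per symbol, while independent inputs and outputs give 0 bits, the regime in which communication breaks down.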

    Enoxaparin for symptomatic COVID-19 managed in the ambulatory setting: An individual patient level analysis of the OVID and ETHIC trials.

    BACKGROUND: Antithrombotic treatment may improve the disease course in non-critically ill, symptomatic COVID-19 outpatients. METHODS: We performed an individual patient-level analysis of the OVID and ETHIC randomized controlled trials, which compared enoxaparin thromboprophylaxis for either 14 (OVID) or 21 days (ETHIC) vs. no thromboprophylaxis for outpatients with symptomatic COVID-19 and at least one additional risk factor. Both studies were prematurely stopped for futility. The primary efficacy outcome was the composite of all-cause hospitalization and all-cause death within 30 days from randomization. Secondary efficacy outcomes were major symptomatic venous thromboembolic events, arterial cardiovascular events, or their composite occurring within 30 days from randomization. The same outcomes were assessed over a 90-day follow-up. The primary safety outcome was major bleeding (ISTH criteria). RESULTS: A total of 691 patients were randomized: 339 to receive enoxaparin and 352 to the control group. Over the 30-day follow-up, the primary efficacy outcome occurred in 6.0 % of patients in the enoxaparin group vs. 5.8 % of controls, for a risk ratio (RR) of 1.05 (95% CI 0.57-1.92). The composite of major symptomatic venous thromboembolic events and arterial cardiovascular events occurred in 0.9 % vs. 1.8 % of patients in the enoxaparin and control groups, respectively (RR 0.52; 95% CI 0.13-2.06). Most of these events were symptomatic venous thromboembolic events, occurring in 0.6 % vs. 1.5 % of patients, respectively. A similar distribution of outcomes between the treatment groups was observed over 90 days. No major bleeding occurred in the enoxaparin group vs. one event (0.3 %) in the control group. CONCLUSIONS: We found no evidence of clinical benefit from early administration of enoxaparin thromboprophylaxis in outpatients with symptomatic COVID-19. These results should be interpreted in light of the relatively low occurrence of events.
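
    A risk ratio compares the proportion of patients with an event in each arm. The sketch below computes a crude RR with a log-scale (Katz) 95% CI. The trials' own analysis may have used a different method, and none of the OVID or ETHIC counts are reproduced here.

```python
import math
from typing import Tuple


def risk_ratio_ci(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Crude risk ratio (treatment vs. control) with a Katz log-scale CI.

    Requires at least one event in each arm."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    # Standard error of ln(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)
```

    With few events per arm, as in these trials, the standard error is dominated by the 1/events terms, which is why such intervals are wide.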

    Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan

    CD4-positive T helper cells control many aspects of specific immunity. These cells are specific for peptides derived from protein antigens and presented by molecules of the extremely polymorphic major histocompatibility complex (MHC) class II system. The identification of peptides that bind to MHC class II molecules is therefore of pivotal importance for rational discovery of immune epitopes. HLA-DR is a prominent example of a human MHC class II. Here, we present a method, NetMHCIIpan, that allows for pan-specific predictions of peptide binding to any HLA-DR molecule of known sequence. The method is derived from a large compilation of quantitative HLA-DR binding events covering 14 of the more than 500 known HLA-DR alleles. Taking both peptide and HLA sequence information into account, the method can generalize and predict peptide binding also for HLA-DR molecules where experimental data are absent. Validation of the method includes identification of endogenously derived HLA class II ligands, cross-validation, leave-one-molecule-out, and binding motif identification for hitherto uncharacterized HLA-DR molecules. The validation shows that the method can successfully predict binding for HLA-DR molecules, even in the absence of specific data for the particular molecule in question. Moreover, when compared to TEPITOPE, currently the only other publicly available prediction method aiming at providing broad HLA-DR allelic coverage, NetMHCIIpan performs equivalently for alleles included in the training of TEPITOPE while outperforming TEPITOPE on novel alleles. We propose that the method can be used to identify those hitherto uncharacterized alleles that should be addressed experimentally in future updates of the method to cover the polymorphism of HLA-DR most efficiently.
We thus conclude that the presented method meets the challenge of keeping up with the MHC polymorphism discovery rate and that it can be used to sample the MHC "space," enabling a highly efficient iterative process for improving MHC class II binding predictions.
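
    Pan-specificity comes from presenting both the peptide and the HLA molecule to one predictor. A minimal sketch of that input side, assuming one-hot encoding and representing the HLA-DR molecule by a short pseudo-sequence of peptide-contacting residues: it enumerates every 9-residue candidate binding core of a longer peptide and concatenates its encoding with the HLA encoding. The encoding and the pseudo-sequence are illustrative assumptions, not NetMHCIIpan's actual features, and no trained model is shown.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CORE_LEN = 9  # MHC class II peptides bind through a 9-residue core


def one_hot(seq: str) -> List[int]:
    """Flattened one-hot encoding, 20 positions per residue.
    (Assumption: NetMHCIIpan's real encoding may differ.)"""
    vec: List[int] = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(aa)] = 1  # raises ValueError on unknown letters
        vec.extend(row)
    return vec


def candidate_inputs(peptide: str, hla_pseudo: str) -> List[Tuple[int, List[int]]]:
    """Return (core offset, feature vector) for every 9-mer core of `peptide`,
    each concatenated with the encoding of the hypothetical HLA pseudo-sequence."""
    hla_vec = one_hot(hla_pseudo)
    return [
        (offset, one_hot(peptide[offset:offset + CORE_LEN]) + hla_vec)
        for offset in range(len(peptide) - CORE_LEN + 1)
    ]
```

    A trained model would score each vector; taking the best-scoring core is one way to choose the binding register for a given peptide and allele.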

    Protein structure search and local structure characterization

    BACKGROUND: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA. RESULTS: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method successfully recovered more EGF sub-domains using our structural alphabet than when using other structural alphabets. SA-FAST can be found at http://140.113.166.178/safast/. CONCLUSION: The goal of this project was twofold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work in biological sequences but have yet to enter the field of protein structure research. Our experiments showed that by transforming the structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
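
    The central step, turning 3D fragments into a 1D string, can be sketched as clustering fragment descriptors with k-means and then replacing each fragment by the letter of its nearest centroid. The descriptors (for example, backbone angles of a short fragment), the value of k and the first-k initialization are assumptions for illustration; the self-organizing map and minimum spanning tree step used to choose the alphabet size is not reproduced.

```python
import math
import string
from typing import List, Sequence

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[List[float]]:
    """Lloyd's k-means, using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids


def encode(fragments: List[Point], centroids: List[List[float]]) -> str:
    """Map each fragment descriptor to the letter of its nearest centroid
    (supports up to 26 centroids, lettered A to Z)."""
    letters = string.ascii_uppercase
    return "".join(
        letters[min(range(len(centroids)), key=lambda c: math.dist(f, centroids[c]))]
        for f in fragments
    )
```

    The resulting strings can then be compared with ordinary sequence tools, given a substitution matrix over the alphabet's letters.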