64 research outputs found

    Functional site prediction selects correct protein models

    Background: The prediction of protein structure can be facilitated by the use of constraints based on a knowledge of functional sites. Without this information it is still possible to predict which residues are likely to be part of a functional site, and this information can be used to select model structures from a variety of alternatives that would correspond to a functional protein. Results: Using a large collection of protein-like decoy models, a score was devised that selected those with predicted functional site residues that formed a cluster. When tested on a variety of small α/β/α type proteins, including enzymes and non-enzymes, those that corresponded to the native fold were ranked highly. This performance held also for a selection of larger α/β/α proteins that played no part in the development of the method. Conclusion: The use of predicted site positions provides a useful filter to discriminate native-like protein models from non-native models. The method can be applied to any collection of models and should provide a useful aid to all modelling methods from ab initio to homology based approaches.

    Alignment of Biological Sequences with Jalview

    In this chapter, we introduce core functionality of the Jalview interactive platform for the creation, analysis, and publication of multiple sequence alignments. A workflow is described based on Jalview's core functions: from data import to figure generation, including import of alignment reliability scores from T-Coffee and use of Jalview from the command line. The accompanying notes provide background information on the underlying methods and discuss additional options for working with Jalview to perform multiple sequence alignment, functional site analysis, and publication of alignments on the web.

    Integrating isotopes and documentary evidence : dietary patterns in a late medieval and early modern mining community, Sweden

    We would like to thank the Archaeological Research Laboratory, Stockholm University, Sweden and the Tandem Laboratory (Ångström Laboratory), Uppsala University, Sweden, for undertaking the analyses of stable nitrogen and carbon isotopes in both human and animal collagen samples. Also, thanks to Elin Ahlin Sundman for providing the δ13C and δ15N values for animal references from Västerås. This research (Bäckström’s PhD employment at Lund University, Sweden) was supported by the Berit Wallenberg Foundation (BWS 2010.0176) and Jakob and Johan Söderberg’s foundation. The ‘Sala project’ (excavations and analyses) has been funded by Riksens Clenodium, Jernkontoret, Birgit and Gad Rausing’s Foundation, SAU’s Research Foundation, the Royal Physiographic Society of Lund, Berit Wallenbergs Foundation, Åke Wibergs Foundation, Lars Hiertas Memory, Helge Ax:son Johnson’s Foundation and The Royal Swedish Academy of Sciences. Peer reviewed. Publisher PD

    Predicting active site residue annotations in the Pfam database

    Background: Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. Description: We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and MEROPS we find that we make many novel predictions. On investigating the small subset of predictions made by these databases that are not predicted by us, we found these sequences did not meet our strict criteria for prediction. We assessed the sensitivity and specificity of our methodology and estimate that only 3% of our predicted sequences are false positives. Conclusion: We have predicted 606110 active site residues, of which 94% are not found in UniProtKB, and have increased the active site annotations in Pfam by more than 200-fold. Although implemented for Pfam, the tool we have developed for transferring the data can be applied to any alignment with associated experimental active site data and is available for download. Our active site predictions are re-calculated at each Pfam release to ensure they are comprehensive and up to date. They provide one of the largest available databases of active site annotation.
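
The abstract above does not spell out its rule set, so the following is only a minimal sketch of alignment-based annotation transfer under one simplified, assumed rule: every experimentally annotated residue must be aligned to an identical residue in the target sequence, otherwise nothing is transferred. The function name `transfer_sites`, the gap characters, and the all-or-nothing identity rule are illustrative assumptions, not the published criteria.

```python
def transfer_sites(annotated_row, sites, target_row, gap_chars="-."):
    """Map annotated residue positions onto a target via a shared alignment.

    annotated_row, target_row: equal-length aligned strings (with gaps).
    sites: 1-based residue positions in the ungapped annotated sequence.
    Returns 1-based positions in the ungapped target sequence, or an empty
    list if any annotated residue is not matched by an identical residue
    (assumed strict rule, for illustration only).
    """
    if len(annotated_row) != len(target_row):
        raise ValueError("aligned rows must have the same length")

    wanted = set(sites)
    src_pos = 0  # residue counter in the annotated sequence
    tgt_pos = 0  # residue counter in the target sequence
    transferred = []

    for a, t in zip(annotated_row, target_row):
        a_is_residue = a not in gap_chars
        t_is_residue = t not in gap_chars
        if a_is_residue:
            src_pos += 1
        if t_is_residue:
            tgt_pos += 1
        if a_is_residue and src_pos in wanted:
            if t_is_residue and a.upper() == t.upper():
                transferred.append(tgt_pos)
            else:
                return []  # one mismatch or gap rejects the whole transfer
    return transferred


# Toy alignment (made-up sequences): sites 3 and 5 of the annotated sequence.
print(transfer_sites("AC-DEH", [3, 5], "ACGD-H"))
```

Rejecting the whole transfer on a single mismatch favours a low false-positive rate over coverage, which matches the stated aim of the abstract in spirit only; the actual rules may differ.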

    Predicting mostly disordered proteins by using structure-unknown protein data

    BACKGROUND: Predicting intrinsically disordered proteins is important in structural biology because they are thought to carry out various cellular functions even though they have no stable three-dimensional structure. We know the structures of far more ordered proteins than disordered proteins. The structural distribution of proteins in nature can therefore be inferred to differ from that of proteins whose structures have been determined experimentally. We know many more protein sequences than we do protein structures, and many of the known sequences can be expected to be those of disordered proteins. Thus it would be efficient to use the information of structure-unknown proteins in order to avoid training data sparseness. We propose a novel method for predicting which proteins are mostly disordered by using a spectral graph transducer (SGT) and training with a huge amount of structure-unknown sequences as well as structure-known sequences. RESULTS: When the proposed method was evaluated on data that included 82 disordered proteins and 526 ordered proteins, its sensitivity was 0.723 and its specificity was 0.977. It resulted in a Matthews correlation coefficient 0.202 points higher than that obtained using FoldIndex, 0.221 points higher than that obtained using the method based on plotting hydrophobicity against the number of contacts and 0.07 points higher than that obtained using support vector machines (SVMs). To examine robustness against training data sparseness, we investigated the correlation between two results obtained when the method was trained on different datasets and tested on the same dataset. The correlation coefficient for the proposed method is 0.14 higher than that for the method using SVMs.
    When the proposed SGT-based method was compared with four per-residue predictors (VL3, GlobPlot, DISOPRED2 and IUPred (long)), its sensitivity was 0.834 for disordered proteins, which is 0.052–0.523 higher than that of the per-residue predictors, and its specificity was 0.991 for ordered proteins, which is 0.036–0.153 higher than that of the per-residue predictors. The proposed method was also evaluated on data that included 417 partially disordered proteins. It predicted the frequency of disordered proteins to be 1.95% for the proteins with 5%–10% disordered sequences, 1.46% for the proteins with 10%–20% disordered sequences and 16.57% for proteins with 20%–40% disordered sequences. CONCLUSION: The proposed method, which utilizes the information of structure-unknown data, predicts disordered proteins more accurately than other methods and is less affected by training data sparseness.
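
The abstract above reports sensitivity, specificity and differences in Matthews correlation coefficient (MCC) but not the MCC itself. As a rough illustration of how these quantities relate, the sketch below computes MCC from a sensitivity, a specificity and the two class sizes using the standard confusion-matrix formula. The function name `mcc_from_rates` is hypothetical, and applying the reported rates directly to the 82 and 526 proteins is an assumption; the result is not a figure taken from the paper.

```python
import math


def mcc_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Matthews correlation coefficient from rates and class sizes.

    Expected confusion-matrix counts are reconstructed from the rates, so
    they may be fractional; the standard MCC formula is then applied.
    """
    tp = sensitivity * n_pos   # true positives
    fn = n_pos - tp            # false negatives
    tn = specificity * n_neg   # true negatives
    fp = n_neg - tn            # false positives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a marginal total is zero
    return (tp * tn - fp * fn) / denom


# Rates and class sizes quoted in the abstract (assumed to apply jointly).
print(round(mcc_from_rates(0.723, 0.977, 82, 526), 3))
```

MCC is a sensible summary here because the two classes are very unequal in size: a predictor that labelled every protein as ordered would score high accuracy but an MCC of zero.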

    Mouse mammary stem cells express prognostic markers for triple-negative breast cancer

    Introduction: Triple negative breast cancer (TNBC) is a heterogeneous group of tumours in which chemotherapy, the current mainstay of systemic treatment, is often initially beneficial but with a high risk of relapse and metastasis. There is currently no means of predicting which TNBC will relapse. We tested the hypothesis that the biological properties of normal stem cells are re-activated in tumour metastasis and that, therefore, the activation of normal mammary stem cell-associated gene sets in primary TNBC would be highly prognostic for relapse and metastasis. Methods: Mammary basal stem and myoepithelial cells were isolated by flow cytometry and tested in low dose transplant assays. Gene expression microarrays were used to establish expression profiles of the stem and myoepithelial populations; these were compared to each other and to our previously established mammary epithelial gene expression profiles. Stem cell genes were classified by Gene Ontology (GO) analysis and the expression of a subset analysed in the stem cell population at single cell resolution. Activation of stem cell genes was interrogated across different breast cancer cohorts and within specific subtypes and tested for clinical prognostic power. Results: A set of 323 genes was identified that was expressed significantly more highly in the purified basal stem cells compared to all other cells of the mammary epithelium. 109 out of 323 genes had been associated with stem cell features in at least one other study in addition to our own, providing further support for their involvement in the biology of this cell type. GO analysis demonstrated an enrichment of these genes for an association with cell migration, cytoskeletal regulation and tissue morphogenesis, consistent with a role in invasion and metastasis. Single cell resolution analysis showed that individual cells co-expressed both epithelial- and mesenchymal-associated genes/proteins.
    Most strikingly, we demonstrated that strong activity of this stem cell gene set in TNBCs identified those tumours most likely to rapidly progress to metastasis. Conclusions: Our findings support the hypothesis that the biological properties of normal stem cells are drivers of metastasis and that these properties can be used to stratify patients with a highly heterogeneous disease such as TNBC.

    SiteSeek: Post-translational modification analysis using adaptive locality-effective kernel methods and new profiles

    Background: Post-translational modifications have a substantial influence on the structure and functions of proteins. Post-translational phosphorylation is one of the most common modifications that occur in intracellular proteins. Accurate prediction of protein phosphorylation sites is of great importance for the understanding of diverse cellular signalling processes in both the human body and in animals. In this study, we propose a new machine learning based protein phosphorylation site predictor, SiteSeek. SiteSeek is trained using a novel compact evolutionary and hydrophobicity profile to detect possible protein phosphorylation sites for a target sequence. The newly proposed method proves to be more accurate and exhibits a much more stable predictive performance than currently existing phosphorylation site predictors. Results: The performance of the proposed model was compared to nine different existing machine learning models and four widely known phosphorylation site predictors with the newly proposed PS-Benchmark_1 dataset to contrast their accuracy, sensitivity, specificity and correlation coefficient. SiteSeek showed better predictive performance with 86.6% accuracy, 83.8% sensitivity, 92.5% specificity and 0.77 correlation coefficient on the four main kinase families (CDK, CK2, PKA, and PKC). Conclusion: Our newly proposed methods used in SiteSeek were shown to be useful for the identification of protein phosphorylation sites, as they performed much better than widely known predictors on the newly built PS-Benchmark_1 dataset.

    Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Background: A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" proteins. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings: We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions: CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores for false positives by ~30%, while leaving true positives unaffected. Importantly, receiver operating characteristics (ROC) curves demonstrate the high correlation between CPASS similarity scores and an accurate functional assignment. As indicated by distribution curves, scores ≥ 30% imply a functional similarity. Software URL: http://cpass.unl.edu.

    Latitudinal gradient in dairy production with the introduction of farming in Atlantic Europe

    The introduction of farming had far-reaching impacts on health, social structure and demography. Although the spread of domesticated plants and animals has been extensively tracked, it is unclear how these nascent economies developed within different environmental and cultural settings. Using molecular and isotopic analysis of lipids from pottery, here we investigate the foods prepared by the earliest farming communities of the European Atlantic seaboard. Surprisingly, we find an absence of aquatic foods, including in ceramics from coastal sites, except in the Western Baltic where this tradition continued from indigenous ceramic-using hunter-gatherer-fishers. The frequency of dairy products in pottery increased as farming was progressively introduced along a northerly latitudinal gradient. This finding implies that early farming communities needed time to adapt their economic practices before expanding into more northerly areas. Latitudinal differences in the scale of dairy production might also have influenced the evolution of adult lactase persistence across Europe.