359 research outputs found

    CORUM: the comprehensive resource of mammalian protein complexes

    Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by expert annotators through critical reading of the scientific literature. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes. For functional annotation, we use the FunCat catalogue, which makes it possible to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes.

    Rare coding SNP in DZIP1 gene associated with late-onset sporadic Parkinson's disease

    We present the first application of the hypothesis-rich mathematical theory to genome-wide association data. The Hamza et al. late-onset sporadic Parkinson's disease genome-wide association study dataset was analyzed. We found a rare, coding, non-synonymous SNP variant in the gene DZIP1 that confers increased susceptibility to Parkinson's disease. The association of DZIP1 with Parkinson's disease is consistent with a Parkinson's disease stem-cell ageing theory. Comment: 14 pages

    iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database

    Background: The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment. Results: The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL. The user may choose between different methods for resolving redundancies in interaction data and how n-ary data is represented. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user choice of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis. Conclusions: The package allows the user to control how these issues are dealt with and communicate them via an R-script written using the iRefR package. This will facilitate communication of methods, reproducibility of network analyses and further modification and comparison of methods by researchers.
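
The abstract above notes that the choice of n-ary data representation affects graph analysis. As a rough illustration (not iRefR code, and written in Python rather than R), the sketch below expands one complex into binary edges under two commonly used conventions, the spoke model and the matrix model. The function names and the protein labels A to D are hypothetical.

```python
from itertools import combinations


def _edge(a, b):
    """Return an undirected edge as a tuple with its endpoints in sorted order."""
    return (a, b) if a <= b else (b, a)


def spoke_expansion(bait, preys):
    """Spoke model: connect the bait to every other member of the complex."""
    return sorted({_edge(bait, prey) for prey in preys if prey != bait})


def matrix_expansion(members):
    """Matrix model: connect every pair of members of the complex."""
    return sorted({_edge(a, b) for a, b in combinations(set(members), 2)})


# Hypothetical four-member complex with "A" as the bait.
members = ["A", "B", "C", "D"]
print("spoke :", spoke_expansion("A", members))
print("matrix:", matrix_expansion(members))
```

For a complex of n members the spoke model yields n - 1 edges and the matrix model n(n - 1)/2, so node degree and clustering statistics computed downstream depend on which convention is chosen.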

    Extending CATH: increasing coverage of the protein structure universe and linking structure with function

    CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.

    An iterative approach of protein function prediction

    Background: Current approaches to predicting protein functions from a protein-protein interaction (PPI) dataset assume that the known functions of annotated proteins determine the functions of proteins whose functions are not yet known (un-annotated proteins). Protein function prediction is therefore a mono-directed, one-off procedure, i.e. from annotated proteins to un-annotated proteins. However, the interactions between proteins are mutual rather than static and mono-directed, even though the functions of some proteins are currently unknown. This means that when a similarity-based approach is used to predict functions of un-annotated proteins, those proteins, once their functions are predicted, will affect the similarities between proteins, which in turn will affect the prediction results. In other words, function prediction is a dynamic and mutual procedure. This dynamic feature of protein interactions, however, was not considered in the existing prediction algorithms. Results: In this paper, we propose a new prediction approach that predicts protein functions iteratively. This iterative approach incorporates the dynamic and mutual features of protein interactions, as well as the local and global semantic influence of protein functions, into the prediction. To make iterative prediction possible, we propose a new protein similarity measure derived from protein functions. We adapt new evaluation metrics to evaluate the prediction quality of our algorithm and other similar algorithms. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions. Conclusions: The iterative approach is more likely to reflect the real biological nature between proteins when predicting functions. A proper definition of protein similarity from protein functions is the key to predicting functions iteratively. The evaluation results demonstrated that in most cases, the iterative approach outperformed non-iterative ones with higher prediction quality in terms of prediction precision, recall and F-value.
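
The abstract above reports prediction quality as precision, recall and F-value. The sketch below uses the standard set-based definitions of these three quantities for a single protein. The abstract says its metrics were adapted, so this may not match the exact formulas used in the paper, and the function labels f1 to f5 are hypothetical.

```python
def precision_recall_f(predicted, actual):
    """Compare a predicted set of function labels with the true set.

    precision = TP / |predicted|, recall = TP / |actual|,
    F = harmonic mean of precision and recall (0.0 when both are 0).
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical example: three predicted functions, four true functions, two shared.
p, r, f = precision_recall_f({"f1", "f2", "f3"}, {"f2", "f3", "f4", "f5"})
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Averaging these per-protein values over all test proteins is one common way to obtain a single score per algorithm, though other aggregation schemes exist.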

    Impact of smoking status on the relative efficacy of the EGFR TKI/angiogenesis inhibitor combination therapy in advanced NSCLC-a systematic review and meta-analysis.

    BACKGROUND The ETOP 10-16 BOOSTER trial failed to demonstrate a progression-free survival (PFS) benefit for adding bevacizumab to osimertinib in second line. An exploratory subgroup analysis, however, suggested a PFS benefit of the combination in patients with a smoking history and prompted us to do this study. METHODS A systematic review and meta-analysis to evaluate the differential effect of smoking status on the benefit of adding an angiogenesis inhibitor to epidermal growth factor receptor (EGFR)-tyrosine kinase inhibitor therapy was carried out. All relevant randomized controlled trials appearing in main oncology congresses or in PubMed as of 1 November 2021 were used according to the Preferred Reporting Items for Systematic Review and Meta-Analyses statement. Primarily PFS according to smoking status, and secondarily overall survival (OS) were of interest. Pooled and interaction hazard ratios (HRs) were estimated by fixed or random effects models, depending on the detected degree of heterogeneity. Bias was assessed using the revised Cochrane tool for randomized controlled trials (RoB 2). RESULTS Information by smoking was available for 1291 patients for PFS (seven studies) and 678 patients for OS (four studies). The risk of bias was low for all studies. Combination treatment significantly prolonged PFS for smokers [n = 502, HR = 0.55, 95% confidence interval (CI): 0.44-0.69] but not for nonsmokers (n = 789, HR = 0.92, 95% CI: 0.66-1.27; treatment-by-smoking interaction P = 0.02). Similarly, a significant OS benefit was found for smokers (n = 271, HR = 0.66, 95% CI: 0.47-0.93) but not for nonsmokers (n = 407, HR = 1.07, 95% CI: 0.82-1.42; treatment-by-smoking interaction P = 0.03). CONCLUSION In advanced EGFR-non-small-cell lung cancer patients, the addition of an angiogenesis inhibitor to EGFR-tyrosine kinase inhibitor therapy provides a statistically significant PFS and OS benefit in smokers, but not in non-smokers. The biological basis for this observation should be pursued and could determine whether this might be due to a specific co-mutational pattern produced by tobacco exposure.
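
The subgroup hazard ratios reported above can be compared with a simple two-subgroup interaction test. The sketch below recovers each log-HR standard error from its 95% CI and applies a z-test to the difference of log-HRs. This is an approximation under stated assumptions (normality on the log scale, independent subgroups) and is not the study-level pooled interaction analysis described in the abstract, so its P value need not equal the reported one. The function names are hypothetical.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z_crit=1.96):
    """Return log(HR) and its standard error, inferred from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return math.log(hr), se


def interaction_test(subgroup_a, subgroup_b):
    """Two-sided z-test for the difference between two subgroup log-HRs.

    Each subgroup is a tuple (HR, CI lower bound, CI upper bound).
    Returns (ratio of HRs, z statistic, two-sided P value).
    """
    log_a, se_a = log_hr_and_se(*subgroup_a)
    log_b, se_b = log_hr_and_se(*subgroup_b)
    diff = log_a - log_b
    se_diff = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return math.exp(diff), z, p_value


# PFS subgroup estimates quoted in the abstract above.
smokers = (0.55, 0.44, 0.69)
nonsmokers = (0.92, 0.66, 1.27)
ratio, z, p = interaction_test(smokers, nonsmokers)
print(f"ratio of HRs={ratio:.2f} z={z:.2f} P={p:.3f}")
```

A ratio of HRs below 1 indicates a larger relative benefit in the first subgroup, which is the direction the abstract describes for smokers.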

    Fungal Virulence and Development Is Regulated by Alternative Pre-mRNA 3′End Processing in Magnaporthe oryzae

    RNA-binding proteins play a central role in post-transcriptional mechanisms that control gene expression. Identification of novel RNA-binding proteins in fungi is essential to unravel post-transcriptional networks and cellular processes that confer identity to the fungal kingdom. Here, we carried out the functional characterisation of the filamentous fungus-specific RNA-binding protein RBP35 required for full virulence and development in the rice blast fungus. RBP35 contains an N-terminal RNA recognition motif (RRM) and six Arg-Gly-Gly tripeptide repeats. Immunoblots identified two RBP35 protein isoforms that show a steady-state nuclear localisation and bind RNA in vitro. RBP35 coimmunoprecipitates in vivo with Cleavage Factor I (CFI) 25 kDa, a highly conserved protein involved in polyA site recognition and cleavage of pre-mRNAs. Several targets of RBP35 have been identified using transcriptomics including 14-3-3 pre-mRNA, an important integrator of environmental signals. In Magnaporthe oryzae, RBP35 is not essential for viability but regulates the length of 3′UTRs of transcripts with developmental and virulence-associated functions. The Δrbp35 mutant is affected in the TOR (target of rapamycin) signaling pathway showing significant changes in nitrogen metabolism and protein secretion. The lack of clear RBP35 orthologues in yeast, plants and animals indicates that RBP35 is a novel auxiliary protein of the polyadenylation machinery of filamentous fungi. Our data demonstrate that RBP35 is the fungal equivalent of metazoan CFI 68 kDa and suggest the existence of 3′end processing mechanisms exclusive to the fungal kingdom.

    SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale

    Background: An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. Results: SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). Conclusions: Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with external tools such as BLAST and Cytoscape, and it can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
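
Connected component analysis, one of the baseline methods named in the abstract above, can be sketched in a few lines. The code below is not taken from SCPS and does not implement its spectral method. It is a minimal union-find illustration that groups proteins whose pairwise similarity meets a threshold, and the protein names, scores and threshold are all hypothetical.

```python
def connected_components(items, similarities, threshold):
    """Cluster items by linking every pair whose similarity >= threshold.

    `similarities` maps (item_a, item_b) pairs to a score.
    Returns a sorted list of clusters, each a sorted list of items.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical pairwise similarity scores between five proteins.
proteins = ["p1", "p2", "p3", "p4", "p5"]
scores = {("p1", "p2"): 0.9, ("p2", "p3"): 0.8, ("p3", "p4"): 0.1, ("p4", "p5"): 0.7}
print(connected_components(proteins, scores, threshold=0.5))
```

Because a single above-threshold link is enough to merge two groups, this approach can chain unrelated proteins together, which is a commonly cited motivation for global methods such as the spectral clustering the abstract describes.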

    Enrichment of homologs in insignificant BLAST hits by co-complex network alignment

    Background: Homology is a crucial concept in comparative genomics. BLAST is probably the most widely used algorithm for homology detection in comparative genomics. Usually a stringent score cutoff is applied to distinguish putative homologs from possible false positive hits. As a consequence, some BLAST hits are discarded that are in fact homologous. Results: Analogous to the use of the genomic context in genome alignments, we test whether conserved functional context can be used to select candidate homologs from insignificant BLAST hits. We make a co-complex network alignment between complex subunits in yeast and human and find that proteins with an insignificant BLAST hit that are part of homologous complexes are likely to be homologous themselves. Further analysis of the distant homologs we recovered using the co-complex network alignment shows that a large majority of these distant homologs are in fact ancient paralogs. Conclusions: Our results show that, even though evolution takes place at the sequence and genome level, co-complex networks can be used as circumstantial evidence to improve confidence in the homology of distantly related sequences.

    A probabilistic framework to predict protein function from interaction data integrated with semantic knowledge

    Background: The functional characterization of newly discovered proteins has been a challenge in the post-genomic era. Protein-protein interactions provide insights into functional analysis because the function of unknown proteins can be postulated on the basis of their interaction evidence with known proteins. The protein-protein interaction data sets have been enriched by high-throughput experimental methods. However, functional analysis using the interaction data is limited in accuracy because of experimentally generated false positives and interactions that lack functional linkage. Results: Protein-protein interaction data can be integrated with the functional knowledge existing in the Gene Ontology (GO) database. We apply similarity measures to assess the functional similarity between interacting proteins. We present a probabilistic framework for predicting functions of unknown proteins based on the functional similarity. We use leave-one-out cross validation to compare the performance. The experimental results demonstrate that our algorithm performs better than other competing methods in terms of prediction accuracy. In particular, it handles the high false positive rates of current interaction data well. Conclusion: Experimentally determined protein-protein interactions are too error-prone to uncover the functional associations among proteins reliably. The performance of function prediction for uncharacterized proteins can be enhanced by integrating the multiple data sources available.