663 research outputs found

    SUS-BAR: a database of pig proteins with statistically validated structural and functional annotation

    Given the relevance of the pig proteome in different studies, including those on complex human maladies, a statistical validation of the annotation is required for a better understanding of the role of specific genes and proteins in the complex networks underlying biological processes in the animal. Presently, approximately 80% of the pig proteome is still poorly annotated, and the existence of protein sequences is routinely inferred automatically by sequence alignment to preexisting sequences. In this article, we introduce SUS-BAR, a database that derives information mainly from UniProt Knowledgebase and that includes 26 206 pig protein sequences. In SUS-BAR, 16 675 of the pig protein sequences are endowed with statistically validated functional and structural annotation. Our statistical validation is determined by adopting a cluster-centric annotation procedure that allows transfer of different types of annotation, including structure and function. Each sequence in the database can be associated with a set of statistically validated Gene Ontologies (GOs) of the three main sub-ontologies (Molecular Function, Biological Process and Cellular Component), with Pfam functional domains, and when possible, with a cluster Hidden Markov Model that allows modelling the 3D structure of the protein. A database search provides statistics demonstrating the enrichment in both GO and Pfam annotations of the pig proteins as compared with UniProt Knowledgebase annotation. Searching in SUS-BAR allows retrieval of the pig protein annotation for further analysis. The search is also possible on the basis of specific GO terms, and this allows retrieval of all the pig sequences participating in a given biological process, after annotation with our system. Alternatively, the search is possible on the basis of structural information, allowing retrieval of all the pig sequences with the same structural characteristics.

    A clustering method for robust and reliable large scale functional and structural protein sequence annotation

    Bioinformatics, in the last few decades, has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the major problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As has recently been pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is possible by way of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications have been developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, as a web server for the functional and structural protein sequence annotation, BAR+ is freely available at http://bar.biocomp.unibo.it/bar2.0

    The human "magnesome": detecting magnesium binding sites on human proteins

    BACKGROUND: Magnesium research is increasing in molecular medicine due to the relevance of this ion in several important biological processes and associated molecular pathogeneses. It is still difficult to predict from the protein covalent structure whether or not a human chain is involved in magnesium binding. This is mainly due to the limited information on the structural characteristics of magnesium binding sites in proteins and protein complexes. Magnesium binding features, unlike those of other divalent cations such as calcium and zinc, are elusive. Here we address a question that is relevant in protein annotation: how many human proteins can bind Mg(2+)? Our analysis is performed taking advantage of the recently implemented Bologna Annotation Resource (BAR-PLUS), a non-hierarchical clustering method that relies on the pairwise sequence comparison of about 14 million proteins from over 300,000 species and their grouping into clusters where annotation can safely be inherited after statistical validation. RESULTS: After cluster assignment of the latest version of the human proteome, the total number of human proteins for which we can assign putative Mg binding sites is 3,751. Among these proteins, 2,688 inherit annotation directly from human templates and 1,063 inherit annotation from templates of other organisms. Protein structures are highly conserved inside a given cluster. Transfer of structural properties is possible after alignment of a given sequence with the protein structures that characterise a given cluster as obtained with a Hidden Markov Model (HMM) based procedure. Interestingly, a set of 370 human sequences inherit Mg(2+) binding sites from templates sharing less than 30% sequence identity with the template. CONCLUSION: We describe and deliver the "human magnesome", a set of proteins of the human proteome that inherit putative binding of magnesium ions. With our BAR-hMG, 251 clusters including 1,341 magnesium binding protein structures corresponding to 387 sequences are sufficient to annotate some 13,689 residues in 3,751 human sequences as "magnesium binding". Protein structures therefore act as three-dimensional seeds for structural and functional annotation of human sequences. The database collects specifically all the human proteins that can be annotated according to our procedure as "magnesium binding", the corresponding structures and the BAR+ clusters from which they derive the annotation (http://bar.biocomp.unibo.it/mg)

    Protein function annotation using protein domain family resources

    As a result of the genome sequencing and structural genomics initiatives, we have a wealth of protein sequence and structural data. However, only about 1% of these proteins have experimental functional annotations. As a result, computational approaches that can predict protein functions are essential in bridging this widening annotation gap. This article reviews the current approaches of protein function prediction using structure and sequence based classification of protein domain family resources with a special focus on functional families in the CATH-Gene3D resource

    Graph algorithms for bioinformatics

    Biological data are inherently interconnected: protein sequences are connected to their annotations, the annotations are structured into ontologies, and so on. While protein-protein interactions are already represented by graphs, in this work I present how a graph structure can be used to enrich the annotation of protein sequences thanks to algorithms that analyze the graph topology. We also describe a novel solution to restrict the data generation needed for building such a graph, thanks to constraints on the data and dynamic programming. The proposed algorithm ideally improves the generation time by a factor of 5. The graph representation is then exploited to build a comprehensive database, thanks to the rising technology of graph databases. While graph databases are widely used for other kinds of data, from Twitter tweets to recommendation systems, their application to bioinformatics is new. A graph database is proposed, with a structure that can be easily expanded and queried.

    Signaling pathways in cell models of Fabry disease nephropathy

    Chronic kidney disease is a leading cause of morbidity, impaired quality of life and premature death in patients with Fabry disease, being of major public health significance. At the cellular level, besides within lysosomes, glycosphingolipids that accumulate in Fabry disease due to alpha-galactosidase A (α-gal A) deficiency localize to membrane microdomains, which play crucial roles in protein clustering, membrane trafficking, and especially cell signaling. The mechanisms by which increased levels of these glycosphingolipids and consequent changes in microdomain dynamics and lysosomal dysfunction all result in cellular and organ injury are not well understood. To effectively study Fabry disease mechanisms at the cellular level, I first established and characterized an epithelial kidney cell model of Fabry disease in Madin-Darby canine kidney (MDCK) cells using small interfering RNA (siRNA). I then examined protein dynamics at the plasma membrane of a model raft-associated protein, GFP-GPI, in this model system. Number and Brightness Analysis in live cells showed a significant increase in the oligomeric size of antibody-induced clusters in α-gal A silenced cells compared to control cells (5.08 ± 0.45 vs 2.74 ± 0.24, respectively). To explore possible consequences of these findings in signaling pathways that are relevant to human disease, I first generated human kidney cell models of Fabry disease in immortalized podocytes and tubule epithelial cells (HK-2) applying the genome editing technique of clustered, regularly interspaced, short palindromic repeats and associated endonuclease 9 from S. pyogenes (CRISPR/Cas9). I compared abundance and phosphorylation of relevant signaling proteins through a high-throughput phosphorylation profiling for Fabry disease and control immortalized human podocytes. Fabry disease podocytes showed significant changes in total protein abundance and/or phosphorylation in 59 proteins. Pathway analysis predicted differential signaling of several canonical pathways in Fabry disease podocytes. These studies provided for the first time an understanding of raft protein dynamics and signaling in kidney cells deficient for α-gal A, potentially opening new avenues for biomarker discovery and drug development for Fabry disease nephropathy.

    Variable outcomes of human heart attack recapitulated in genetically diverse mice

    Clinical variation in patient responses to myocardial infarction (MI) has been difficult to model in laboratory animals. To assess the genetic basis of variation in outcomes after heart attack, we characterized responses to acute MI in the Collaborative Cross (CC), a multi-parental panel of genetically diverse mouse strains. Striking differences in post-MI functional, morphological, and myocardial scar features were detected across 32 CC founder and recombinant inbred strains. Transcriptomic analyses revealed a plausible link between increased intrinsic cardiac oxidative phosphorylation levels and MI-induced heart failure. The emergence of significant quantitative trait loci for several post-MI traits indicates that utilizing CC strains is a valid approach for gene network discovery in cardiovascular disease, enabling more accurate clinical risk assessment and prediction

    The Rosetteless gene controls development in the choanoflagellate S. rosetta.

    The origin of animal multicellularity may be reconstructed by comparing animals with one of their closest living relatives, the choanoflagellate Salpingoeca rosetta. Just as animals develop from a single cell (the zygote), multicellular rosettes of S. rosetta develop from a founding cell. To investigate rosette development, we established forward genetics in S. rosetta. We find that the rosette defect of one mutant, named Rosetteless, maps to a predicted C-type lectin, a class of signaling and adhesion genes required for development and innate immunity in animals. Rosetteless protein is essential for rosette development and forms an extracellular layer that coats and connects the basal poles of each cell in rosettes. This study provides the first link between genotype and phenotype in choanoflagellates and raises the possibility that a protein with C-type lectin-like domains regulated development in the last common ancestor of choanoflagellates and animals.

    Explainable Representations for Relation Prediction in Knowledge Graphs

    Knowledge graphs represent real-world entities and their relations in a semantically-rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties, but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features better explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two real-world highly complex relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard learning representation methods while identifying both sufficient and necessary explanations based on shared semantic aspects. Comment: 16 pages, 3 figures