Tracing the origin of functional and conserved domains in the human proteome: implications for protein evolution at the modular level
BACKGROUND: The functional repertoire of the human proteome is an incremental collection of functions accomplished by protein domains that evolved along the Homo sapiens lineage. Therefore, knowledge of the origin of these functionalities provides a better understanding of domain and protein evolution in humans. The lack of a proper understanding of this origin prompted us to study the evolutionary origin of the human proteome in a unique way, as detailed in this study. RESULTS: This study reports a unique approach for understanding the evolution of the human proteome by tracing the origin of its constituent domains hierarchically along the Homo sapiens lineage. The uniqueness of this method lies in subtractive searching of functional and conserved domains in the human proteome, which results in higher efficiency in detecting their origins. From these analyses, the nature of protein evolution and trends in domain evolution can be observed in the context of the entire human proteome. The method adopted here also helps delineate the degree of divergence of functional families that occurred during the course of evolution. CONCLUSION: This approach to tracing the evolutionary origin of functional domains in the human proteome facilitates a better understanding of their functional versatility and provides insights into the functionality of hypothetical proteins present in the human proteome. This work elucidates the origin of functional and conserved domains in human proteins, their distribution along the Homo sapiens lineage, the occurrence frequency of different domain combinations, and proteome-wide patterns of their distribution, providing insights into the evolutionary solution to the increased complexity of the human proteome
DMAPS: a database of multiple alignments for protein structures
The database of multiple alignments for protein structures (DMAPS) provides instant access to pre-computed multiple structure alignments for all protein structure families in the Protein Data Bank (PDB). Protein structure families have been obtained from four distinct classification methods including SCOP, CATH, ENZYME and CE, and multiple structure alignments have been built for all families containing at least three members, using CE-MC software. Currently, multiple structure alignments are available for 3050 SCOP-, 3087 CATH-, 664 ENZYME- and 1707 CE-based families. A web-based query system has been developed to retrieve multiple alignments for these families using the PDB chain ID of any member of a family. Multiple alignments can be viewed or downloaded in six different formats, including JOY/html, TEXT, FASTA, PDB (superimposed coordinates), JOY/postscript and JOY/rtf. DMAPS is accessible online at
The Product Guides the Process: Discovering Disease Mechanisms
The nature of the product to be discovered guides the reasoning to discover it. Biologists and medical researchers often search for mechanisms. The "new mechanistic philosophy of science" provides resources about the nature of biological mechanisms that aid the discovery of mechanisms. Here, we apply these resources to the discovery of mechanisms in medicine. A new diagrammatic representation of a disease mechanism chain indicates both what is known and, most significantly, what is not known at a given time, thereby guiding the researcher and collaborators in discovery. Mechanisms of genetic diseases provide the examples
A Top-Down Approach to Infer and Compare Domain-Domain Interactions across Eight Model Organisms
Knowledge of specific domain-domain interactions (DDIs) is essential to understand the functional significance of protein interaction networks. Despite the availability of an enormous amount of data on protein-protein interactions (PPIs), very little is known about specific DDIs occurring in them. Here, we present a top-down approach to accurately infer functionally relevant DDIs from PPI data. We created a comprehensive, non-redundant dataset of 209,165 experimentally-derived PPIs by combining datasets from five major interaction databases. We introduced an integrated scoring system that uses a novel combination of a set of five orthogonal scoring features covering the probabilistic, evolutionary, evidence-based, spatial and functional properties of interacting domains, which can map the interacting propensity of two domains in many dimensions. This method outperforms similar existing methods both in the accuracy of prediction and in the coverage of domain interaction space. We predicted a set of 52,492 high-confidence DDIs to carry out cross-species comparison of DDI conservation in eight model species including human, mouse, Drosophila, C. elegans, yeast, Plasmodium, E. coli and Arabidopsis. Our results show that only 23% of these DDIs are conserved in at least two species and only 3.8% in at least 4 species, indicating a rather low conservation across species. Pair-wise analysis of DDI conservation revealed a ‘sliding conservation’ pattern between the evolutionarily neighboring species. Our methodology and the high-confidence DDI predictions generated in this study can help to better understand the functional significance of PPIs at the modular level, thus can significantly impact further experimental investigations in systems biology research
Insights from GWAS: emerging landscape of mechanisms underlying complex trait disease
There are now over 2000 loci in the human genome where genome-wide association studies (GWAS) have found one or more SNPs to be associated with altered risk of a complex trait disease. At each of these loci, there must be some molecular-level mechanism relevant to the disease. What are these mechanisms and how do they contribute to disease? Here we consider the roles of three primary mechanism classes: changes that directly alter protein function (missense SNPs), changes that alter transcript abundance as a consequence of variants close by in sequence, and changes that affect splicing. Missense SNPs are divided into those predicted to have a high impact on in vivo protein function, and those with a low impact. Splicing is divided into SNPs with a direct impact on splice sites, and those with a predicted effect on auxiliary splicing signals. The analysis was based on associations found for seven complex trait diseases in the classic Wellcome Trust Case Control Consortium (WTCCC1) GWA study and subsequent studies and meta-analyses, collected from the GWAS catalog. Linkage disequilibrium information was used to identify possible candidate SNPs for involvement in disease mechanism in each of the 356 loci associated with these seven diseases. With the parameters used, we find that 76% of loci have at least one of these mechanisms. Overall, except for the low incidence of direct impact on splice sites, the mechanisms are found at similar frequencies, with changes in transcript abundance the most common. But the distribution of mechanisms over diseases varies markedly, as does the fraction of loci with assigned mechanisms. Many of the implicated proteins have previously been suggested as relevant, but the specific mechanism assignments are new. In addition, a number of new disease-relevant proteins are proposed. The high fraction of GWAS loci with proposed mechanisms suggests that these classes of mechanism play a major role. Other mechanism types, such as variants affecting expression of genes remote in the DNA sequence, will contribute in other loci. Each of the identified putative mechanisms provides a hypothesis for further investigation. https://doi.org/10.1186/1471-2164-16-S8-S
Harnessing formal concepts of biological mechanism to analyze human disease.
Mechanism is a widely used concept in biology. In 2017, more than 10% of PubMed abstracts used the term. Therefore, searching for and reasoning about mechanisms is fundamental to much of biomedical research, but until now there has been almost no computational infrastructure for this purpose. Recent work in the philosophy of science has explored the central role that the search for mechanistic accounts of biological phenomena plays in biomedical research, providing a conceptual basis for representing and analyzing biological mechanism. The foundational categories for components of mechanisms (entities and activities) guide the development of general, abstract types of biological mechanism parts. Building on that analysis, we have developed a formal framework for describing and representing biological mechanism, MecCog, and applied it to describing mechanisms underlying human genetic disease. Mechanisms are depicted using a graphical notation. Key features are assignment of mechanism components to stages of biological organization and classes; visual representation of uncertainty, ignorance, and ambiguity; and tight integration with literature sources. The MecCog framework facilitates analysis of many aspects of disease mechanism, including the prioritization of future experiments, probing of gene-drug and gene-environment interactions, identification of possible new drug targets, personalized drug choice, analysis of nonlinear interactions between relevant genetic loci, and classification of diseases based on mechanism