245 research outputs found

    Family classification without domain chaining

    Get PDF
    Motivation: Classification of gene and protein sequences into homologous families, i.e. sets of sequences that share common ancestry, is an essential step in comparative genomic analyses. This is typically achieved by construction of a sequence homology network, followed by clustering to identify dense subgraphs corresponding to families. Accurate classification of single domain families is now within reach due to major algorithmic advances in remote homology detection and graph clustering. However, classification of multidomain families remains a significant challenge. The presence of the same domain in sequences that do not share common ancestry introduces false edges in the homology network that link unrelated families and stymy clustering algorithms

    DAhunter: a web-based server that identifies homologous proteins by comparing domain architecture

    Get PDF
    We present DAhunter, a web-based server that identifies homologous proteins by comparing domain architectures, the organization of protein domains. A major obstacle in comparison of domain architecture is the existence of ‘promiscuous’ domains, which carry out auxiliary functions and appear in many unrelated proteins. To distinguish these promiscuous domains from protein domains, we assigned a weight score to each domain extracted from RefSeq proteins, based on its abundance and versatility. A domain's score represents its importance in the ‘protein world’ and is used in the comparison of domain architectures. In scoring domains, DAhunter also considers domain combinations as well as single domains. To measure the similarity of two domain architectures, we developed several methods that are based on algorithms used in information retrieval (the cosine similarity, the Goodman–Kruskal γ function, and domain duplication index) and then combined these into a similarity score. Compared with other domain architecture algorithms, DAhunter is better at identifying homology. The server is available at http://www.dahunter.kr and http://localodom.kobic.re.kr/dahunter/index.ht

    d-Omix: a mixer of generic protein domain analysis tools

    Get PDF
    Domain combination provides important clues to the roles of protein domains in protein function, interaction and evolution. We have developed a web server d-Omix (a Mixer of Protein Domain Analysis Tools) aiming as a unified platform to analyze, compare and visualize protein data sets in various aspects of protein domain combinations. With InterProScan files for protein sets of interest provided by users, the server incorporates four services for domain analyses. First, it constructs protein phylogenetic tree based on a distance matrix calculated from protein domain architectures (DAs), allowing the comparison with a sequence-based tree. Second, it calculates and visualizes the versatility, abundance and co-presence of protein domains via a domain graph. Third, it compares the similarity of proteins based on DA alignment. Fourth, it builds a putative protein network derived from domain–domain interactions from DOMINE. Users may select a variety of input data files and flexibly choose domain search tools (e.g. hmmpfam, superfamily) for a specific analysis. Results from the d-Omix could be interactively explored and exported into various formats such as SVG, JPG, BMP and CSV. Users with only protein sequences could prepare an InterProScan file using a service provided by the server as well. The d-Omix web server is freely available at http://www.biotec.or.th/isl/Domix

    The Dynamics and Evolutionary Potential of Domain Loss and Emergence

    Get PDF
    The wealth of available genomic data presents an unrivaled opportunity to study the molecular basis of evolution. Studies on gene family expansions and site-dependent analyses have already helped establish important insights into how proteins facilitate adaptation. However, efforts to conduct full-scale cross-genomic comparisons between species are challenged by both growing amounts of data and the inherent difficulty in accurately inferring homology between deeply rooted species. Proteins, in comparison, evolve by means of domain rearrangements, a process more amenable to study given the strength of profile-based homology inference and the lower rates with which rearrangements occur. However, adapting to a constantly changing environment can require molecular modulations beyond reach of rearrangement alone. Here, we explore rates and functional implications of novel domain emergence in contrast to domain gain and loss in 20 arthropod species of the pancrustacean clade. Emerging domains are more likely disordered in structure and spread more rapidly within their genomes than established domains. Furthermore, although domain turnover occurs at lower rates than gene family turnover, we find strong evidence that the emergence of novel domains is foremost associated with environmental adaptation such as abiotic stress response. The results presented here illustrate the simplicity with which domain-based analyses can unravel key players of nature's adaptational machinery, complementing the classical site-based analyses of adaptation
    corecore