87 research outputs found

    Assessing functional novelty of PSI structures via structure-function analysis of large and diverse superfamilies

    Get PDF
    The structural genomics initiatives have had as one of their aims to improve our understanding of protein function by providing representative structures for many structurally uncharacterised protein families. As suggested by the recent assessment of the Protein Structure Initiative (Structural Genomics Initiative, funded by the NIH), doubts have arisen as to whether Structural Genomics as initially planned were really beneficial to our understanding of biological issues, and in particular of protein function.
A few protein domain superfamilies have been shown to account for unexpectedly large numbers of proteins encoded in fully sequenced genomes. These large superfamilies are generally very diverse, spanning a wide range of functions, both in terms of molecular activities and biological processes. Some of these superfamilies, such as the Rossmann-fold P-loop nucleotide hydrolases or the TIM-barrel glycosidases, have been the subject of extensive structural studies which in turn have shed light on how evolution of the sequence and structure properties produce functional diversity amongst homologues. Recently, the Structure-Function Linkage Database (SFLD) has been setup with the aim of helping the study of structure-function correlations in such superfamilies. Since the evolutionary success of these large superfamilies suggests biological importance, several Structural Genomics Centers have focused on providing full structural coverage for representatives of all sequence families in these superfamilies.
In this work we evaluate structure/function diversity in a set of these large superfamilies and attempt to assess the quality and quantity of biological information gained from Structural Genomics.
&#xa

    CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures

    Get PDF
    We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL builds on the features of a fast secondary-structure–based method (using graph theory) to locate known folds within a multidomain context and a residue-based, double-dynamic programming algorithm, which is used to align members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated in an iterative fashion until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy when compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. For the remaining cases, previously classified domains were very remotely related to the query chain so that embellishments to the core of the fold caused significant differences in domain sizes and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models was only able to detect 65% of domains, with 33% of the subsequent boundaries assigned within ten residues. Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification

    Outlier detection of vital sign trajectories from COVID-19 patients

    Full text link
    There is growing interest in continuous wearable vital sign sensors for monitoring patients remotely at home. These monitors are usually coupled to an alerting system, which is triggered when vital sign measurements fall outside a predefined normal range. Trends in vital signs, such as an increasing heart rate, are often indicative of deteriorating health, but are rarely incorporated into alerting systems. In this work, we present a novel outlier detection algorithm to identify such abnormal vital sign trends. We introduce a distance-based measure to compare vital sign trajectories. For each patient in our dataset, we split vital sign time series into 180 minute, non-overlapping epochs. We then calculated a distance between all pairs of epochs using the dynamic time warp distance. Each epoch was characterized by its mean pairwise distance (average link distance) to all other epochs, with large distances considered as outliers. We applied this method to a pilot dataset collected over 1561 patient-hours from 8 patients who had recently been discharged from hospital after contracting COVID-19. We show that outlier epochs correspond well with patients who were subsequently readmitted to hospital. We also show, descriptively, how epochs transition from normal to abnormal for one such patient.Comment: 4 pages, 4 figures, 1 table. Submitted to IEEE BHI 2022, decision pendin

    Protein function annotation by homology-based inference

    Get PDF
    Where information on homologous proteins is available, progress is being made in automated prediction of protein function from sequence and structure

    The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis

    Get PDF
    The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43 229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616 470 domain sequences classified into 23 876 sequence families. This results in the significant expansion of the CATHHMMmodel library to include models built from the CATH sequence relatives, giving a10%increase in coveragefor detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned

    FLORA: a novel method to predict protein function from structure in diverse superfamilies

    Get PDF
    Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues

    The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution

    Get PDF
    We report the latest release (version 3.0) of the CATH protein domain database (). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps, that have been made easier by the provision of information rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS) which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database which is a projection of CATH structural data onto ∼2 million sequences in completed genomes and UniProt

    Gene3D: merging structure and function for a Thousand genomes

    Get PDF
    Over the last 2 years the Gene3D resource has been significantly improved, and is now more accurate and with a much richer interactive display via the Gene3D website (http://gene3d.biochem.ucl.ac.uk/). Gene3D provides accurate structural domain family assignments for over 1100 genomes and nearly 10 000 000 proteins. A hidden Markov model library, constructed from the manually curated CATH structural domain hierarchy, is used to search UniProt, RefSeq and Ensembl protein sequences. The resulting matches are refined into simple multi-domain architectures using a recently developed in-house algorithm, DomainFinder 3 (available at: ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/DomainFinder3/). The domain assignments are integrated with multiple external protein function descriptions (e.g. Gene Ontology and KEGG), structural annotations (e.g. coiled coils, disordered regions and sequence polymorphisms) and family resources (e.g. Pfam and eggNog) and displayed on the Gene3D website. The website allows users to view descriptions for both single proteins and genes and large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation or associated functions and interactions can be used to expand explorations to new proteins. Gene3D also provides a set of services, including an interactive genome coverage graph visualizer, DAS annotation resources, sequence search facilities and SOAP services

    Human-based approaches to pharmacology and cardiology: an interdisciplinary and intersectorial workshop.

    Get PDF
    Both biomedical research and clinical practice rely on complex datasets for the physiological and genetic characterization of human hearts in health and disease. Given the complexity and variety of approaches and recordings, there is now growing recognition of the need to embed computational methods in cardiovascular medicine and science for analysis, integration and prediction. This paper describes a Workshop on Computational Cardiovascular Science that created an international, interdisciplinary and inter-sectorial forum to define the next steps for a human-based approach to disease supported by computational methodologies. The main ideas highlighted were (i) a shift towards human-based methodologies, spurred by advances in new in silico, in vivo, in vitro, and ex vivo techniques and the increasing acknowledgement of the limitations of animal models. (ii) Computational approaches complement, expand, bridge, and integrate in vitro, in vivo, and ex vivo experimental and clinical data and methods, and as such they are an integral part of human-based methodologies in pharmacology and medicine. (iii) The effective implementation of multi- and interdisciplinary approaches, teams, and training combining and integrating computational methods with experimental and clinical approaches across academia, industry, and healthcare settings is a priority. (iv) The human-based cross-disciplinary approach requires experts in specific methodologies and domains, who also have the capacity to communicate and collaborate across disciplines and cross-sector environments. (v) This new translational domain for human-based cardiology and pharmacology requires new partnerships supported financially and institutionally across sectors. Institutional, organizational, and social barriers must be identified, understood and overcome in each specific setting

    Calibration of ionic and cellular cardiac electrophysiology models

    Get PDF
    © 2020 The Authors. WIREs Systems Biology and Medicine published by Wiley Periodicals, Inc. Cardiac electrophysiology models are among the most mature and well-studied mathematical models of biological systems. This maturity is bringing new challenges as models are being used increasingly to make quantitative rather than qualitative predictions. As such, calibrating the parameters within ion current and action potential (AP) models to experimental data sets is a crucial step in constructing a predictive model. This review highlights some of the fundamental concepts in cardiac model calibration and is intended to be readily understood by computational and mathematical modelers working in other fields of biology. We discuss the classic and latest approaches to calibration in the electrophysiology field, at both the ion channel and cellular AP scales. We end with a discussion of the many challenges that work to date has raised and the need for reproducible descriptions of the calibration process to enable models to be recalibrated to new data sets and built upon for new studies. This article is categorized under: Analytical and Computational Methods > Computational Methods Physiology > Mammalian Physiology in Health and Disease Models of Systems Properties and Processes > Cellular Models
    corecore