1,104 research outputs found

    The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications

    Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognition of functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework and describe recent developments in its use. These include automating training-set selection to increase functional coverage; coupling FEATURE to methods that generate structural diversity, such as molecular dynamics simulations and loop modeling, to improve performance; and using FEATURE in large-scale modeling and structure determination efforts.

    Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

    Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies

    Clustering protein environments for function prediction: finding PROSITE motifs in 3D

    Background: Structural genomics initiatives are producing increasing numbers of three-dimensional (3D) structures for which there is little functional information. Structure-based annotation of molecular function is therefore becoming critical. We previously presented FEATURE, a method for describing microenvironments around functional sites in proteins. However, FEATURE uses supervised machine learning and so is limited to building models for sites of known importance and location. We hypothesized that there are a large number of sites in proteins that are associated with function that have not yet been recognized. Toward that end, we have developed a method for clustering protein microenvironments in order to evaluate the potential for discovering novel sites that have not been previously identified. Results: We have prototyped a computational method for rapid clustering of millions of microenvironments in order to discover residues whose surrounding environments are similar and which may therefore share a functional or structural role. We clustered nearly 2,000,000 environments from 9,600 protein chains and defined 4,550 clusters. As a preliminary validation, we asked whether known 3D environments associated with PROSITE motifs were "rediscovered". We found examples of clusters highly enriched for residues that share PROSITE sequence motifs. Conclusion: Our results demonstrate that we can cluster protein environments successfully using a simplified representation and K-means clustering algorithm. The rediscovery of known 3D motifs allows us to calibrate the size and intercluster distances that characterize useful clusters. This information will then allow us to find new clusters with similar characteristics that represent novel structural or functional sites
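    The clustering step described in this abstract can be illustrated with a minimal K-means sketch (Lloyd's algorithm). This is not the authors' implementation: the function name `kmeans`, its parameters, and the two-dimensional toy vectors are hypothetical stand-ins, and the abstract does not specify the "simplified representation" used for real microenvironments.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's K-means on equal-length numeric vectors.

    Hypothetical helper for illustration; real microenvironment
    vectors would have many more dimensions than this toy input.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two well-separated groups of "environments".
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

    On the toy data the first three points end up in one cluster and the last three in the other. Which cluster receives label 0 depends on the random initialisation.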

    Network deconvolution as a general method to distinguish direct dependencies in networks

    Recognizing direct relationships between variables connected in a network is a pervasive problem in biological, social and information sciences as correlation-based networks contain numerous indirect relationships. Here we present a general method for inferring direct effects from an observed correlation matrix containing both direct and indirect effects. We formulate the problem as the inverse of network convolution, and introduce an algorithm that removes the combined effect of all indirect paths of arbitrary length in a closed-form solution by exploiting eigen-decomposition and infinite-series sums. We demonstrate the effectiveness of our approach in several network applications: distinguishing direct targets in gene expression regulatory networks; recognizing directly interacting amino-acid residues for protein structure prediction from sequence alignments; and distinguishing strong collaborations in co-authorship social networks using connectivity information alone. In addition to its theoretical impact as a foundational graph theoretic tool, our results suggest network deconvolution is widely applicable for computing direct dependencies in network science across diverse disciplines. Funding: National Institutes of Health (U.S.) (grant R01 HG004037); National Institutes of Health (U.S.) (grant HG005639); Swiss National Science Foundation (Fellowship); National Science Foundation (U.S.) (NSF CAREER Award 0644282).
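    The closed-form solution mentioned in this abstract can be sketched as a short derivation. The symbols below are assumptions, since the abstract gives no notation: $G_{\mathrm{obs}}$ is the observed matrix, $G_{\mathrm{dir}}$ the matrix of direct effects, and $\lambda_i$ their eigenvalues. The sketch assumes the two matrices share eigenvectors and that the series converges ($|\lambda_i^{\mathrm{dir}}| < 1$). Any preprocessing the full method applies is not covered here.

```latex
\begin{aligned}
G_{\mathrm{obs}} &= G_{\mathrm{dir}} + G_{\mathrm{dir}}^{2} + G_{\mathrm{dir}}^{3} + \cdots
                  = G_{\mathrm{dir}}\,(I - G_{\mathrm{dir}})^{-1} \\
\Rightarrow\quad G_{\mathrm{dir}} &= G_{\mathrm{obs}}\,(I + G_{\mathrm{obs}})^{-1} \\
G_{\mathrm{obs}} = U \Lambda_{\mathrm{obs}} U^{-1}
\quad&\Rightarrow\quad
\lambda_i^{\mathrm{dir}} = \frac{\lambda_i^{\mathrm{obs}}}{1 + \lambda_i^{\mathrm{obs}}},
\qquad
G_{\mathrm{dir}} = U \Lambda_{\mathrm{dir}} U^{-1}
\end{aligned}
```

    The first line sums the contributions of indirect paths of every length as a geometric series. The second line inverts that sum, and the third shows why an eigen-decomposition reduces the inversion to a scalar operation on each eigenvalue.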

    On the Specificity of Heparin/Heparan Sulfate Binding to Proteins. Anion-Binding Sites on Antithrombin and Thrombin Are Fundamentally Different

    Background: The antithrombin–heparin/heparan sulfate (H/HS) and thrombin–H/HS interactions are recognized as prototypic specific and non-specific glycosaminoglycan (GAG)–protein interactions, respectively. The fundamental structural basis for the origin of specificity, or lack thereof, in these interactions remains unclear. The availability of multiple co-crystal structures facilitates a structural analysis that challenges the long-held belief that the GAG binding sites in antithrombin and thrombin are essentially similar with high solvent exposure and shallow surface characteristics. Methodology: Analyses of solvent accessibility and exposed surface areas, gyrational mobility, symmetry, cavity shape/size, conserved water molecules and crystallographic parameters were performed for 12 X-ray structures, which include 12 thrombin and 16 antithrombin chains. Novel calculations are described for gyrational mobility and prediction of water loci and conservation. Results: The solvent accessibilities and gyrational mobilities of arginines and lysines in the binding sites of the two proteins reveal sharp contrasts. The distribution of positive charges shows considerable asymmetry in antithrombin, but substantial symmetry for thrombin. Cavity analyses suggest the presence of a reasonably sized bifurcated cavity in antithrombin that facilitates a firm ‘hand-shake’ with H/HS, but with thrombin, a weaker ‘high-five’. Tightly bound water molecules were predicted to be localized in the pentasaccharide binding pocket of antithrombin, but absent in thrombin. Together, these differences in the binding sites explain the major H/HS recognition characteristics of the two prototypic proteins, thus affording an explanation of the specificity of binding. This provides a foundation for understanding specificity of interaction at an atomic level, which will greatly aid the design of natural or synthetic H/HS sequences that target proteins in a specific manner.

    Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions

    Protein interaction networks are an important part of the post-genomic effort to integrate a part-list view of the cell into system-level understanding. Using a set of 11 yeast genomes we show that combining comparative genomics and secondary structure information greatly increases consensus-based prediction of SH3 targets. Benchmarking of our method against positive and negative standards gave 83% accuracy with 26% coverage. The concept of an optimal divergence time for effective comparative genomics studies was analyzed, demonstrating that genomes of species that diverged very recently from Saccharomyces cerevisiae (S. mikatae, S. bayanus, and S. paradoxus), or a long time ago (Neurospora crassa and Schizosaccharomyces pombe), contain less information for accurate prediction of SH3 targets than species within the optimal divergence time proposed. We also show here that intrinsically disordered SH3 domain targets are more probable sites of interaction than equivalent sites within ordered regions. Our findings highlight several novel S. cerevisiae SH3 protein interactions, the value of selection of optimal divergence times in comparative genomics studies, and the importance of intrinsic disorder for protein interactions. Based on our results we propose novel roles for the S. cerevisiae proteins Abp1p in endocytosis and Hse1p in endosome protein sorting

    The SeqFEATURE library of 3D functional site models: comparison to existing methods and applications to protein function annotation

    SeqFEATURE, a tool for protein function annotation, models protein functions described by sequence motifs using a structural representation. The tool shows significantly improved performance over other methods when sequence and structural similarity are low.

    Biomedical Discovery Acceleration, with Applications to Craniofacial Development

    The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work

    Current tools for the identification of miRNA genes and their targets

    The discovery of microRNAs (miRNAs) almost 10 years ago dramatically changed our perspective on eukaryotic gene expression regulation. However, the broad and important functions of these regulators are only now becoming apparent. The expansion of our catalogue of miRNA genes and the identification of the genes they regulate owe much to the development of sophisticated computational tools that have helped either to focus or to interpret experimental assays. In this article, we review the methods for miRNA gene finding and target identification that have been proposed in the last few years. We identify some problems that current approaches have not yet been able to overcome, and we offer some perspectives on the next generation of computational methods.