165 research outputs found

    Protein subfamily assignment using the Conserved Domain Database

    © 2008 Fong et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licens

    Predicting specificity in bZIP coiled-coil protein interactions

    We present a method for predicting protein-protein interactions mediated by the coiled-coil motif. When tested on interactions between nearly all human and yeast bZIP proteins, our method identifies 70% of strong interactions while maintaining that 92% of predictions are correct. Furthermore, cross-validation testing shows that including the bZIP experimental data significantly improves performance. Our method can be used to predict bZIP interactions in other genomes and is a promising approach for predicting coiled-coil interactions more generally

    An Approximate L^p Difference Algorithm for Massive Data Streams

    Several recent papers have shown how to approximate the difference ∑_i |a_i-b_i| or ∑_i |a_i-b_i|^2 between two functions, when the function values a_i and b_i are given in a data stream and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream, and they approximate with small relative error. Using different techniques, we show how to approximate the L^p-difference ∑_i |a_i-b_i|^p for any rational-valued p∈(0,2], with comparable efficiency and error. We also show how to approximate ∑_i |a_i-b_i|^p for larger values of p but with a worse error guarantee. Our results fill in gaps left by recent work, by providing an algorithm that is precisely tunable for the application at hand. These results can be used to assess the difference between two chronologically or physically separated massive data sets, making one quick pass over each data set, without buffering the data or requiring the data source to pause. For example, one can use our techniques to judge whether the traffic on two remote network routers is similar without requiring either router to transmit a copy of its traffic. A web search engine could use such algorithms to construct a library of small "sketches," one for each distinct page on the web; one can approximate the extent to which new web pages duplicate old ones by comparing the sketches of the web pages. Such techniques will become increasingly important as the enormous scale, distributional nature, and one-pass processing requirements of data sets become more commonplace.
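The idea of comparing two streams through small sketches can be illustrated for the p = 2 case with a random-sign ("tug-of-war") sketch. This is a well-known related technique, not the paper's L^p algorithm for general p. The names `sign`, `L2Sketch`, and `estimate_l2_difference` are hypothetical, and the hash-derived signs are assumed to behave like independent random signs (the classical analysis asks for 4-wise independence).

```python
import hashlib
import statistics


def sign(seed: int, index: int) -> int:
    """Pseudo-random +1/-1 for (seed, index); assumption: the hash acts as a random sign."""
    digest = hashlib.blake2b(f"{seed}:{index}".encode(), digest_size=1).digest()
    return 1 if digest[0] & 1 else -1


class L2Sketch:
    """One pass over a stream of (index, value) items, keeping only a few counters."""

    def __init__(self, num_counters: int = 400):
        self.counters = [0] * num_counters

    def update(self, index: int, value: int) -> None:
        # Each counter accumulates a differently signed sum of the stream values.
        for seed in range(len(self.counters)):
            self.counters[seed] += sign(seed, index) * value


def estimate_l2_difference(sketch_a: L2Sketch, sketch_b: L2Sketch, group_size: int = 40) -> float:
    """Estimate sum_i (a_i - b_i)^2 from two sketches built with the same signs.

    The sketch is linear, so the counter difference is a sketch of a - b.
    Squared differences are averaged in groups, then the median of the
    group means is returned to damp outliers.
    """
    squares = [(x - y) ** 2 for x, y in zip(sketch_a.counters, sketch_b.counters)]
    means = [
        sum(squares[i:i + group_size]) / len(squares[i:i + group_size])
        for i in range(0, len(squares), group_size)
    ]
    return statistics.median(means)


if __name__ == "__main__":
    stream_a = [(i, i % 7) for i in range(50)]
    stream_b = [(i, (3 * i) % 5) for i in range(50)]
    sa, sb = L2Sketch(), L2Sketch()
    for index, value in stream_a:
        sa.update(index, value)
    for index, value in stream_b:
        sb.update(index, value)
    exact = sum((a[1] - b[1]) ** 2 for a, b in zip(stream_a, stream_b))
    print("exact:", exact, "estimate:", estimate_l2_difference(sa, sb))
```

Only the counters need to be exchanged between the two sites, which mirrors the router example above: neither side transmits its raw stream. The counter and group sizes are illustrative choices, not tuned values.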

    Better Alternatives to OSPF Routing

    The current standard for intra-domain network routing, Open Shortest Path First (OSPF), suffers from a number of problems: the tunable parameters (the weights) are hard to optimize, the chosen paths are not robust under changes in traffic or network state, and some network links are over-used at the expense of others. We present prototypical scenarios that illustrate these problems. Then we propose several variants of a protocol to eliminate or alleviate them and demonstrate the improvements in performance under those scenarios. We also prove that these protocols never perform significantly worse than OSPF and show that for at least a limited class of network topologies, it is possible to find efficiently the optimal weight settings. Some of the problems with OSPF are well known; indeed, there are several routing protocols that perform better than OSPF in routing quality (i.e., in terms of congestion, delay, etc.). OSPF’s popularity persists in part because of its efficiency with respect to several resource bounds. In contrast, many competing protocols that provide routing superior to OSPF are computationally prohibitive. Motivated by this consideration, we designed our protocols not only to achieve better routing quality than OSPF, but also to use resources in amounts comparable with OSPF with respect to offline broadcast communication, size of and time to compute routing tables, packet delivery latency, and packet header structure and size. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/41349/1/453_2005_Article_1161.pd
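The over-use problem described above can be made concrete with a small sketch of weight-based shortest-path routing. It uses Dijkstra's algorithm on a hypothetical four-node topology with made-up weights and demands; it ignores equal-cost multipath splitting and is not the paper's proposed protocol. The names `shortest_path` and `link_loads` are illustrative.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (cost, path) from source to target; graph[u][v] is the link weight."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


def link_loads(graph, demands):
    """Route every demand on its single shortest path and sum the traffic per link."""
    loads = {}
    for (src, dst), volume in demands.items():
        _, path = shortest_path(graph, src, dst)
        for u, v in zip(path, path[1:]):
            loads[(u, v)] = loads.get((u, v), 0) + volume
    return loads


if __name__ == "__main__":
    # Hypothetical topology: two routes from A to D, via B (cost 2) or via C (cost 3).
    graph = {
        "A": {"B": 1, "C": 1},
        "B": {"D": 1},
        "C": {"D": 2},
        "D": {},
    }
    demands = {("A", "D"): 10, ("B", "D"): 5}
    print(link_loads(graph, demands))
    # prints {('A', 'B'): 10, ('B', 'D'): 15}
```

All traffic lands on the A-B-D route while the A-C-D route stays idle, and changing a single weight would move the whole A-to-D demand at once. That sensitivity is one way to see why weight settings are hard to tune.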

    ComSin: database of protein structures in bound (complex) and unbound (single) states in relation to their intrinsic disorder

    Most of the proteins in a cell assemble into complexes to carry out their function. In this work, we have created a new database (named ComSin) of protein structures in bound (complex) and unbound (single) states to provide a researcher with exhaustive information on structures of the same or homologous proteins in bound and unbound states. From the complete Protein Data Bank (PDB), we selected 24 910 pairs of protein structures in bound and unbound states, and identified regions of intrinsic disorder. For 2448 pairs, the proteins in bound and unbound states are identical, while 7129 pairs have sequence identity 90% or larger. The developed server enables one to search for proteins in bound and unbound states with several options including sequence similarity between the corresponding proteins in bound and unbound states, and validation of interaction interfaces of protein complexes. Besides that, through our web server, one can obtain necessary information for studying disorder-to-order and order-to-disorder transitions upon complex formation, and analyze structural differences between proteins in bound and unbound states. The database is available at http://antares.protres.ru/comsin/

    Inferred Biomolecular Interaction Server—a web server to analyze and predict protein interacting partners and binding sites

    IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html

    Whole-Genome Sequencing of a Single Proband Together with Linkage Analysis Identifies a Mendelian Disease Gene

    Although more than 2,400 genes have been shown to contain variants that cause Mendelian disease, there are still several thousand such diseases yet to be molecularly defined. The ability of new whole-genome sequencing technologies to rapidly identify most of the genetic variants in any given genome opens an exciting opportunity to identify these disease genes. Here we sequenced the whole genome of a single patient with the dominant Mendelian disease, metachondromatosis (OMIM 156250), and used partial linkage data from her small family to focus our search for the responsible variant. In the proband, we identified an 11 bp deletion in exon 4 of PTPN11, which alters frame, results in premature translation termination, and co-segregates with the phenotype. In a second metachondromatosis family, we confirmed our result by identifying a nonsense mutation in exon 4 of PTPN11 that also co-segregates with the phenotype. Sequencing PTPN11 exon 4 in 469 controls showed no such protein truncating variants, supporting the pathogenicity of these two mutations. This combination of a new technology and a classical genetic approach provides a powerful strategy to discover the genes responsible for unexplained Mendelian disorders.
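Why an 11 bp deletion alters the reading frame comes down to arithmetic: codons are 3 bp long, and 11 is not a multiple of 3. The sketch below checks that and shows, on a made-up toy sequence (not the PTPN11 sequence), how a frameshift can expose an earlier stop codon. The function names and the deletion position are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def alters_reading_frame(indel_length_bp: int) -> bool:
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length_bp % 3 != 0


def first_stop_codon(sequence: str):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(sequence) - 2, 3):
        if sequence[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete_bases(sequence: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return sequence[:start] + sequence[start + length:]


if __name__ == "__main__":
    # Toy coding sequence, invented for illustration only.
    codons = ["ATG", "GCT", "GAC", "CTG", "AAG", "GCA", "TTT", "GGA", "CCT", "GAG", "TAA"]
    toy = "".join(codons)
    mutant = delete_bases(toy, start=3, length=11)

    print(alters_reading_frame(11))   # prints True (11 % 3 == 2)
    print(first_stop_codon(toy))      # prints 10
    print(first_stop_codon(mutant))   # prints 5
```

In the toy mutant the shifted frame reads ATG GGC ATT TGG ACC TGA, so translation would terminate well before the original stop codon. Where a premature stop appears in a real gene depends on the actual sequence downstream of the deletion.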

    The Characterization of Twenty Sequenced Human Genomes

    We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways