10,899 research outputs found

    A Modified Amino Acid Network Model Contains Similar and Dissimilar Weight

    For a more detailed description of the interaction between residues, this paper proposes an amino acid network model that contains two types of weight: similar weight and dissimilar weight. The weight of a link is based on a self-consistent statistical contact potential between different types of amino acids. With this model, we obtain a more reasonable representation of the distance between residues. Furthermore, with the network parameter average shortest path length, we obtain a more accurate reflection of molecular size. This amino acid network is a “small-world” network, and the network parameter is sensitive to conformational changes of the protein. For some disease-related proteins, the highly central residues of the amino acid network are highly correlated with the hot spots. In the complex with the related drug, these residues interact either directly with the drug or with a residue that is in contact with the drug.
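
The average shortest path length mentioned in the abstract above can be illustrated with a minimal sketch. The four residue labels and the edge weights below are hypothetical toy values, not the paper's contact potentials. Running plain Dijkstra over a connected, undirected weighted graph is an assumption about how such a parameter might be computed, not the authors' implementation.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest weighted distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def average_shortest_path_length(graph):
    """Mean shortest-path distance over all ordered node pairs.

    Assumes the graph is connected; a disconnected graph raises KeyError.
    """
    nodes = list(graph)
    total = 0.0
    pairs = 0
    for u in nodes:
        dist = dijkstra(graph, u)
        for v in nodes:
            if v != u:
                total += dist[v]
                pairs += 1
    return total / pairs


# Hypothetical toy network: nodes stand for residues, weights for link lengths.
toy_network = {
    "A": {"B": 1.0, "D": 5.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"B": 2.0, "D": 1.0},
    "D": {"C": 1.0, "A": 5.0},
}

print(average_shortest_path_length(toy_network))
```

In this toy network the direct A-D link (weight 5.0) is longer than the route through B and C (total 4.0), so the shortest path between A and D bypasses the direct link.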

    ModuLand plug-in for Cytoscape: determination of hierarchical layers of overlapping network modules and community centrality

    Summary: The ModuLand plug-in provides Cytoscape users an algorithm for determining extensively overlapping network modules. Moreover, it identifies several hierarchical layers of modules, where meta-nodes of the higher hierarchical layer represent modules of the lower layer. The tool assigns module cores, which predict the function of the whole module, and determines key nodes bridging two or multiple modules. The plug-in has a detailed JAVA-based graphical interface with various colouring options. The ModuLand tool can run on Windows, Linux, or Mac OS. We demonstrate its use on protein structure and metabolic networks. Availability: The plug-in and its user guide can be downloaded freely from: http://www.linkgroup.hu/modules.php. Contact: [email protected] Supplementary information: Supplementary information is available at Bioinformatics online. Comment: 39 pages, 1 figure and a Supplement with 9 figures and 10 tables.

    Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species

    Antigenic variation enables pathogens to avoid the host immune response by continual switching of surface proteins. The protozoan blood parasite Trypanosoma brucei causes human African trypanosomiasis ("sleeping sickness") across sub-Saharan Africa and is a model system for antigenic variation, surviving by periodically replacing a monolayer of variant surface glycoproteins (VSG) that covers its cell surface. We compared the genome of Trypanosoma brucei with those of two closely related parasites, Trypanosoma congolense and Trypanosoma vivax, to reveal how the variant antigen repertoire has evolved and how it might affect contemporary antigenic diversity. We reconstruct VSG diversification, showing that Trypanosoma congolense uses variant antigens derived from multiple ancestral VSG lineages, whereas in Trypanosoma brucei VSG have recent origins, and ancestral gene lineages have been repeatedly co-opted to novel functions. These historical differences are reflected in fundamental differences between species in the scale and mechanism of recombination. Using phylogenetic incompatibility as a metric for genetic exchange, we show that the frequency of recombination is comparable between Trypanosoma congolense and Trypanosoma brucei but is much lower in Trypanosoma vivax. Furthermore, in showing that the C-terminal domain of Trypanosoma brucei VSG plays a crucial role in facilitating exchange, we reveal substantial species differences in the mechanism of VSG diversification. Our results demonstrate how past VSG evolution indirectly determines the ability of contemporary parasites to generate novel variant antigens through recombination and suggest that the current model for antigenic variation in Trypanosoma brucei is only one means by which these parasites maintain chronic infections.

    High-Performance Meta-Genomic Gene Identification

    Computational Genomics, or Computational Genetics, refers to the use of computational and statistical analysis for understanding the structure and the function of genetic material in organisms. The primary focus of research in computational genomics in the past three decades has been the understanding of genomes and their functional elements by analyzing biological sequence data. The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies, known as next-generation sequencing (NGS), that parallelize the sequencing process, producing thousands or millions of sequences concurrently. Moore’s Law is the observation that the number of transistors on integrated circuits doubles approximately every two years; correspondingly, the cost per transistor halves. The cost of DNA sequencing is declining much faster, which implies that more new DNA data will be obtained. This large-scale sequence data, produced with high-throughput sequencing technologies, needs to be processed in a time-effective and cost-effective manner. In this dissertation, we present a high-performance meta-genome gene identification framework. This framework includes four modules: filter, alignment, error correction, and gene identification. The following chapters describe the proposed design and evaluation of this pipeline. The most computationally expensive kernel in the framework is the alignment procedure. Thus, the filter module is developed to identify unnecessary alignment operations. Without the filter module, the alignment module requires 1.9 hours to complete all-to-all alignment on a test file of 512,000 sequences, with an average sequence length of 750 base pairs, using ten NVIDIA Kepler K20 GPUs. When combined with the filter kernel, the total time is 11.3 minutes. Note that the ideal speedup is nearly 91.4 times when the new alignment kernel is run on ten GPUs (10 × 9.14). We conclude that accuracy can be achieved at the expense of more resources while operating frequency can still be maintained.
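
As a rough check of the timings quoted in the abstract above, the speedup attributable to the filter module can be worked out directly. The symbols $T_{\text{align}}$, $T_{\text{filter+align}}$ and $S_{\text{filter}}$ are labels introduced here for illustration, not the dissertation's notation. Reading 9.14 as a per-GPU factor is an assumption, since the abstract does not define it.

```latex
\begin{align*}
T_{\text{align}} &= 1.9\ \text{h} = 114\ \text{min}, \\
T_{\text{filter+align}} &= 11.3\ \text{min}, \\
S_{\text{filter}} &= \frac{T_{\text{align}}}{T_{\text{filter+align}}}
                   = \frac{114}{11.3} \approx 10.1, \\
S_{\text{ideal}} &= 10 \times 9.14 = 91.4 .
\end{align*}
```

On these figures, the filter alone accounts for roughly a tenfold reduction in wall-clock time on the same ten GPUs, which is a separate quantity from the 91.4 ideal figure quoted for the new alignment kernel.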

    Biosynthesis and expression of zona pellucida glycoproteins in mammals

    The zona pellucida (ZP) is an extracellular matrix surrounding the oocyte and the early embryo that exerts several important functions during fertilization and early embryonic development. The ZP of most mammalian species is composed of three glycoproteins (ZPA, ZPB, ZPC), products of the gene families ZPA, ZPB and ZPC that have been found to be highly homologous within mammalian species. Most data on the structure and function of the ZP are obtained from studies in the mouse. New data from the pig and other domestic animals, however, indicate that the mouse model does not hold for all other species. Whereas in the mouse ZPB is the primary sperm receptor, in the pig ZPA has been shown to possess receptor activity. In contrast to the mouse, where the growing oocyte is the only source of zona glycoproteins, in domestic animals these proteins are expressed in both the oocyte and granulosa cells in a stage-specific pattern and may also play a role in granulosa cell differentiation. In several mammalian species, the epithelial secretory cells of the oviduct synthesize and secrete specific glycoproteins (oviductins) that become closely associated with the ZP of the ovulated oocyte. Once bound to the ZP, oviductin molecules could act as a protective layer around the oocyte and early embryo by virtue of their densely glycosylated mucin-type domains. Copyright (C) 2001 S. Karger AG, Basel

    Topological network alignment uncovers biological function and phylogeny

    Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species, yeast and human, indicates that even distant species share a surprising amount of network topology with each other, suggesting broad similarities in internal cellular wiring across all life on Earth. Comment: Algorithm explained in more detail. Additional analysis added.

    The expression of the gene for Azurin from Alcaligenes denitrificans in E. coli : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Biochemistry at Massey University

    Azurin is a protein which functions in electron transport and has been found to bind copper when it is expressed in its native bacterial host. In this thesis the azurin from Alcaligenes denitrificans was used. This protein is 129 amino acids long with a molecular weight of 14,600 daltons. The azurin coding gene from Alcaligenes denitrificans had previously been cloned into a plasmid which allows an E. coli expression system to be used. Azurin was purified from the E. coli hosts using the same procedures as for purifying copper-azurin from the native hosts but was found to remain apparently impure, according to spectrophotometric data. Efforts to increase the production of the protein by using different expression systems and by refining the existing expression system failed to increase the apparent yield of copper-azurin. Efforts to refine the purification procedure also failed to increase the amount of copper-azurin that was purified. Various experiments were performed to demonstrate that azurin was expressed and processed correctly in the E. coli host. Protein was expressed in a copper-rich and a copper-sparse environment. Copper-azurin was purified from the copper-rich environment, while very little copper-azurin could be extracted from the copper-sparse environment. The results described in this thesis suggest that when azurin from A. denitrificans is expressed in an E. coli host using standard media with no copper added, the predominant form of azurin produced is zinc-azurin. As mutants of this protein are going to be made, conditions where the protein would bind only copper were required. The ideal conditions for this are still to be calculated, but results from this thesis suggest that copper concentrations in the region of 0.25 mM lead to 65% incorporation of copper, compared to 17% when no copper is added to the E. coli growth medium. E. coli cells were shown to grow with no apparent inhibition of growth in 3.0 mM CuSO4. This concentration of copper in the growth medium may allow the production of a much higher ratio of copper-azurin to zinc-azurin than has been achieved so far.

    Machine Learning and Graph Theory Approaches for Classification and Prediction of Protein Structure

    Recently, many methods have been proposed for classification and prediction problems in bioinformatics. One of these problems is protein structure prediction. Machine learning approaches and new algorithms have been proposed to solve this problem. Among the machine learning approaches, Support Vector Machines (SVM) have attracted a lot of attention due to their high prediction accuracy. Since protein data consists of sequence and structural information, another widely used approach for modeling this structured data is to use graphs. In computer science, graph theory has been widely studied; however, it has only recently been applied to bioinformatics. In this work, we introduce new algorithms based on statistical methods, graph theory concepts and machine learning for the protein structure prediction problem. A new statistical method based on z-scores is introduced for seed selection in proteins. A new method for feature selection, based on finding common cliques in protein data, is also introduced, which reduces noise in the data. We also introduce new binary classifiers for the prediction of structural transitions in proteins. These new binary classifiers achieve much higher accuracy than current traditional binary classifiers.

    High-Throughput 3D Homology Detection via NMR Resonance Assignment

    One goal of the structural genomics initiative is the identification of new protein folds. Sequence-based structural homology prediction methods are an important means for prioritizing unknown proteins for structure determination. However, an important challenge remains: two highly dissimilar sequences can have similar folds. How can we detect this rapidly, in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure, called HD, for detecting 3D structural homologies from sparse, unassigned protein NMR data. Our method identifies 3D models in a protein structural database whose geometries best fit the unassigned experimental NMR data. HD does not use, and is thus not limited by, sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or homology modelling. The algorithm runs in $O(pn^{5/2} \log(cn) + p \log p)$ time, where $p$ is the number of proteins in the database, $n$ is the number of residues in the target protein and $c$ is the maximum edge weight in an integer-weighted bipartite graph. Our experiments on real NMR data from 3 different proteins against a database of 4,500 representative folds demonstrate that the method identifies closely related protein folds, including sub-domains of larger proteins, with as little as 10-30% sequence homology between the target protein (or sub-domain) and the computed model. In particular, we report no false-negatives or false-positives despite significant percentages of missing experimental data.