
    Estimating selection pressures on HIV-1 using phylogenetic likelihood models

    Human immunodeficiency virus type 1 (HIV-1) can evolve rapidly in response to selection pressures exerted by HIV-specific immune responses and antiviral agents, and in adapting to establish infection in different compartments of the body. Statistical models applied to HIV-1 sequence data can help elucidate the nature of these selection pressures by comparing non-synonymous (amino acid changing) and synonymous (amino acid preserving) substitution rates. These models must also account for the non-independence of sequences that results from their shared evolutionary history. We review how we have developed these methods and applied them to characterize the evolution of HIV-1 in vivo. To illustrate our methods, we present an analysis of compartment-specific evolution of HIV-1 env in blood and cerebrospinal fluid, and of site-to-site variation in the gag gene of subtype C HIV-1.
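    The quantity these comparisons rest on is the ratio omega = dN/dS. The sketch below shows only that final step for one pair of sequences: it takes counts of differences and potential sites as given (for example from a Nei-Gojobori-style site count, not shown) and applies a Jukes-Cantor correction. It is a hedged illustration, not the authors' method. It ignores the shared phylogeny that their likelihood models account for. The function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return omega = dN/dS for one pair of aligned coding sequences.

    Inputs are pre-computed counts of differences and of potential sites
    (hypothetical here; a site-counting step would produce them).
    """
    p_n = nonsyn_diffs / nonsyn_sites  # proportion of nonsynonymous differences
    p_s = syn_diffs / syn_sites        # proportion of synonymous differences
    d_n = jukes_cantor(p_n)
    d_s = jukes_cantor(p_s)
    if d_s == 0.0:
        raise ZeroDivisionError("no synonymous divergence: omega is undefined")
    return d_n / d_s


# Hypothetical counts: 20 nonsynonymous differences over 200 nonsynonymous
# sites, 5 synonymous differences over 100 synonymous sites.
omega = dn_ds(20, 200, 5, 100)
print(f"omega = {omega:.2f}")  # omega > 1 is read as a sign of diversifying selection
```

    An omega near 1 is consistent with neutral evolution, and values below 1 indicate purifying selection. The phylogenetic models described above estimate these rates jointly across a tree instead of pair by pair.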

    Modular Algorithms for Biomolecular Network Alignment

    Comparative analysis of biomolecular networks constructed from measurements under different conditions, in different tissues, and in different organisms offers a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. The rapidly advancing field of systems biology aims to explain these systems in terms of the underlying networks of interactions among their many molecular participants, including genes, proteins, and metabolites. In particular, comparing network models of biomolecular interactions across species or tissues is an important tool for identifying conserved modules, predicting the functions of specific genes or proteins, and studying the evolution of biological processes, among other applications. The primary focus of this dissertation is the biomolecular network alignment problem: given two or more network models, optimally match the nodes and links of one network with the nodes and links of the others. The Biomolecular Network Alignment (BiNA) Toolkit developed as part of this dissertation provides a set of network alignment algorithms for biomolecular networks that are efficient (in terms of running-time complexity) and accurate (in terms of various evaluation criteria discussed in the literature). BiNA is scalable, user-friendly, modular, and extensible for performing alignments on diverse types of biomolecular networks. Its algorithms apply to undirected graphs, both weighted and unweighted and both labeled and unlabeled, and have been used to align multiple networks ranging from hundreds of nodes with a few thousand edges to tens of thousands of nodes with millions of edges.
    The dissertation also presents applications of network comparison tools, including how the results of such alignments have been used to (1) construct phylogenetic trees based on protein-protein interaction networks and (2) find biochemical pathways involved in ligand recognition in B cells.
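    The abstract does not describe BiNA's algorithms, so the sketch below covers a narrower, hedged point. It shows how an alignment, once produced as a node mapping from one network to another, can be scored. It computes edge correctness (EC), one commonly used criterion: the fraction of edges in the first network whose mapped endpoints are also adjacent in the second. The function names, toy networks, and mapping are hypothetical, and the sketch assumes simple undirected graphs.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple


def edge_set(edges: Iterable[Tuple[Hashable, Hashable]]) -> Set[FrozenSet[Hashable]]:
    """Undirected edge set; self-loops are dropped for simplicity."""
    return {frozenset((u, v)) for u, v in edges if u != v}


def edge_correctness(g1_edges, g2_edges, mapping: Dict[Hashable, Hashable]) -> float:
    """Fraction of G1 edges whose mapped endpoints are also adjacent in G2."""
    e1 = edge_set(g1_edges)
    e2 = edge_set(g2_edges)
    if not e1:
        return 0.0
    conserved = 0
    for edge in e1:
        u, v = tuple(edge)
        # An edge is conserved only if both endpoints are mapped and their
        # images form an edge in G2.
        if u in mapping and v in mapping and frozenset((mapping[u], mapping[v])) in e2:
            conserved += 1
    return conserved / len(e1)


# Hypothetical toy networks and a hypothetical node mapping G1 -> G2.
g1 = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
g2 = [(1, 2), (2, 3), (3, 1)]
mapping = {"a": 1, "b": 2, "c": 3, "d": 4}
print(edge_correctness(g1, g2, mapping))  # 3 of 4 G1 edges conserved -> 0.75
```

    EC rewards topological agreement only. Weighted or labeled variants, which the abstract says BiNA supports, would also need node or edge similarity terms.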

    Computational Biology and Chemistry

    The use of computers and software tools in biochemistry and biology has brought about a profound revolution in the basic sciences and medicine. Bioinformatics and systems biology are direct results of this revolution. As computers, software tools, and internet services became part of disciplines such as biology and chemistry, new terms, technologies, and methodologies emerged and became established. Bioinformatics software tools, versatile databases, and easy internet access gave rise to computational biology and chemistry. Today there are new kinds of studies and laboratories, including “in silico studies” and “dry labs”, in which bioinformaticians carry out their investigations and obtain valuable results. These developments have also enabled three-dimensional visualizations of molecules and molecular complexes, giving a better understanding of nature.

    Chemistry-informed Macromolecule Graph Representation for Similarity Computation and Supervised Learning

    Macromolecules are large, complex molecules composed of covalently bonded monomer units, existing in different stereochemical configurations and topologies. Because of this chemical diversity, representing, comparing, and learning over macromolecules are critical challenges. To address them, we developed a macromolecule graph representation with monomers as nodes and bonds as edges. We captured the inherent chemistry of the macromolecule by using molecular fingerprints as node and edge attributes. For the first time, we demonstrated computation of the chemical similarity between two macromolecules of varying chemistry and topology, using exact graph edit distances and graph kernels. We also trained graph neural networks for a variety of glycan classification tasks, achieving state-of-the-art results. Our work has twofold implications: it provides a general framework for the representation, comparison, and learning of macromolecules, and it enables quantitative, chemistry-informed decision-making and iterative design in the macromolecular chemical space.
    Comment: Main text: 4 pages, 2 figures, 1 table; Appendix: 18 pages, 25 figures, 3 tables
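    To make "graph kernel similarity between macromolecule graphs" concrete, here is a minimal sketch of one standard kernel, the Weisfeiler-Lehman (WL) subtree kernel. It is a swapped-in, simplified stand-in, not the authors' implementation. Nodes carry plain monomer-name strings instead of molecular fingerprints, edge attributes are ignored, and relabeled nodes keep nested tuples instead of being compressed to integers. The function names and the toy disaccharides are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# A graph is (node -> monomer label, list of undirected bonds).
Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]


def wl_features(graph: Graph, iterations: int) -> Counter:
    """Count Weisfeiler-Lehman subtree labels up to the given depth."""
    labels, edges = graph
    adj: Dict[int, List[int]] = {node: [] for node in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    current = dict(labels)
    counts = Counter((0, lab) for lab in current.values())
    for depth in range(1, iterations + 1):
        # New label = (own label, sorted multiset of neighbour labels).
        # Nested tuples stand in for the usual integer label compression.
        current = {
            node: (current[node], tuple(sorted(current[nb] for nb in adj[node])))
            for node in current
        }
        counts.update((depth, lab) for lab in current.values())
    return counts


def wl_kernel(g1: Graph, g2: Graph, iterations: int = 2) -> int:
    """Unnormalised WL subtree kernel: dot product of label-count vectors."""
    c1, c2 = wl_features(g1, iterations), wl_features(g2, iterations)
    return sum(c1[key] * c2[key] for key in c1.keys() & c2.keys())


def wl_similarity(g1: Graph, g2: Graph, iterations: int = 2) -> float:
    """Cosine-normalised kernel in [0, 1]."""
    k12 = wl_kernel(g1, g2, iterations)
    return k12 / math.sqrt(wl_kernel(g1, g1, iterations) * wl_kernel(g2, g2, iterations))


# Hypothetical toy glycans: monomers as nodes, glycosidic bonds as edges
# (linkage positions and anomeric configuration are ignored).
lactose = ({0: "Glc", 1: "Gal"}, [(0, 1)])
maltose = ({0: "Glc", 1: "Glc"}, [(0, 1)])
print(wl_kernel(lactose, maltose, iterations=1))  # 2
print(wl_similarity(lactose, maltose, iterations=1))
```

    Richer chemistry, such as the fingerprint attributes described above, would call for a kernel that compares continuous node and edge features instead of exact label matches.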

    Epitope Mapping and Topographic Analysis of VAR2CSA DBL3X Involved in P. falciparum Placental Sequestration

    Pregnancy-associated malaria is a major health problem that mainly affects primigravidae living in malaria-endemic areas. The syndrome is precipitated by the accumulation of infected erythrocytes in placental tissue through an interaction between chondroitin sulphate A on syncytiotrophoblasts and a parasite-encoded protein on the surface of infected erythrocytes, believed to be VAR2CSA. VAR2CSA is a polymorphic protein of approximately 3,000 amino acids forming six Duffy-binding-like (DBL) domains. For vaccine development, it is important to define the antigenic targets of protective antibodies and to characterize the consequences of sequence variation. In this study, we used a combination of in silico tools, peptide arrays, and structural modeling to show that sequence variation occurs mainly in regions under strong diversifying selection that are predicted to form flexible loops. These regions are the main targets of naturally acquired immunoglobulin gamma and are accessible to antibodies reacting with native VAR2CSA on infected erythrocytes. Interestingly, surface-reactive anti-VAR2CSA antibodies also target a conserved DBL3X region predicted to form an α-helix. Finally, we identified DBL3X sequence motifs that were more likely to occur in parasites isolated from primigravidae and multigravidae, respectively. These findings strengthen the vaccine candidacy of VAR2CSA and will be important for choosing the epitopes and variants of DBL3X to include in a vaccine protecting women against pregnancy-associated malaria.
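    The abstract does not name the in silico tools used. One simple, widely used way to see where sequence variation concentrates along an alignment is per-position Shannon entropy, sketched below as a hedged illustration. Entropy measures variability only. It is not a test for diversifying selection, which calls for dN/dS-type methods. The function names and the aligned fragments are hypothetical and are not VAR2CSA data.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(residues).values())


def site_variability(alignment: List[str]) -> List[float]:
    """Per-position entropy for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Hypothetical aligned peptide fragments.
aln = ["ACDE", "ACKE", "ACRE", "ACDE"]
print(site_variability(aln))  # [0.0, 0.0, 1.5, 0.0]
```

    High-entropy stretches mark candidate variable regions, which could then be compared with predicted loop structure or antibody reactivity.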

    CLUSS: Clustering of protein sequences based on a new similarity measure

    Background: The rapid growth of available protein data makes clustering within protein families increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge that helps researchers understand biological phenomena. A good evolutionary model is essential for a clustering that reflects biological reality, and an accurate estimate of protein sequence similarity is crucial for building such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is specifically designed for non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions".
    Results: To show the effectiveness of CLUSS, we performed an extensive clustering of the COG database. To demonstrate its ability to handle hard-to-align sequences, we tested it on the GH2 family. In addition, we experimentally compared CLUSS with a variety of mainstream algorithms on both hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins with similar functional activity.
    Conclusion: We have developed an effective method and tool for clustering protein sequences that meets the needs of biologists for phylogenetic analysis and prediction of biological functions. Compared to existing clustering methods, CLUSS more accurately highlights the functional characteristics of the clustered families. It provides biologists with a new and plausible instrument for analyzing protein sequences, especially those that cause problems for alignment-dependent algorithms.
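    The abstract does not define SMS in enough detail to reproduce it, so the sketch below uses a plainly different, simpler alignment-free measure. It compares the sets of shared k-mers (Jaccard similarity) and then groups sequences by single-linkage at a fixed threshold. It only illustrates the general idea of clustering without alignments, not the CLUSS algorithm. The function names, k, threshold, and sequences are all hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers of the two sequences."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def single_linkage_clusters(seqs: Dict[str, str], k: int = 3,
                            threshold: float = 0.3) -> List[Set[str]]:
    """Join two sequences whenever their k-mer Jaccard similarity >= threshold."""
    names = list(seqs)
    profiles = {n: kmers(seqs[n], k) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


# Hypothetical protein fragments.
seqs = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQL", "p3": "GGSWWHHPPE"}
clusters = single_linkage_clusters(seqs)
print(sorted(sorted(c) for c in clusters))  # [['p1', 'p2'], ['p3']]
```

    Exact k-mer matching ignores conservative amino acid substitutions. By its name, SMS matches subsequences allowing for substitutions, which is presumably what makes it more biologically plausible than this stand-in.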

    Development of a multiple glycan alignment method and database to represent the ambiguity of glycan recognition mechanisms

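    Glycans are abundant on cell surfaces and take part in a variety of biological functions through interactions with glycan-binding proteins (GBPs) such as lectins. Glycan recognition and binding by GBPs is also known to be involved in infection by pathogenic microorganisms. To analyze glycan recognition and binding patterns, GBP interaction experiments such as glycan arrays and lectin frontal affinity chromatography have been performed, and the experimental data have been made publicly available on the web. However, the glycan structures recognized by GBPs are diverse even across GBP families. Lectin-glycan interactions in particular involve multivalent binding in which individual interactions tend to be weak, so the glycan recognition patterns of GBPs are ambiguous. Because of this binding ambiguity, it is difficult to identify discrete GBP binding patterns when interpreting the experimental data. In bioinformatics, such experimental data have generally been interpreted using predefined, commonly found glycan motifs. In this study, rather than relying on predefined motifs, we developed the Multiple Carbohydrate Alignment With Weights (MCAW) tool to visualize, from experimental data, the glycan structures recognized by a GBP. Using MCAW, we analyzed publicly available experimental data and obtained profiles of the glycan recognition and binding patterns of GBPs. We also showed that MCAW may make it possible to infer the glycan recognition profiles of GBPs that have not yet been characterized. Because understanding and applying such bioinformatics tools takes considerable effort for biologists, we used MCAW to analyze more than 1,000 glycan array datasets published by the CFG and released the results as MCAW-DB. Soka University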