12,229 research outputs found

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data are strongly motivated by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space built from the genomic data of Plasmodium falciparum, together with the millions of genomic records available from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences; 2) the high-throughput reconstruction of molecular phylogenies; 3) the representation of biological processes, particularly metabolic pathways; 4) versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments; and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journa

    Mutation of CD2AP and SH3KBP1 binding motif in alphavirus nsP3 hypervariable domain results in attenuated virus

    Infection of humans by Chikungunya virus (CHIKV), an Old World alphavirus (family Togaviridae), can cause arthritis and arthralgia. The virus encodes four non-structural proteins (nsP1, nsP2, nsP3 and nsP4) that act as subunits of the virus replicase. These proteins also interact with numerous host proteins, and some crucial interactions are mediated by the unstructured C-terminal hypervariable domain (HVD) of nsP3. In this study, a human cell line expressing EGFP tagged with CHIKV nsP3 HVD was established. Using quantitative proteomics, it was found that CHIKV nsP3 HVD can bind cytoskeletal proteins, including CD2AP, SH3KBP1, CAPZA1, CAPZA2 and CAPZB. The interaction with CD2AP was found to be most evident; its binding site was mapped to the second SH3 ligand-like element in nsP3 HVD. Further assessment indicated that CD2AP can bind to nsP3 HVDs of many different New and Old World alphaviruses. Mutation of the short binding element hampered the ability of the virus to establish infection. The mutation also abolished the ability of CD2AP to co-localise with nsP3 and replication complexes of CHIKV; the same was observed for Semliki Forest virus (SFV) harbouring a similar mutation. Similar to CD2AP, its homolog SH3KBP1 also bound the identified motif in CHIKV and SFV nsP3

    Optimized Null Model for Protein Structure Networks

    Much attention has recently been given to the statistical significance of topological features observed in biological networks. Here, we consider residue interaction graphs (RIGs) as network representations of protein structures with residues as nodes and inter-residue interactions as edges. Degree-preserving randomized models have been widely used for this purpose in biomolecular networks. However, such a single summary statistic of a network may not be detailed enough to capture the complex topological characteristics of protein structures and their network counterparts. Here, we investigate a variety of topological properties of RIGs to find a well-fitting network null model for them. The RIGs are derived from a structurally diverse protein data set at various distance cut-offs and for different groups of interacting atoms. We compare the network structure of RIGs to several random graph models. We show that three-dimensional geometric random graphs, which model spatial relationships between objects, provide the best fit to RIGs. We investigate the relationship between the strength of the fit and various protein structural features. We show that the fit depends on protein size, structural class, and thermostability, but not on quaternary structure. We apply our model to the identification of significantly over-represented structural building blocks, i.e., network motifs, in protein structure networks. As expected, choosing geometric graphs as a null model results in the most specific identification of motifs. Our geometric random graph model may facilitate further graph-based studies of protein conformation space and have important implications for protein structure comparison and prediction. The choice of a well-fitting null model is crucial for finding structural motifs that play an important role in protein folding, stability and function.
To our knowledge, this is the first study that addresses the challenge of finding an optimized null model for RIGs, by comparing various RIG definitions against a series of network models
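
As a rough illustration of the two objects this abstract compares, the sketch below builds a graph from 3D coordinates with a distance cut-off and then draws a geometric random graph using the same cut-off. It is not the authors' pipeline: the coordinates, the cut-off of 5.0, the box size and all function names are assumptions chosen for the example.

```python
import itertools
import math
import random


def distance_graph(points, cutoff):
    """Link every pair of 3D points that lie closer than `cutoff`."""
    edges = set()
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        if math.dist(p, q) < cutoff:
            edges.add((i, j))
    return edges


def geometric_random_graph(n, cutoff, box=1.0, seed=0):
    """Null model: n points placed uniformly at random in a cube of side `box`."""
    rng = random.Random(seed)
    points = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(n)]
    return distance_graph(points, cutoff)


def degree_counts(n, edges):
    """Number of neighbours of each node, indexed by node id."""
    degrees = [0] * n
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


# Hypothetical residue coordinates (one representative atom per residue).
residues = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
rig = distance_graph(residues, cutoff=5.0)
null = geometric_random_graph(n=len(residues), cutoff=5.0, box=8.0, seed=1)
print(sorted(rig), degree_counts(len(residues), rig))
print(sorted(null), degree_counts(len(residues), null))
```

Comparing a real structure against many such random draws, on degree counts or richer topological statistics, is one simple way to judge how well a null model fits.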

    Homology modelling of transferrin-binding protein A from Neisseria meningitidis

    Neisseria meningitidis, a causative agent of bacterial meningitis, obtains transferrin-bound iron by expressing two outer membrane located transferrin-binding proteins, TbpA and TbpB. TbpA is thought to be an integral outer membrane pore that facilitates iron uptake. Evidence suggests that TbpA is a useful antigen for inclusion in a vaccine effective against meningococcal disease; hence, identifying the regions involved in ligand binding is of paramount importance for designing strategies to block iron uptake. The protein shares sequence and functional similarities with the Escherichia coli siderophore receptors FepA and FhuA, whose structures have been determined. These receptors are composed of two domains, a 22-stranded β-barrel and an N-terminal plug region that sits within the barrel and occludes the transmembrane pore. A three-dimensional TbpA model was constructed using FepA and FhuA structural templates, hydrophobicity analysis and homology modelling. TbpA was found to possess a similar architecture to the siderophore receptors. In addition to providing insights into the highly immunogenic nature of TbpA and allowing the prediction of potentially important ligand-binding epitopes, the model also reveals a narrow channel through its entire length. The relevance of this channel and of the spatial arrangement of external loops to the mechanism of iron translocation employed by TbpA is discussed

    The EM Algorithm and the Rise of Computational Biology

    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A structural study for the optimisation of functional motifs encoded in protein sequences

    BACKGROUND: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that, at least in some cases, the weak specificity and/or sensitivity of a pattern is due to the fact that one or more functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. RESULTS: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: (1) residues structurally conserved in different proteins that are true positives for a pattern are identified by means of a computational technique and by visual inspection; (2) the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns; (3) the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and, in some cases, both selectivity and sensitivity. In some of the cases considered, the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. CONCLUSION: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available.
The computational technique for the identification of structurally conserved residues is already available on request and will soon be accessible on our web server. The procedure is intended for use by pattern database curators and by scientists interested in a specific protein family for which no specific or selective patterns are yet available
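
To make the notions of pattern sensitivity and specificity in this abstract concrete, here is a minimal sketch that translates a basic PROSITE-style pattern into a regular expression and scores it on labelled sequences. It is not the authors' optimisation procedure. The converter handles only the core syntax (`x`, `[...]`, `{...}`, repeat counts and terminal anchors), the toy sequences are invented, and the metric definitions (TP/(TP+FN) and TN/(TN+FP)) are assumed rather than taken from the paper.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a compiled regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return re.compile("".join(parts))


def sensitivity_specificity(regex, positives, negatives):
    """Score a pattern on sequences known to have (or lack) the function."""
    true_pos = sum(1 for seq in positives if regex.search(seq))
    false_pos = sum(1 for seq in negatives if regex.search(seq))
    sensitivity = true_pos / len(positives)
    specificity = (len(negatives) - false_pos) / len(negatives)
    return sensitivity, specificity


# Invented toy sequences; a real evaluation would use an annotated database.
pattern = prosite_to_regex("N-{P}-[ST]-{P}")
positives = ["ANGSA", "AAAA"]
negatives = ["ANPSA", "NGTA"]
print(pattern.pattern, sensitivity_specificity(pattern, positives, negatives))
```

Extending a pattern with extra conserved positions, as the abstract describes, would amount to lengthening the pattern string and re-scoring it with the same function.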

    A peptidoglycan hydrolase motif within the mycobacteriophage TM4 tape measure protein promotes efficient infection of stationary phase cells

    The predominant morphotype of mycobacteriophage virions has a DNA-containing capsid attached to a long flexible non-contractile tail, features characteristic of the Siphoviridae. Within these phage genomes the tape measure protein (tmp) gene can be readily identified due to the well-established relationship between the length of the gene and the length of the phage tail - because these phages typically have long tails, the tmp gene is usually the largest gene in the genome. Many of these mycobacteriophage Tmp's contain small motifs with sequence similarity to host proteins. One of these motifs (motif 1) corresponds to the Rpf proteins that have lysozyme activity and function to stimulate growth of dormant bacteria, while the others (motifs 2 and 3) are related to proteins of unknown function, although some of the related proteins of the host are predicted to be involved in cell wall catabolism. We show here that motif 3-containing proteins have peptidoglycan-hydrolysing activity and that while this activity is not required for phage viability, it facilitates efficient infection and DNA injection into stationary phase cells. Tmp's of mycobacteriophages may thus have acquired these motifs in order to avoid a selective disadvantage that results from changes in peptidoglycan in non-growing cells. © 2006 The Authors