Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data from Plasmodium falciparum, but using also the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journa
Mutation of CD2AP and SH3KBP1 binding motif in alphavirus nsP3 hypervariable domain results in attenuated virus
Infection by Chikungunya virus (CHIKV) of the Old World alphaviruses (family Togaviridae) in humans can cause arthritis and arthralgia. The virus encodes four non-structural proteins (nsP) (nsP1, nsP2, nsP3 and nsP4) that act as subunits of the virus replicase. These proteins also interact with numerous host proteins and some crucial interactions are mediated by the unstructured C-terminal hypervariable domain (HVD) of nsP3. In this study, a human cell line expressing EGFP tagged with CHIKV nsP3 HVD was established. Using quantitative proteomics, it was found that CHIKV nsP3 HVD can bind cytoskeletal proteins, including CD2AP, SH3KBP1, CAPZA1, CAPZA2 and CAPZB. The interaction with CD2AP was found to be most evident; its binding site was mapped to the second SH3 ligand-like element in nsP3 HVD. Further assessment indicated that CD2AP can bind to nsP3 HVDs of many different New and Old World alphaviruses. Mutation of the short binding element hampered the ability of the virus to establish infection. The mutation also abolished the ability of CD2AP to co-localise with nsP3 and replication complexes of CHIKV; the same was observed for Semliki Forest virus (SFV) harbouring a similar mutation. Similar to CD2AP, its homolog SH3KBP1 also bound the identified motif in CHIKV and SFV nsP3
Optimized Null Model for Protein Structure Networks
Much attention has recently been given to the statistical significance of topological features observed in biological networks. Here, we consider residue interaction graphs (RIGs) as network representations of protein structures with residues as nodes and inter-residue interactions as edges. Degree-preserving randomized models have been widely used for this purpose in biomolecular networks. However, such a single summary statistic of a network may not be detailed enough to capture the complex topological characteristics of protein structures and their network counterparts. Here, we investigate a variety of topological properties of RIGs to find a well fitting network null model for them. The RIGs are derived from a structurally diverse protein data set at various distance cut-offs and for different groups of interacting atoms. We compare the network structure of RIGs to several random graph models. We show that 3-dimensional geometric random graphs, that model spatial relationships between objects, provide the best fit to RIGs. We investigate the relationship between the strength of the fit and various protein structural features. We show that the fit depends on protein size, structural class, and thermostability, but not on quaternary structure. We apply our model to the identification of significantly over-represented structural building blocks, i.e., network motifs, in protein structure networks. As expected, choosing geometric graphs as a null model results in the most specific identification of motifs. Our geometric random graph model may facilitate further graph-based studies of protein conformation space and have important implications for protein structure comparison and prediction. The choice of a well-fitting null model is crucial for finding structural motifs that play an important role in protein folding, stability and function. 
To our knowledge, this is the first study that addresses the challenge of finding an optimized null model for RIGs, by comparing various RIG definitions against a series of network models
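The idea of comparing a residue interaction graph with a 3-dimensional geometric random graph can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the toy coordinates, the 4.0 cut-off and the cubic sampling box are all hypothetical, and a real analysis would read atom coordinates from a structure file and compare many topological properties rather than degrees alone.

```python
import math
import random
from itertools import combinations


def contact_graph(coords, cutoff):
    """Edges (i, j), i < j, between points at most `cutoff` apart."""
    edges = set()
    for (i, p), (j, q) in combinations(enumerate(coords), 2):
        if math.dist(p, q) <= cutoff:
            edges.add((i, j))
    return edges


def geometric_random_graph(n, cutoff, box, seed=0):
    """Null model: n points placed uniformly in a cube of side `box`
    (an assumed sampling region), connected by the same distance rule."""
    rng = random.Random(seed)
    points = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(n)]
    return contact_graph(points, cutoff)


def degrees(n, edges):
    """Degree of every node, one simple property to compare between graphs."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


# Hypothetical toy "protein": six residues on a straight line, 3.8 apart.
toy = [(3.8 * k, 0.0, 0.0) for k in range(6)]
rig = contact_graph(toy, cutoff=4.0)
null = geometric_random_graph(len(toy), cutoff=4.0, box=8.0, seed=1)

print("RIG degrees: ", degrees(len(toy), rig))
print("null degrees:", degrees(len(toy), null))
```

Because the null model uses the same distance rule as the RIG, it preserves the spatial constraint that a degree-preserving randomization discards.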
Homology modelling of transferrin-binding protein A from Neisseria meningitidis
Neisseria meningitidis, a causative agent of bacterial
meningitis, obtains transferrin-bound iron by expressing
two outer membrane located transferrin-binding proteins,
TbpA and TbpB. TbpA is thought to be an integral outer
membrane pore that facilitates iron uptake. Evidence suggests
that TbpA is a useful antigen for inclusion in a vaccine
effective against meningococcal disease, hence the identification
of regions involved in ligand binding is of paramount
importance to design strategies to block uptake of iron. The
protein shares sequence and functional similarities to the
Escherichia coli siderophore receptors FepA and FhuA,
whose structures have been determined. These receptors
are composed of two domains, a 22-stranded β-barrel and
an N-terminal plug region that sits within the barrel and
occludes the transmembrane pore. A three-dimensional
TbpA model was constructed using FepA and FhuA structural
templates, hydrophobicity analysis and homology
modelling. TbpA was found to possess a similar architecture
to the siderophore receptors. In addition to providing
insights into the highly immunogenic nature of TbpA and
allowing the prediction of potentially important ligand-binding
epitopes, the model also reveals a narrow channel
through its entire length. The relevance of this channel and
the spatial arrangement of external loops, to the mechanism
of iron translocation employed by TbpA is discussed
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis. Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
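For readers unfamiliar with the EM algorithm, the sketch below fits a two-component one-dimensional Gaussian mixture. It is a generic textbook illustration, not one of the applications surveyed in the article; the function names, the initialization at the data extremes, the variance floor and the synthetic data are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances, log_likelihoods), where the last
    item records the log-likelihood at the start of each iteration.
    """
    means = [min(data), max(data)]  # assumed initialization
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    log_likelihoods = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each point.
        resp = []
        ll = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([j / total for j in joint])
        log_likelihoods.append(ll)
        # M-step: re-estimate parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsing component
    return weights, means, variances, log_likelihoods


# Synthetic data: two well-separated clusters (assumed for the demo).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
weights, means, variances, lls = em_two_gaussians(data)
print("weights:", weights)
print("means:  ", means)
```

The recorded log-likelihoods should never decrease from one iteration to the next, which is a convenient sanity check on any EM implementation.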
A structural study for the optimisation of functional motifs encoded in protein sequences
BACKGROUND: A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure. RESULTS: Here we present a new procedure aimed at improving the sensitivity and/or specificity of poorly-performing patterns. The procedure can be summarised as follows: 1. residues structurally conserved in different proteins, that are true positives for a pattern, are identified by means of a computational technique and by visual inspection. 2. the sequence positions of the structurally conserved residues falling outside the pattern are used to build extended sequence patterns. 3. the extended patterns are optimised on the SWISS-PROT database for their sensitivity and specificity. The method was applied to eight PROSITE patterns. Whenever structurally conserved residues are found in the surface region close to the pattern (seven out of eight cases), the addition of information inferred from structural analysis is shown to improve pattern selectivity and in some cases selectivity and sensitivity as well. In some of the cases considered the procedure allowed the identification of functionally interesting residues, whose biological role is also discussed. CONCLUSION: Our method can be applied to any type of functional motif or pattern (not only PROSITE ones) which is not able to select all and only the true positive hits and for which at least two true positive structures are available. 
The computational technique for the identification of structurally conserved residues is already available on request and will be soon accessible on our web server. The procedure is intended for the use of pattern database curators and of scientists interested in a specific protein family for which no specific or selective patterns are yet available
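How extending a pattern can change its performance may be easier to see with a small sketch. The code below translates a simplified subset of PROSITE pattern syntax into a regular expression and scores a pattern against labelled sequences. It is an assumption-laden illustration, not the authors' procedure: the helper names, the toy sequences, the extended pattern and the scoring definitions (sensitivity as TP/(TP+FN), selectivity taken here as TP/(TP+FP)) are all hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE pattern into a Python regex.

    Handles x, [..], {..}, (n), (n,m), < and >; anchors inside
    brackets are not supported in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repetition counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return "".join(parts)


def evaluate(pattern, labelled):
    """Return (sensitivity, selectivity) of a pattern on (sequence, is_member) pairs."""
    rx = re.compile(prosite_to_regex(pattern))
    tp = fp = fn = 0
    for seq, is_member in labelled:
        hit = rx.search(seq) is not None
        if hit and is_member:
            tp += 1
        elif hit:
            fp += 1
        elif is_member:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    selectivity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, selectivity


# Hypothetical toy data: (sequence, belongs to the functional family?)
labelled = [
    ("MKAGSTLSGKSAAD", True),
    ("MTTAPLLVGKTQQE", True),
    ("MSSLLPEWWRNDD", True),
    ("MQAVVVVGKSPPL", False),
    ("MLLLPPPDDEEQQ", False),
]

base = "[AG]-x(4)-G-K-[ST]."
extended = "[AG]-x(4)-G-K-[ST]-x(2)-[DE]."  # assumed extra conserved position

print("base:    ", evaluate(base, labelled))
print("extended:", evaluate(extended, labelled))
```

In this toy set the extra position removes the single false positive without losing any true positive, which mirrors the kind of selectivity gain the abstract describes.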
A peptidoglycan hydrolase motif within the mycobacteriophage TM4 tape measure protein promotes efficient infection of stationary phase cells
The predominant morphotype of mycobacteriophage virions has a DNA-containing capsid attached to a long flexible non-contractile tail, features characteristic of the Siphoviridae. Within these phage genomes the tape measure protein (tmp) gene can be readily identified due to the well-established relationship between the length of the gene and the length of the phage tail - because these phages typically have long tails, the tmp gene is usually the largest gene in the genome. Many of these mycobacteriophage Tmp's contain small motifs with sequence similarity to host proteins. One of these motifs (motif 1) corresponds to the Rpf proteins that have lysozyme activity and function to stimulate growth of dormant bacteria, while the others (motifs 2 and 3) are related to proteins of unknown function, although some of the related proteins of the host are predicted to be involved in cell wall catabolism. We show here that motif 3-containing proteins have peptidoglycan-hydrolysing activity and that while this activity is not required for phage viability, it facilitates efficient infection and DNA injection into stationary phase cells. Tmp's of mycobacteriophages may thus have acquired these motifs in order to avoid a selective disadvantage that results from changes in peptidoglycan in non-growing cells. © 2006 The Authors