96 research outputs found
Prediction of Co-Receptor Usage of HIV-1 from Genotype
Human Immunodeficiency Virus 1 (HIV-1) enters host cells using a receptor (CD4) and one of two co-receptors (CCR5 or CXCR4). Recently, a new class of antiretroviral drugs that specifically bind to the co-receptor CCR5, and thus inhibit virus entry, has entered clinical practice. Accurate prediction of the co-receptor used by the virus in the patient is important as it allows for personalized selection of effective drugs and prognosis of disease progression. We have investigated whether it is possible to predict co-receptor usage accurately by analyzing the amino acid sequence of the main determinant of co-receptor usage, i.e., the third variable loop V3 of the gp120 protein. We developed a two-level machine learning approach that in the first level considers two different properties important for protein-protein binding derived from structural models of V3 and V3 sequences. The second level combines the two predictions of the first level. The two-level method predicts usage of the CXCR4 co-receptor for new V3 sequences within seconds, with an area under the ROC curve of 0.937±0.004. Moreover, it is relatively robust against insertions and deletions, which frequently occur in V3. The approach could help clinicians to find optimal personalized treatments, and it offers new insights into the molecular basis of co-receptor usage. For instance, it quantifies the importance for co-receptor usage of a pocket that probably is responsible for binding sulfated tyrosine
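The area under the ROC curve quoted above can be read as the probability that a randomly chosen CXCR4-using sequence receives a higher score than a randomly chosen non-user. The sketch below computes that quantity directly from pairwise comparisons. It is only an illustration of the metric: the function name `roc_auc` and the toy scores are assumptions, not part of the published method.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` uses 1 for positives (e.g. CXCR4-using) and 0 for negatives.
    Tied scores count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four sequences (illustrative only).
toy_scores = [0.9, 0.1, 0.8, 0.3]
toy_labels = [1, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

The pairwise form is quadratic in the number of sequences, which is acceptable for a small illustration; a rank-based formulation would be preferable for large evaluation sets.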
Genetic diversity at the Dhn3 locus in Turkish Hordeum spontaneum populations with comparative structural analyses
We analysed Hordeum spontaneum accessions from 21 different locations to understand the genetic diversity of HsDhn3 alleles and the effects of single-base mutations on the intrinsically disordered structure of the resulting polypeptide (HsDHN3). HsDHN3 was found to be YSK2-type, with a low-frequency 6-aa deletion at the beginning of Exon 1. There is relatively high diversity in the intron region of HsDhn3 compared to the two exon regions. We found that subtle differences in K segments led to changes in the chemical properties of the amino acids. Predictions of protein interaction profiles suggest the presence of a protein-binding site in HsDHN3 that coincides with the K1 segment. Comparison of DHN3 across closely related cereals showed that all of them contain a nuclear localization signal sequence flanking the K1 segment and a novel conserved region located between the S and K1 segments [E(D/T)DGMGGR]. We found that H. vulgare, H. spontaneum, and Triticum urartu DHN3s have a greater number of phosphorylation sites for protein kinase C than other cereal species, which may be related to stress adaptation. Our results show that the nature and extent of mutations in the conserved segments of K1 and K2 are likely to be key factors in protection of cells
Selection in Coastal Synechococcus (Cyanobacteria) Populations Evaluated from Environmental Metagenomes
Environmental metagenomics provides snippets of genomic sequences from all organisms in an environmental sample and is an unprecedented resource of information for investigating microbial population genetics. Current analytical methods, however, are poorly equipped to handle metagenomic data, particularly short, unlinked sequences. A custom analytical pipeline was developed to calculate dN/dS ratios, a common metric to evaluate the role of selection in the evolution of a gene, from environmental metagenomes of flow-sorted populations of marine Synechococcus, the dominant cyanobacteria in coastal environments, sequenced using 454 technology. The large majority of genes (98%) have evolved under purifying selection (dN/dS<1). The metagenome sequence coverage of the reference genomes was not uniform, and genes that were highly represented in the environment (i.e. high read coverage) tended to be more evolutionarily conserved. Of the genes that may have evolved under positive selection (dN/dS>1), 77 out of 83 (93%) were hypothetical. Notable among annotated genes, ribosomal protein L35 appears to be under positive selection in one Synechococcus population. Other annotated genes, in particular a possible porin, a large-conductance mechanosensitive channel, an ATP-binding component of an ABC transporter, and a homologue of a pilus retraction protein, had regions of the gene with elevated dN/dS. With the increasing use of next-generation sequencing in metagenomic investigations of microbial diversity and ecology, analytical methods need to accommodate the peculiarities of these data streams. By developing a means to analyze population diversity data from these environmental metagenomes, we have provided the first insight into the role of selection in the evolution of Synechococcus, a globally significant primary producer
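To make the dN/dS metric concrete, the sketch below estimates uncorrected proportions of nonsynonymous and synonymous differences (pN and pS) between two aligned, in-frame coding sequences, in the spirit of the Nei-Gojobori counting method. It is not the custom pipeline described in the abstract. It assumes the standard genetic code, applies no multiple-hit correction, and counts only codon pairs that differ at exactly one position; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, in TCAG order with the third position varying fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1.0 / 3.0
    return total


def pn_ps(seq_a, seq_b):
    """Return (pN, pS, nonsyn_diffs, syn_diffs) for two aligned coding sequences.

    Simplification: codon pairs differing at two or three positions add to
    the site counts but not to the difference counts.
    """
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        if a not in CODON_TABLE or b not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        s = (synonymous_sites(a) + synonymous_sites(b)) / 2.0
        syn_sites += s
        nonsyn_sites += 3.0 - s
        if sum(x != y for x, y in zip(a, b)) == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    p_s = syn_diffs / syn_sites if syn_sites else 0.0
    return p_n, p_s, nonsyn_diffs, syn_diffs


# Toy alignment: one synonymous (GGG->GGA) and one nonsynonymous (AAA->AAC) change.
p_n, p_s, n_diffs, s_diffs = pn_ps("ATGGGGAAA", "ATGGGAAAC")
ratio = p_n / p_s if p_s > 0 else None  # values below 1 suggest purifying selection
```

A ratio is undefined when no synonymous differences are observed, which is common for short reads; a real analysis would need to handle that case and apply a distance correction.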
Comparison of Internal Ribosome Entry Site (IRES) and Furin-2A (F2A) for Monoclonal Antibody Expression Level and Quality in CHO Cells
DOI: 10.1371/journal.pone.0063247. PLoS ONE 8(5)
FungalRV: adhesin prediction and immunoinformatics portal for human fungal pathogens
Background: The availability of sequence data of human pathogenic fungi generates opportunities to develop bioinformatics tools and resources for vaccine development towards benefitting at-risk patients. Description: We have developed a fungal adhesin predictor and an immunoinformatics database with predicted adhesins. Based on literature search and domain analysis, we prepared a positive dataset comprising adhesin protein sequences from human fungal pathogens Candida albicans, Candida glabrata, Aspergillus fumigatus, Coccidioides immitis, Coccidioides posadasii, Histoplasma capsulatum, Blastomyces dermatitidis, Pneumocystis carinii, Pneumocystis jirovecii and Paracoccidioides brasiliensis. The negative dataset consisted of proteins with a high probability of functioning intracellularly. We used 3945 compositional properties, including frequencies of mono, doublet, triplet, and multiplets of amino acids and hydrophobic properties, as input features of protein sequences to a Support Vector Machine. The best classifiers were identified through an exhaustive search of 588 parameters, meeting the criteria of best Matthews Correlation Coefficient (MCC) and lowest coefficient of variation among the 3-fold cross-validation datasets. The "FungalRV adhesin predictor" was built on three models whose average MCC was in the range 0.89-0.90 and whose coefficient of variation across the three-fold cross-validation datasets was in the range 1.2%-2.74% at a threshold score of 0. We obtained an overall MCC value of 0.8702 considering all 8 pathogens, namely C. albicans, C. glabrata, A. fumigatus, B. dermatitidis, C. immitis, C. posadasii, H. capsulatum and P. brasiliensis, thus showing high sensitivity and specificity at a threshold of 0.511. In the case of P. brasiliensis the algorithm achieved a sensitivity of 66.67%.
A total of 307 fungal adhesins and adhesin-like proteins were predicted from the entire proteomes of eight human pathogenic fungal species. The immunoinformatics analysis data on these proteins were organized for easy user interface analysis. A Web interface was developed for analysis by users. The predicted adhesin sequences were processed through 18 immunoinformatics algorithms and these data have been organized into a MySQL backend. A user-friendly interface has been developed for experimental researchers for retrieving information from the database. Conclusion: The FungalRV webserver, which facilitates the discovery process for novel human pathogenic fungal adhesin vaccines, has been developed.
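As an illustration of two ingredients mentioned in this abstract, the sketch below computes amino acid k-mer frequencies (mono and doublet compositions) as feature vectors and evaluates a confusion matrix with the Matthews Correlation Coefficient. It does not reproduce the 3945 features or the SVM models of FungalRV; the function names and the toy sequence are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence, k):
    """Relative frequency of every possible amino acid k-mer, in a fixed order.

    k=1 gives 20 mono-residue frequencies, k=2 gives 400 doublet frequencies.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for short input
    return [counts[kmer] / total for kmer in kmers]


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy protein fragment (hypothetical), turned into a 420-dimensional vector.
toy = "ACAC"
features = kmer_frequencies(toy, 1) + kmer_frequencies(toy, 2)
```

Vectors of this kind could be passed to any standard classifier. MCC is a reasonable selection criterion when positive and negative sets differ in size, because it uses all four cells of the confusion matrix.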
Detecting Clusters of Mutations
Positive selection for protein function can lead to multiple mutations within a small stretch of DNA, i.e., to a cluster of mutations. Recently, Wagner proposed a method to detect such mutation clusters. His method, however, did not take into account that residues with high solvent accessibility are inherently more variable than residues with low solvent accessibility. Here, we propose a new algorithm to detect clustered evolution. Our algorithm controls for different substitution probabilities at buried and exposed sites in the tertiary protein structure, and uses random permutations to calculate accurate P values for inferred clusters. We apply the algorithm to genomes of bacteria, fly, and mammals, and find several clusters of mutations in functionally important regions of proteins. Surprisingly, clustered evolution is a relatively rare phenomenon. Only between 2% and 10% of the genes we analyze contain a statistically significant mutation cluster. We also find that not controlling for solvent accessibility leads to an excess of clusters in terminal and solvent-exposed regions of proteins. Our algorithm provides a novel method to identify functionally relevant divergence between groups of species. Moreover, it could also be useful to detect artifacts in automatically assembled genomes
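The idea of a permutation-based P value that controls for solvent accessibility can be sketched as follows. This is not the authors' algorithm. It assumes a simple cluster statistic (the largest number of substitutions in any window of consecutive sites) and a stratified shuffle that keeps the number of substitutions in each accessibility class fixed. All names, the window size and the number of permutations are illustrative choices.

```python
import random


def max_window_count(positions, window):
    """Largest number of mutated positions within any `window` consecutive sites."""
    pos = sorted(positions)
    best = 0
    left = 0
    for right in range(len(pos)):
        # Shrink from the left until the span fits inside one window.
        while pos[right] - pos[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


def cluster_p_value(mutated, exposure, window=5, n_perm=999, seed=0):
    """Permutation P value for the observed clustering of substitutions.

    `mutated` lists 0-based site indices with a substitution; `exposure` is a
    string with one class label per site (e.g. 'B' buried, 'E' exposed).
    Substitutions are reshuffled only among sites of the same class.
    """
    rng = random.Random(seed)
    observed = max_window_count(mutated, window)
    mutated_set = set(mutated)
    classes = {}
    for i, label in enumerate(exposure):
        classes.setdefault(label, []).append(i)
    counts = {
        label: sum(1 for i in idx if i in mutated_set)
        for label, idx in classes.items()
    }
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for label, idx in classes.items():
            shuffled.extend(rng.sample(idx, counts[label]))
        if max_window_count(shuffled, window) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy protein of 100 exposed sites with five adjacent substitutions.
p_clustered = cluster_p_value([10, 11, 12, 13, 14], "E" * 100)
```

Shuffling within classes means that an excess of substitutions at exposed sites does not by itself produce a significant cluster, which is the bias the abstract attributes to the uncontrolled method.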
Archaic chaos: intrinsically disordered proteins in Archaea
Background: Many proteins or their regions, known as intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs), lack unique 3D structure in their native states under physiological conditions yet fulfill key biological functions. Earlier bioinformatics studies showed that IDPs and IDRs are highly abundant in different proteomes and carry out mostly regulatory functions related to molecular recognition and signal transduction. Archaea belong to an intriguing domain of life whose members, being microbes, are characterized by a unique mosaic-like combination of bacterial and eukaryotic properties and include inhabitants of some of the most extreme environments on the planet. With the expansion of the archaea genome data (more than fifty archaea species from five different phyla are known now), and with recent improvements in the accuracy of intrinsic disorder prediction, it is time to re-examine the abundance of IDPs and IDRs in the archaea domain. Results: The abundance of IDPs and IDRs in 53 archaea species is analyzed. The amino acid composition profiles of these species are generally quite different from each other. The disordered content is highly species-dependent. Thermoproteales proteomes have 14% of disordered residues, while in Halobacteria, this value increases to 34%. In proteomes of these two phyla, proteins containing long disordered regions account for 12% and 46%, whereas 4% and 26% of their proteins are wholly disordered. These three measures of disorder content are linearly correlated with each other at the genome level. There is a weak correlation between the environmental factors (such as salinity, pH and temperature of the habitats) and the abundance of intrinsic disorder in Archaea, with various environmental factors possessing different disorder-promoting strengths. Harsh environmental conditions, especially those combining several hostile factors, clearly favor increased disorder content.
Intrinsic disorder is highly abundant in functional Pfam domains of archaeal origin. The analysis based on the disordered content and phylogenetic tree indicated diverse evolution of intrinsic disorder among various classes and species of Archaea. Conclusions: Archaea proteins are rich in intrinsic disorder. Some of these IDPs and IDRs likely evolved to help archaea adapt to their hostile habitats. Other archaeal IDPs and IDRs possess crucial biological functions similar to those of the bacterial and eukaryotic IDPs/IDRs
Evidence for a Fourteenth mtDNA-Encoded Protein in the Female-Transmitted mtDNA of Marine Mussels (Bivalvia: Mytilidae)
BACKGROUND: A novel feature for animal mitochondrial genomes has been recently established: i.e., the presence of additional, lineage-specific, mtDNA-encoded proteins with functional significance. This feature has been observed in freshwater mussels with doubly uniparental inheritance of mtDNA (DUI). The latter unique system of mtDNA transmission, which also exists in some marine mussels and marine clams, is characterized by one mt genome inherited from the female parent (F mtDNA) and one mt genome inherited from the male parent (M mtDNA). In freshwater mussels, the novel mtDNA-encoded proteins have been shown to be mt genome-specific (i.e., one novel protein for F genomes and one novel protein for M genomes). It has been hypothesized that these novel, F- and M-specific, mtDNA-encoded proteins (and/or other F- and/or M-specific mtDNA sequences) could be responsible for the different modes of mtDNA transmission in bivalves but this remains to be demonstrated. METHODOLOGY/PRINCIPAL FINDINGS: We investigated all complete (or nearly complete) female- and male-transmitted marine mussel mtDNAs previously sequenced for the presence of ORFs that could have functional importance in these bivalves. Our results confirm the presence of a novel F genome-specific mt ORF, of significant length (>100aa) and located in the control region, that most likely has functional significance in marine mussels. The identification of this ORF in five Mytilus species suggests that it has been maintained in the mytilid lineage (subfamily Mytilinae) for ∼13 million years. Furthermore, this ORF likely has a homologue in the F mt genome of Musculista senhousia, a DUI-containing mytilid species in the subfamily Crenellinae. We present evidence supporting the functionality of this F-specific ORF at the transcriptional, amino acid and nucleotide levels. 
CONCLUSIONS/SIGNIFICANCE: Our results offer support for the hypothesis that "novel F genome-specific mitochondrial genes" are involved in key biological functions in bivalve species with DUI
The Effect of Iron Limitation on the Transcriptome and Proteome of Pseudomonas fluorescens Pf-5
One of the most important micronutrients for bacterial growth is iron, whose bioavailability in soil is limited. Consequently, rhizospheric bacteria such as Pseudomonas fluorescens employ a range of mechanisms to acquire or compete for iron. We investigated the transcriptomic and proteomic effects of iron limitation on P. fluorescens Pf-5 by employing microarray and iTRAQ techniques, respectively. Analysis of these data revealed that genes encoding functions related to iron homeostasis, including pyoverdine and enantio-pyochelin biosynthesis, a number of TonB-dependent receptor systems, as well as some inner-membrane transporters, were significantly up-regulated in response to iron limitation. Transcription of a ribosomal protein L36-encoding gene was also highly up-regulated during iron limitation. Certain genes or proteins involved in biosynthesis of secondary metabolites such as 2,4-diacetylphloroglucinol (DAPG), orfamide A and pyrrolnitrin, as well as a chitinase, were over-expressed under iron-limited conditions. In contrast, we observed that expression of genes involved in hydrogen cyanide production and flagellar biosynthesis was down-regulated in an iron-depleted culture medium. Phenotypic tests revealed that Pf-5 had reduced swarming motility on semi-solid agar in response to iron limitation. Comparison of the transcriptomic data with the proteomic data suggested that iron acquisition is regulated at both the transcriptional and post-transcriptional levels
- …