65 research outputs found
An Unsupervised Algorithm for Host Identification in Flaviviruses
Early characterization of emerging viruses is essential to control their spread, such as the Zika Virus outbreak in 2014. Among other non-viral factors, host information is essential for the surveillance and control of virus spread. Flaviviruses (genus Flavivirus), akin to other viruses, are modulated by high mutation rates and selective forces to adapt their codon usage to that of their hosts. However, a major challenge is the identification of potential hosts for novel viruses. Usually, potential hosts of emerging zoonotic viruses are identified after several confirmed cases. This is inefficient for deterring future outbreaks. In this paper, we introduce an algorithm to identify the host range of a virus from its raw genome sequences. The proposed strategy relies on comparing codon usage frequencies across viruses and hosts, by means of a normalized Codon Adaptation Index (CAI). We have tested our algorithm on 94 flaviviruses and 16 potential hosts. This novel method is able to distinguish between arthropod and vertebrate hosts for several flaviviruses with high values of accuracy (virus group 91.9% and host type 86.1%) and specificity (virus group 94.9% and host type 79.6%), in comparison to empirical observations. Overall, this algorithm may be useful as a complementary tool to current phylogenetic methods in monitoring current and future viral outbreaks by understanding hostâvirus relationships
E-CAI: a novel server to estimate an expected value of Codon Adaptation Index (eCAI)
<p>Abstract</p> <p>Background</p> <p>The Codon Adaptation Index (CAI) is a measure of the synonymous codon usage bias for a DNA or RNA sequence. It quantifies the similarity between the synonymous codon usage of a gene and the synonymous codon frequency of a reference set. Extreme values in the nucleotide or in the amino acid composition have a large impact on differential preference for synonymous codons. It is thence essential to define the limits for the expected value of CAI on the basis of sequence composition in order to properly interpret the CAI and provide statistical support to CAI analyses. Though several freely available programs calculate the CAI for a given DNA sequence, none of them corrects for compositional biases or provides confidence intervals for CAI values.</p> <p>Results</p> <p>The E-CAI server, available at <url>http://genomes.urv.es/CAIcal/E-CAI</url>, is a web-application that calculates an expected value of CAI for a set of query sequences by generating random sequences with G+C and amino acid content similar to those of the input. An executable file, a tutorial, a Frequently Asked Questions (FAQ) section and several examples are also available. To exemplify the use of the E-CAI server, we have analysed the codon adaptation of human mitochondrial genes that codify a subunit of the mitochondrial respiratory chain (excluding those genes that lack a prokaryotic orthologue) and are encoded in the nuclear genome. It is assumed that these genes were transferred from the proto-mitochondrial to the nuclear genome and that its codon usage was then ameliorated.</p> <p>Conclusion</p> <p>The E-CAI server provides a direct threshold value for discerning whether the differences in CAI are statistically significant or whether they are merely artifacts that arise from internal biases in the G+C composition and/or amino acid composition of the query sequences.</p
An Unsupervised Algorithm for Host Identification in Flaviviruses
Early characterization of emerging viruses is essential to control their spread, such as the Zika Virus outbreak in 2014. Among other non-viral factors, host information is essential for the surveillance and control of virus spread. Flaviviruses (genus Flavivirus), akin to other viruses, are modulated by high mutation rates and selective forces to adapt their codon usage to that of their hosts. However, a major challenge is the identification of potential hosts for novel viruses. Usually, potential hosts of emerging zoonotic viruses are identified after several confirmed cases. This is inefficient for deterring future outbreaks. In this paper, we introduce an algorithm to identify the host range of a virus from its raw genome sequences. The proposed strategy relies on comparing codon usage frequencies across viruses and hosts, by means of a normalized Codon Adaptation Index (CAI). We have tested our algorithm on 94 flaviviruses and 16 potential hosts. This novel method is able to distinguish between arthropod and vertebrate hosts for several flaviviruses with high values of accuracy (virus group 91.9% and host type 86.1%) and specificity (virus group 94.9% and host type 79.6%), in comparison to empirical observations. Overall, this algorithm may be useful as a complementary tool to current phylogenetic methods in monitoring current and future viral outbreaks by understanding hostâvirus relationships
PairWise Neighbours database: overlaps and spacers among prokaryote genomes
<p>Abstract</p> <p>Background</p> <p>Although prokaryotes live in a variety of habitats and possess different metabolic and genomic complexity, they have several genomic architectural features in common. The overlapping genes are a common feature of the prokaryote genomes. The overlapping lengths tend to be short because as the overlaps become longer they have more risk of deleterious mutations. The spacers between genes tend to be short too because of the tendency to reduce the non coding DNA among prokaryotes. However they must be long enough to maintain essential regulatory signals such as the Shine-Dalgarno (SD) sequence, which is responsible of an efficient translation.</p> <p>Description</p> <p>PairWise Neighbours is an interactive and intuitive database used for retrieving information about the spacers and overlapping genes among bacterial and archaeal genomes. It contains 1,956,294 gene pairs from 678 fully sequenced prokaryote genomes and is freely available at the URL <url>http://genomes.urv.cat/pwneigh</url>. This database provides information about the overlaps and their conservation across species. Furthermore, it allows the wide analysis of the intergenic regions providing useful information such as the location and strength of the SD sequence.</p> <p>Conclusion</p> <p>There are experiments and bioinformatic analysis that rely on correct annotations of the initiation site. Therefore, a database that studies the overlaps and spacers among prokaryotes appears to be desirable. PairWise Neighbours database permits the reliability analysis of the overlapping structures and the study of the SD presence and location among the adjacent genes, which may help to check the annotation of the initiation sites.</p
OPTIMIZER: a web server for optimizing the codon usage of DNA sequences
OPTIMIZER is an on-line application that optimizes the codon usage of a gene to increase its expression level. Three methods of optimization are available: the âone amino acidâone codonâ method, a guided random method based on a Monte Carlo algorithm, and a new method designed to maximize the optimization with the fewest changes in the query sequence. One of the main features of OPTIMIZER is that it makes it possible to optimize a DNA sequence using pre-computed codon usage tables from a predicted group of highly expressed genes from more than 150 prokaryotic species under strong translational selection. These groups of highly expressed genes have been predicted using a new iterative algorithm. In addition, users can use, as a reference set, a pre-computed table containing the mean codon usage of ribosomal protein genes and, as a novelty, the tRNA gene-copy numbers. OPTIMIZER is accessible free of charge at http://genomes.urv.es/OPTIMIZER
An Unsupervised Algorithm for Host Identification in Flaviviruses
Early characterization of emerging viruses is essential to control their spread, such as the Zika Virus outbreak in 2014. Among other non-viral factors, host information is essential for the surveillance and control of virus spread. Flaviviruses (genus Flavivirus), akin to other viruses, are modulated by high mutation rates and selective forces to adapt their codon usage to that of their hosts. However, a major challenge is the identification of potential hosts for novel viruses. Usually, potential hosts of emerging zoonotic viruses are identified after several confirmed cases. This is inefficient for deterring future outbreaks. In this paper, we introduce an algorithm to identify the host range of a virus from its raw genome sequences. The proposed strategy relies on comparing codon usage frequencies across viruses and hosts, by means of a normalized Codon Adaptation Index (CAI). We have tested our algorithm on 94 flaviviruses and 16 potential hosts. This novel method is able to distinguish between arthropod and vertebrate hosts for several flaviviruses with high values of accuracy (virus group 91.9% and host type 86.1%) and specificity (virus group 94.9% and host type 79.6%), in comparison to empirical observations. Overall, this algorithm may be useful as a complementary tool to current phylogenetic methods in monitoring current and future viral outbreaks by understanding host-virus relationships
RCDI/eRCDI: a web-server to estimate codon usage deoptimization
<p>Abstract</p> <p>Background</p> <p>The Relative Codon Deoptimization Index (RCDI) was developed by Mueller et al. (2006) as measure of codon deoptimization by comparing how similar is the codon usage of a gene and the codon usage of a reference genome.</p> <p>Findings</p> <p>RCDI/eRCDI is a web application server that calculates the Relative Codon Deoptimization Index and a new expected value for the RCDI (eRCDI). The RCDI is used to estimate the similarity of the codon frequencies of a specific gene in comparison to a given reference genome. The eRCDI is determined by generating random sequences with similar G+C and amino acid composition to the input sequences and may be used as an indicator of the significance of the RCDI values. RCDI/eRCDI is freely available at <url>http://genomes.urv.cat/CAIcal/RCDI</url>.</p> <p>Conclusions</p> <p>This web server will be a useful tool for genome analysis, to understand host-virus phylogenetic relationships or to infer the potential host range of a virus and its replication strategy, as well as in experimental virology to ease the step of gene design for heterologous protein expression.</p
Identification of Human IKK-2 Inhibitors of Natural Origin (Part I): Modeling of the IKK-2 Kinase Domain, Virtual Screening and Activity Assays
BACKGROUND: Their large scaffold diversity and properties, such as structural complexity and drug similarity, form the basis of claims that natural products are ideal starting points for drug design and development. Consequently, there has been great interest in determining whether such molecules show biological activity toward protein targets of pharmacological relevance. One target of particular interest is hIKK-2, a serine-threonine protein kinase belonging to the IKK complex that is the primary component responsible for activating NF-ÎșB in response to various inflammatory stimuli. Indeed, this has led to the development of synthetic ATP-competitive inhibitors for hIKK-2. Therefore, the main goals of this study were (a) to use virtual screening to identify potential hIKK-2 inhibitors of natural origin that compete with ATP and (b) to evaluate the reliability of our virtual-screening protocol by experimentally testing the in vitro activity of selected natural-product hits. METHODOLOGY/PRINCIPAL FINDINGS: We thus predicted that 1,061 out of the 89,425 natural products present in the studied database would inhibit hIKK-2 with good ADMET properties. Notably, when these 1,061 molecules were merged with the 98 synthetic hIKK-2 inhibitors used in this study and the resulting set was classified into ten clusters according to chemical similarity, there were three clusters that contained only natural products. Five molecules from these three clusters (for which no anti-inflammatory activity has been previously described) were then selected for in vitro activity testing, in which three out of the five molecules were shown to inhibit hIKK-2. CONCLUSIONS/SIGNIFICANCE: We demonstrated that our virtual-screening protocol was successful in identifying lead compounds for developing new inhibitors for hIKK-2, a target of great interest in medicinal chemistry. Additionally, all the tools developed during the current study (i.e., the homology model for the hIKK-2 kinase domain and the pharmacophore) will be made available to interested readers upon request
- âŠ