Using purine skews to predict genes in AT-rich poxviruses
BACKGROUND: Clusters or runs of purines on the mRNA synonymous strand have been found in many different organisms including orthopoxviruses. The purine bias that is exhibited by these clusters can be observed using a purine skew and, in the case of poxviruses, these skews can be used to help determine the coding strand of a particular segment of the genome. Combined with previous findings that minor ORFs have lower than average aspartate and glutamate composition and higher than average serine composition, purine content can be used to predict the likelihood of a poxvirus ORF being a "real gene". RESULTS: Using purine skews and a "quality" measure designed to incorporate previous findings about minor ORFs, we have found that in our training case (vaccinia virus strain Copenhagen), 59 of 65 minor (small and unlikely to be real genes) ORFs were correctly classified as being minor. Of the 201 major (large and likely to be real genes) vaccinia ORFs, 192 were correctly classified as being major. Performing a similar analysis with the entomopoxvirus Amsacta moorei (AMEV), it was found that 4 major ORFs were incorrectly classified as minor and 9 minor ORFs were incorrectly classified as major. The purine abundance observed for major ORFs in vaccinia virus was found to stem primarily from the first codon position, with both the second and third codon positions containing roughly equal amounts of purines and pyrimidines. CONCLUSION: Purine skews and a "quality" measure can be used to predict functional ORFs, and purine skews in particular can be used to determine which of two overlapping ORFs is most likely to be the real gene if neither of the two ORFs has orthologs in other poxviruses.
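The abstract above does not spell out how the skew is computed, so the following is only a minimal sketch under an assumed definition: a cumulative walk that rises by one for each purine (A or G) and falls by one for each pyrimidine (C or T). The function names `purine_skew` and `purine_fraction_by_codon_position` are hypothetical, and the paper's "quality" measure is not reproduced here.

```python
def purine_skew(seq):
    """Cumulative purine skew (assumed definition, not taken from the paper):
    +1 for A/G, -1 for C/T, unchanged for any other symbol."""
    skew, total = [], 0
    for base in seq.upper():
        if base in "AG":
            total += 1
        elif base in "CT":
            total -= 1
        skew.append(total)
    return skew


def purine_fraction_by_codon_position(orf):
    """Fraction of purines at codon positions 1, 2 and 3 of an ORF.
    Trailing bases that do not fill a whole codon are ignored."""
    counts = [0, 0, 0]
    totals = [0, 0, 0]
    usable = len(orf) - len(orf) % 3
    for i, base in enumerate(orf.upper()[:usable]):
        totals[i % 3] += 1
        if base in "AG":
            counts[i % 3] += 1
    return [c / t if t else 0.0 for c, t in zip(counts, totals)]
```

A rising skew along one strand would suggest that the strand is purine-rich there, and the per-position fractions give a way to check the reported first-codon-position bias on a candidate ORF.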
Host-derived pathogenicity islands in poxviruses
BACKGROUND: Poxviruses are important both as pathogens and as vaccine vectors. Poxvirus genomes (150–350 kb) consist of a single linear dsDNA molecule; the two polynucleotide strands are joined by short hairpin loops. The genomes encode highly conserved proteins required for DNA replication and mRNA transcription as well as a variable set of virulence factors; transcription takes place within the cytoplasm of the host cell. We are interested in the evolution of poxvirus genomes and especially in how these viruses acquire host-derived genes that are believed to function as virulence factors. RESULTS: Using a variety of bioinformatics tools, we have identified regions in poxvirus genomes that have unusual nucleotide composition (higher or lower than average A+T content) compared to the genome as a whole; such regions may be several kilobases in length and contain a number of genes. Regions with unusual nucleotide composition may represent genes that have been recently acquired from the host genome. The study of these genomic regions with unusual nucleotide content will help elucidate evolutionary processes in poxviruses. CONCLUSION: We have found that dotplots of complete poxvirus genomes can be used to locate regions on the genome that differ significantly in A+T content from the genome as a whole. The genes in these regions may have been acquired relatively recently from the host genome or from another AT-rich poxvirus.
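The abstract reports using dotplots to find regions of unusual A+T content. As a simpler stand-in for that idea (a sliding-window scan, not the authors' dotplot method), the sketch below flags windows whose A+T fraction departs from the whole-genome value. The names `at_content` and `unusual_windows`, and the default window, step, and threshold values, are illustrative assumptions.

```python
def at_content(seq):
    """Fraction of A and T bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def unusual_windows(genome, window=1000, step=500, threshold=0.1):
    """Return (start, at_fraction) for each window whose A+T fraction differs
    from the whole-genome A+T fraction by at least `threshold`."""
    overall = at_content(genome)
    flagged = []
    for start in range(0, max(len(genome) - window + 1, 1), step):
        frac = at_content(genome[start:start + window])
        if abs(frac - overall) >= threshold:
            flagged.append((start, frac))
    return flagged
```

Adjacent flagged windows could then be merged into candidate regions for closer inspection; a suitable threshold would depend on the genome and is not something the abstract specifies.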
Coupling a radial model of the Darcy-Forchheimer equation with a regional groundwater model to simulate drawdown at supply boreholes
Assessing the short and long-term risks to a groundwater source is a critical part of water resource
management. In the UK, public water supply companies apply the term Deployable Output (DO) to
describe the yield of a groundwater source under drought conditions. DO is constrained by the physical
properties of an aquifer and operational factors such as licence conditions, water quality, and pumping
and treatment capacity. A robust assessment of groundwater DO should be informed by numerical
modeling. This requires the groundwater level in a supply borehole to be accurately simulated within its
regional hydrogeological context. A 3D radial flow model of the Darcy-Forchheimer equation is presented
for simulating drawdown at a borehole. The Darcy-Forchheimer Radial Flow Model (DFRFM) represents
linear and non-linear flows around the borehole; confined and unconfined conditions; vertical
heterogeneity in the aquifer and borehole storage. The DFRFM is coupled with a regional groundwater
model which represents the large-scale groundwater system, including lateral and vertical aquifer
heterogeneity, rivers, and spatially varying recharge. The model has been applied to a supply borehole
located in the dual permeability Chalk aquifer, which forms the principal aquifer in the UK and provides
40-70% of the total public water supply in southern and eastern England. The application demonstrates
the potential for the coupled model to be used to inform DO assessments and to assess the long-term
risk to sources under climate change scenarios
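The abstract does not give the governing equation, so the sketch below assumes the common one-dimensional Forchheimer form i = a·q + b·q², where i is the hydraulic gradient magnitude, q the specific discharge, a = 1/K the linear (Darcy) coefficient, and b the non-linear coefficient. The function name `forchheimer_flux` and its parameters are hypothetical and are not taken from the DFRFM.

```python
import math


def forchheimer_flux(gradient, conductivity, beta=0.0):
    """Specific discharge q solving gradient = q / conductivity + beta * q**2.

    Assumed 1D Forchheimer form; with beta == 0 this reduces to Darcy's law,
    q = conductivity * gradient. Expects gradient >= 0 and conductivity > 0.
    """
    a = 1.0 / conductivity
    if beta == 0.0:
        return gradient / a
    # Positive root of beta*q**2 + a*q - gradient = 0.
    return (-a + math.sqrt(a * a + 4.0 * beta * gradient)) / (2.0 * beta)
```

For the same gradient, a positive `beta` gives a smaller flux than Darcy's law alone, which is the kind of additional head loss a non-linear term near a pumped borehole is meant to capture.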
Genome Annotation Transfer Utility (GATU): rapid annotation of viral genomes using a closely related reference genome
BACKGROUND: Since DNA sequencing has become easier and cheaper, an increasing number of closely related viral genomes have been sequenced. However, many of these have been deposited in GenBank without annotations, severely limiting their value to researchers. While maintaining comprehensive genomic databases for a set of virus families at the Viral Bioinformatics Resource Center and Viral Bioinformatics – Canada, we found that researchers were unnecessarily spending time annotating viral genomes that were close relatives of already annotated viruses. We have therefore designed and implemented a novel tool, Genome Annotation Transfer Utility (GATU), to transfer annotations from a previously annotated reference genome to a new target genome, thereby greatly reducing this laborious task. RESULTS: GATU transfers annotations from a reference genome to a closely related target genome, while still giving the user final control over which annotations should be included. GATU also detects open reading frames present in the target but not the reference genome and provides the user with a variety of bioinformatics tools to quickly determine if these ORFs should also be included in the annotation. After this process is complete, GATU saves the newly annotated genome as a GenBank, EMBL or XML-format file. The software is coded in Java and runs on a variety of computer platforms. Its user-friendly Graphical User Interface is specifically designed for users trained in the biological sciences. CONCLUSION: GATU greatly simplifies the initial stages of genome annotation by using a closely related genome as a reference. It is not intended to be a gene prediction tool or a "complete" annotation system, but we have found that it significantly reduces the time required for annotation of genes and mature peptides and helps to standardize gene names between related organisms by transferring reference genome annotations to the target genome.
The program is freely available under the General Public License and can be accessed along with documentation and tutorial from
Evidence for a novel gene associated with human influenza A viruses
© 2009 Clifford et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licens
Recent Hits Acquired by BLAST (ReHAB): A tool to identify new hits in sequence similarity searches
BACKGROUND: Sequence similarity searching is a powerful tool to help develop hypotheses in the quest to assign functional, structural and evolutionary information to DNA and protein sequences. As sequence databases continue to grow exponentially, it becomes increasingly important to repeat searches at frequent intervals, and similarity searches retrieve larger and larger sets of results. New and potentially significant results may be buried in a long list of previously obtained sequence hits from past searches. RESULTS: ReHAB (Recent Hits Acquired from BLAST) is a tool for finding new protein hits in repeated PSI-BLAST searches. ReHAB compares results from PSI-BLAST searches performed with two versions of a protein sequence database and highlights hits that are present only in the updated database. Results are presented in an easily comprehended table, or in a BLAST-like report, using colors to highlight the new hits. ReHAB is designed to handle large numbers of query sequences, such as whole genomes or sets of genomes. Advanced computer skills are not needed to use ReHAB; the graphics interface is simple to use and was designed with the bench biologist in mind. CONCLUSIONS: This software greatly simplifies the problem of evaluating the output of large numbers of protein database searches
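The core comparison described above, keeping only hits that appear with the updated database, can be sketched as a per-query set difference. This is an illustration of the idea only, not ReHAB's implementation: the function name `new_hits` and the input shape (a mapping from query identifier to an ordered list of hit identifiers) are assumptions, and parsing of PSI-BLAST output is left out.

```python
def new_hits(old_results, new_results):
    """For each query, list hits present in the new search but not the old one.

    Both arguments map a query ID to a list of hit IDs. The order of hits in
    the new search is preserved, and queries with no new hits are omitted.
    """
    report = {}
    for query, hits in new_results.items():
        seen = set(old_results.get(query, ()))
        fresh = [hit for hit in hits if hit not in seen]
        if fresh:
            report[query] = fresh
    return report
```

Keeping the rank order of the new search lets a report highlight where each new hit falls among the hits that were already known.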
A new method for indexing genomes using on-disk suffix trees
We propose a new method to build persistent suffix trees for indexing genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous work by reducing random access to the input string and performing only two passes over disk data. DiGeST is based on the two-phase multi-way merge sort paradigm using a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data than managed before.
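The abstract mentions a concise binary representation of the DNA alphabet without giving details. One common choice, assumed here purely for illustration, is two bits per base; the names `CODE`, `pack`, and `unpack` are hypothetical, and this is not claimed to be DiGeST's actual encoding.

```python
# Assumed 2-bit code; alphabetical order is kept so that packed integers of
# equal-length strings compare in the same order as the strings themselves.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string over A/C/G/T into an integer, two bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover a DNA string of the given length from its packed integer."""
    bases = []
    for shift in range(2 * (length - 1), -1, -2):
        bases.append(BASES[(value >> shift) & 3])
    return "".join(bases)
```

Under this encoding, fixed-length prefixes of suffixes can be compared as plain integers, which is the sort of property a merge-sort-based construction can exploit.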