4,996 research outputs found

Genome display tool: visualizing features in complex data sets

Author: Fox George E
Lu Yue
Viswanath Lalitha
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: The enormity of the information contained in large data sets makes it difficult to develop intuitive understanding. It would be useful to have software that allows visualization of possible correlations between properties that can be associated with a core data set. In the case of bacterial genomes, existing visualization tools focus on either global properties such as variations in composition or detailed local displays of the features that comprise the annotation. It is not easy to visualize other information in the context of this core information. RESULTS: Java-based software known as the Genome Display Tool (GDT) allows the user to simultaneously view the distribution of multiple attributes pertaining to genes and intragenic regions in a single bacterial genome using different colours and shapes on a single screen. The display represents each gene by a small box that correlates with its physical position in the genome. The size of the boxes is dynamically allocated based on the number of genes, and a zoom feature allows close-up inspection of regions of interest. The display is interfaced with an MS-Access relational database and can display any feature in the database that can be represented by discrete values. Data is readily added to the database from an MS-Excel spreadsheet. The functionality of GDT is demonstrated by comparing the results of two predictions of recent horizontal transfer events in the genome of Synechocystis PCC-6803. The resulting display allows the user to immediately see how much agreement exists between the two methods and also visualize how genes in various categories (e.g. predicted by both methods, by one method, etc.) are distributed in the genome. CONCLUSION: The GDT software provides the user with a powerful tool that allows development of an intuitive understanding of the relative distribution of features in a large data set.
As additional features are added to the data set, the number of possible correlations that can be visualized grows rapidly. Although described here for use in bacterial genomics, the principle is general and similar software might be useful in other contexts such as patient studies.
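
The comparison of two prediction methods described in this abstract can be illustrated with a minimal Python sketch. It is not GDT itself (which is Java based and backed by an MS-Access database); the category names, colour choices, and the functions `categorize` and `agreement` are hypothetical and chosen only for illustration.

```python
# Hypothetical category names and colours; GDT's actual scheme is not given here.
CATEGORY_COLOURS = {
    "both": "red",
    "a_only": "blue",
    "b_only": "green",
    "neither": "grey",
}

def categorize(genes_in_order, predicted_a, predicted_b):
    """Label each gene, in genome order, by which methods flagged it."""
    labels = []
    for gene in genes_in_order:
        in_a, in_b = gene in predicted_a, gene in predicted_b
        if in_a and in_b:
            labels.append("both")
        elif in_a:
            labels.append("a_only")
        elif in_b:
            labels.append("b_only")
        else:
            labels.append("neither")
    return labels

def agreement(predicted_a, predicted_b):
    """Fraction of genes flagged by either method that were flagged by both."""
    union = predicted_a | predicted_b
    return len(predicted_a & predicted_b) / len(union) if union else 1.0
```

Because the labels are returned in genome order, drawing one coloured box per label would give a positional view of where the two methods agree and disagree.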

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Microbial identification by mass cataloging

Author: Fox George E
Jackson George W
Willson Richard C
Zhang Zhengdong
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The public availability of over 180,000 bacterial 16S ribosomal RNA (rRNA) sequences has facilitated microbial identification and classification using hybridization and other molecular approaches. In their usual format, such assays are based on the presence of unique subsequences in the target RNA and require a priori knowledge of what organisms are likely to be in a sample. They are thus limited in generality when analyzing an unknown sample. Herein, we demonstrate the utility of catalogs of masses to characterize the bacterial 16S rRNA(s) in any sample. Sample nucleic acids are digested with a nuclease of known specificity and the products characterized using mass spectrometry. The resulting catalogs of masses can subsequently be compared to the masses known to occur in previously-sequenced 16S rRNAs, allowing organism identification. Alternatively, if the organism is not in the existing database, it will still be possible to determine its genetic affinity relative to the known organisms. RESULTS: Ribonuclease T1 and ribonuclease A digestion patterns were calculated for 1,921 complete 16S rRNAs. Oligoribonucleotides generated by RNase T1 of length 9 and longer produce sufficient diversity of masses to be informative. In addition, individual fragments or combinations thereof can be used to recognize the presence of specific organisms in a complex sample. In this regard, 140 strains out of 1,921 organisms (7.3%) could be identified by the presence of a unique RNase T1-generated oligoribonucleotide mass. Combinations of just two and three oligoribonucleotide masses allowed 54% and 72% of the specific strains to be identified, respectively. An initial algorithm for recovering likely organisms present in complex samples is also described. CONCLUSION: The use of catalogs of compositions (masses) of characteristic oligoribonucleotides for microbial identification appears extremely promising.
RNase T1 is more useful than ribonuclease A in generating characteristic masses, though RNase A produces oligomers which are more readily distinguished due to the large mass difference between A and G. Identification of multiple species in mixtures is also feasible. Practical applicability of the method depends on high performance mass spectrometric determination, and/or use of methods that increase the one dalton (Da) mass difference between uracil and cytosine.
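
The cataloging idea in this abstract can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes RNase T1 cleaves after every G, uses base composition as a stand-in for mass (fragments with the same composition have the same mass), and the residue masses are approximate averages that ignore end-group corrections. All function names are hypothetical.

```python
from collections import Counter

# Approximate average masses (Da) of nucleotide residues within an RNA chain.
# End-group corrections are ignored; treat these values as illustrative only.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}

def rnase_t1_digest(rna):
    """Split an RNA sequence after every G (RNase T1 cleaves 3' of guanosine)."""
    fragments, current = [], []
    for base in rna:
        current.append(base)
        if base == "G":
            fragments.append("".join(current))
            current = []
    if current:  # trailing fragment that does not end in G
        fragments.append("".join(current))
    return fragments

def composition(fragment):
    """Base composition as a sorted tuple of (base, count) pairs."""
    return tuple(sorted(Counter(fragment).items()))

def approx_mass(fragment):
    """Sum of approximate residue masses for a fragment."""
    return sum(RESIDUE_MASS[base] for base in fragment)

def mass_catalog(rna, min_length=9):
    """Compositions of all digestion products of at least min_length bases."""
    return {composition(f) for f in rnase_t1_digest(rna) if len(f) >= min_length}

def unique_signatures(catalogs):
    """For each organism, the compositions found in no other organism's catalog."""
    unique = {}
    for name, catalog in catalogs.items():
        others = set()
        for other_name, other_catalog in catalogs.items():
            if other_name != name:
                others |= other_catalog
        unique[name] = catalog - others
    return unique
```

With these approximate values, C and U residues differ by about 1 Da while A and G differ by 16 Da, which matches the resolution concern raised in the conclusion.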

Directory of Open Access Journals

PubMed Central

University of Houston Institutional Repository (UHIR)

RECOVIR Software for Identifying Viruses

Author: Chakravarty Sugoto
Fox George E.
Zhu Dianhui

Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical in understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from their limited sequence data. Novel phylogenetic-tree-based databases of protein or nucleic acid residues that uniquely characterize these virus strains are created. Strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with the databases. RECOVIR uses unique characterizing residues to automatically identify strains of partial or complete capsid sequences of picornaviruses and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses resulted in correct strain identification for all of these sequences. This study shows the feasibility of creating databases of hitherto unknown residues uniquely characterizing the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.
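
A residue-wise comparison of the kind this abstract describes might look like the Python sketch below. It is not RECOVIR's algorithm, whose details are not given here. It assumes the input sequence has already been aligned so that residues are keyed by alignment position, and the database layout, scoring rule, and the name `predict_strain` are all hypothetical.

```python
def predict_strain(observed, database):
    """Pick the strain whose characterizing residues best match the input.

    observed: {alignment position: residue} for a partial or complete sequence.
    database: {strain name: {alignment position: characterizing residue}}.
    Returns (best strain or None, fraction of covered positions that match).
    """
    best, best_score = None, 0.0
    for strain, signature in database.items():
        # Only positions present in the (possibly partial) input can be compared.
        covered = [pos for pos in signature if pos in observed]
        if not covered:
            continue
        matches = sum(observed[pos] == signature[pos] for pos in covered)
        score = matches / len(covered)
        if score > best_score:
            best, best_score = strain, score
    return best, best_score
```

Scoring only the covered positions is what lets a partial sequence still be assigned, at the cost of lower confidence when few characterizing positions are present.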

NASA Technical Reports Server

Bacterial genotyping by 16S rRNA mass cataloging

Author: Fox George E
Jackson George W
McNichols Roger J
Willson Richard C
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: It has recently been demonstrated that organism identifications can be recovered from mass spectra using various methods including base-specific fragmentation of nucleic acids. Because mass spectrometry is extremely rapid and widely available, such techniques offer significant advantages in some applications. A key element in favor of mass spectrometric analysis of RNA fragmentation patterns is that a reference database for analysis of the results can be generated from sequence information. In contrast to hybridization approaches, the genetic affinity of any unknown isolate can in principle be determined within the context of all previously sequenced 16S rRNAs without prior knowledge of what the organism is. In contrast to the original RNase T1 cataloging method, when digestion products are analyzed by mass spectrometry, products with the same base composition cannot be distinguished. Hence, it is possible that organisms that are not closely related (having different underlying sequences) might be falsely identified by mass spectral coincidence. We present a convenient spectral coincidence function for expressing the degree of similarity (or distance) between any two mass-spectra. Trees constructed using this function are consistent with those produced by direct comparison of primary sequences, demonstrating that the inherent degeneracy in mass spectrometric analysis of RNA fragments does not preclude correct organism identification. RESULTS: Neighbor-joining trees for important bacterial pathogens were generated using distances based on mass spectrometric observables and the spectral coincidence function. These trees demonstrate that most pathogens will be readily distinguished using mass spectrometric analyses of RNA digestion products. A more detailed, genus-level analysis of pathogens and near relatives was also performed, and it was found that assignments of genetic affinity were consistent with those obtained by direct sequence comparisons.
Finally, typical values of the coincidence between organisms were also examined with regard to phylogenetic level and sequence variability. CONCLUSION: Cluster analysis based on comparison of mass spectrometric observables using the spectral coincidence function is an extremely useful tool for determining the genetic affinity of an unknown bacterium. Additionally, fragmentation patterns can determine within hours if an unknown isolate is potentially a known pathogen among thousands of possible organisms, and if so, which one.
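
The abstract does not give the form of its spectral coincidence function, so the Python sketch below substitutes a simple Jaccard-style overlap as a stand-in. The binning tolerance, the function names, and the conversion to a distance are assumptions made for illustration and should not be read as the authors' definition.

```python
from itertools import combinations

def binned(masses, tolerance=0.5):
    """Map each mass to an integer bin so that nearby masses compare equal."""
    return {round(m / tolerance) for m in masses}

def coincidence(spectrum_a, spectrum_b, tolerance=0.5):
    """Jaccard-style overlap of two mass lists: shared bins / all bins."""
    a, b = binned(spectrum_a, tolerance), binned(spectrum_b, tolerance)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def distance(spectrum_a, spectrum_b, tolerance=0.5):
    """Distance derived from the overlap; 0 means identical binned spectra."""
    return 1.0 - coincidence(spectrum_a, spectrum_b, tolerance)

def distance_matrix(spectra, tolerance=0.5):
    """Pairwise distances keyed by (name, name), symmetric with zero diagonal."""
    matrix = {(name, name): 0.0 for name in spectra}
    for x, y in combinations(spectra, 2):
        d = distance(spectra[x], spectra[y], tolerance)
        matrix[(x, y)] = matrix[(y, x)] = d
    return matrix
```

A matrix of this shape is the kind of input a neighbor-joining routine expects, which is how the trees mentioned in the results could be built from any such pairwise measure.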

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Houston Institutional Repository (UHIR)

Methods for determining the genetic affinity of microorganisms and viruses

Author: Fox George E.
Willson III, Richard C.
Zhang Zhengdong
Publication date: 03/07/2012

The method selects which sub-sequences in a database of nucleic acid sequences, such as 16S rRNA, are highly characteristic of particular groupings of bacteria, microorganisms, fungi, etc. on a substantially phylogenetic tree. It is also applicable to viruses comprising viral genomic RNA or DNA. A catalogue of highly characteristic sequences identified by this method is assembled to establish the genetic identity of an unknown organism. The characteristic sequences are used to design nucleic acid hybridization probes that include the characteristic sequence or its complement, or are derived from one or more characteristic sequences. A plurality of these characteristic sequences is used in hybridization to determine the phylogenetic tree position of the organism(s) in a sample. Target organisms that are represented in the original sequence database and have sufficient characteristic sequences can be identified to the species or subspecies level. Oligonucleotide arrays of many probes are especially preferred. A hybridization signal can comprise fluorescence, chemiluminescence, or isotopic labeling, etc.; or sequences in a sample can be detected by direct means, e.g. mass spectrometry. The method's characteristic sequences can also be used to design specific PCR primers. The method uniquely identifies the phylogenetic affinity of an unknown organism without requiring prior knowledge of what is present in the sample. Even if the organism has not been previously encountered, the method still provides useful information about which phylogenetic tree bifurcation nodes encompass the organism.

NASA Technical Reports Server

Visualization of ribosomal RNA operon copy number distribution

Author: DasGupta Indrani
Fox George E
Rastogi Rajat
Wu Martin
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Results of microbial ecology studies using 16S rRNA sequence information can be deceiving due to differences in rRNA operon copy number and genome size of the detected organisms. It therefore will be useful for investigators to have a better understanding of how these two parameters differ in various organism types. In this study, the number of ribosomal operons and genome size were separately mapped onto a Bacterial phylogenetic tree. RESULTS: A representative Bacterial tree was constructed using 31 marker genes found in 578 bacterial genome sequences. Organism names are displayed on the trees using graduations of color such that similar colors indicate similar numbers of operons or genome size. The resulting images provide an intuitive understanding of how copy number and genome size vary in different Bacterial phyla. CONCLUSION: Once the phylogenetic position of a novel organism is known the number of rRNA operons, and to a lesser extent the genome size, can be estimated by examination of the colored maps. Further detail can then be obtained for members of relevant taxa from the rrnDB database.
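
The colour graduation described in this abstract can be approximated with a linear colour ramp, sketched here in Python. The blue-to-red ramp, the assumed copy-number range of 1 to 15, and the function name `copy_number_to_hex` are illustrative assumptions; the abstract does not specify the palette the authors used.

```python
def copy_number_to_hex(copies, low=1, high=15):
    """Map an rRNA operon copy number to a hex colour on a blue-to-red ramp.

    Values outside [low, high] are clamped, so similar copy numbers always
    receive similar colours. The default range is an assumption.
    """
    clamped = max(low, min(high, copies))
    t = (clamped - low) / (high - low)  # 0.0 at low, 1.0 at high
    red = int(255 * t + 0.5)
    blue = int(255 * (1 - t) + 0.5)
    return f"#{red:02x}00{blue:02x}"
```

The same function could be reused for genome size by passing a different `low` and `high`.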

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Stress-Driven Selection of Novel Phenotypes

Author: Fox George E.
Liu Yamei
Stepanov Victor G.

A process has been developed that can confer novel properties, such as metal resistance, to a host bacterium. This same process can also be used to produce RNAs and peptides that have novel properties, such as the ability to bind particular compounds. It is inherent in the method that the peptide or RNA will behave as expected in the target organism. Plasmid-borne mini-gene libraries coding for either a population of combinatorial peptides or stable, artificial RNAs carrying random inserts are produced. These libraries, which have no bias towards any biological function, are used to transform the organism of interest and to serve as an initial source of genetic variation for stress-driven evolution. The transformed bacteria are propagated under selective pressure in order to obtain variants with the desired properties. The process is highly distinct from in vitro methods because the variants are selected in the context of the cell while it is experiencing stress. Hence, the selected peptide or RNA will, by definition, work as expected in the target cell as the cell adapts to its presence during the selection process. Once the novel gene, which produces the sought phenotype, is obtained, it can be transferred to the main genome to increase the genetic stability in the organism. Alternatively, the cell line can be used to produce novel RNAs or peptides with selectable properties in large quantity for separate purposes. The system allows for easy, large-scale purification of the RNAs or peptide products. The process has been reduced to practice by imposing sub-inhibitory concentrations of NiCl2 on cells of the bacterium Escherichia coli that were transformed separately with the peptide library and RNA library. The evolved resistant clones were isolated, and sequences of the selected mini-gene variants were established.
Clones resistant to NiCl2 were found to carry identical plasmid variants with a functional mini-gene that specifically conferred significant nickel tolerance on the host cells. Sequencing of the selected mini-gene revealed a propensity of the encoded peptide to bind transition metal ions. Expression of the mini-gene markedly improved growth parameters of the evolved clones at sub-inhibitory concentrations of NiCl2 while being slightly detrimental in the absence of stress. Similar results have been obtained with the RNA libraries. Overall, the results demonstrate a very natural outcome of the selection experiments in which the mini-genes were expected to be either successfully integrated into bacterial genetic networks, or rejected depending upon their effect on host fitness. This described approach can be useful as a laboratory model to study the dynamics of bacterial adaptive evolution on the molecular level. It can also provide a strategy for screening expressed DNA libraries in search of novel genes with desirable properties.

NASA Technical Reports Server

XII.—Excavations on the site of the Roman city at Silchester, Hants, in 1900.

Author: Fox George E.
Hope W. H. St. John
Reid Clement
Publication date: 01/01/1901

ZENODO