Dealing with the Data Deluge – New Strategies in Prokaryotic Genome Analysis
Recent technological innovations have ignited an explosion in microbial genome sequencing that has fundamentally changed our understanding of the biology of microbes and profoundly impacted public health policy. This huge increase in DNA sequence data presents new challenges for bioinformatics tools for annotation, analysis, and visualization. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of the associated data. Genomes are organized in a hierarchical distance tree, with distances calculated from single-copy ribosomal protein markers. The protein distance measures dissimilarity between markers of the same type, and the resulting genomic distance averages over the majority of marker distances, ignoring the outliers. More than 30,000 genomes from public archives have been organized in a marker distance tree, resulting in 6,438 species-level clades representing 7,597 taxonomic species. This computational infrastructure provides a foundation for prokaryotic gene and genome analysis, allowing easy access to pre-calculated genome groups at various distance levels. One of the most challenging problems in the current data deluge is presenting the relevant data at an appropriate resolution for each application, eliminating data redundancy while keeping biologically interesting variations.
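The outlier-tolerant averaging described in this abstract can be illustrated with a minimal sketch. The abstract does not say how outliers are identified or how large the "majority" is, so the median-based ranking and the `keep_fraction` parameter below are assumptions made for illustration, and `genomic_distance` is a hypothetical helper, not the authors' implementation.

```python
import math
from statistics import mean, median


def genomic_distance(marker_distances, keep_fraction=0.75):
    """Average the per-marker protein distances between two genomes,
    ignoring the markers that deviate most from the median.

    keep_fraction is an assumed parameter: the share of markers
    (a majority) that is kept for averaging.
    """
    if not marker_distances:
        raise ValueError("need at least one marker distance")
    if not 0.5 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must describe a majority (0.5, 1.0]")

    center = median(marker_distances)
    # Rank markers by how far their distance lies from the median,
    # so the most atypical markers end up last.
    ranked = sorted(marker_distances, key=lambda d: abs(d - center))
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return mean(ranked[:n_keep])


if __name__ == "__main__":
    # Four hypothetical ribosomal protein markers; the last one is an
    # outlier (for example, a horizontally transferred or misannotated gene).
    distances = [0.02, 0.04, 0.03, 0.90]
    print(genomic_distance(distances))                     # outlier dropped
    print(genomic_distance(distances, keep_fraction=1.0))  # plain mean
```

With the outlier dropped, the result is roughly 0.03 instead of the plain mean of about 0.25, which is why a single atypical marker does not move a genome to a different clade in such a scheme.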
The National Center for Biotechnology Information's Protein Clusters Database
Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to maintain this deluge of data efficiently and keep it up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nucleotide sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages, and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross-references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters
A physical and genetic microsatellite map of the chicken Z chromosome
Genetic and physical mapping of human and animal genomes has been greatly facilitated by the use of chromosome-specific DNA libraries. Mapping with libraries specific to a chromosome or chromosomal region increases marker saturation by reducing the gaps resulting from a purely random shotgun approach. This study was undertaken to construct a genetic and physical map of microsatellites on the chicken Z chromosome. This chromosome is the fifth largest in the chicken genome, comprising about 8% of the total, yet very few microsatellites have been mapped to it. DNA originating from the chicken Z chromosome was previously isolated and reported. This was used to construct a small-insert library in Lambda ZAP Express, representing 14 chromosome equivalents. This library was screened for microsatellites with an (AC)12 oligo, and positive clones were isolated. The presence of the microsatellite, as well as its approximate location in the insert, was confirmed by PCR amplification. Clones with adequate flanking regions were sequenced, and primers for 19 microsatellites were developed. These primers were used to genotype individuals from the East Lansing poultry reference population, and a linkage map was constructed. Thirteen markers were scorable and polymorphic in the population. These were combined with 64 existing markers, and the resulting map spans 220 cM with an average spacing of 2.7 cM between markers. The physical locations of selected markers were established by fluorescent in situ hybridization (FISH). Hybridization results enabled the anchoring and orientation of the linkage group along the length of the Z chromosome.
Additional file 1 of Clustering analysis of proteins from microbial genomes at multiple levels of resolution
Table S1. Per-clade statistics for 131 abundant clades; the number of proteins represents the non-redundant set of non-identical protein sequences. (PDF 38 kb)
A Hydrogen-Based Subsurface Microbial Community Dominated by Methanogens
Geobacter Project
Analysis of the Genetic Potential and Gene Expression of Microbial Communities Involved in the In Situ Bioremediation of Uranium and Harvesting Electrical Energy from Organic Matter
The primary goal of this research is to develop conceptual and computational models that can describe the functioning of complex microbial communities involved in microbial processes of interest to the Department of Energy. Microbial communities to be investigated: (1) the microbial community associated with the in situ bioremediation of uranium-contaminated groundwater; and (2) a microbial community that is capable of harvesting energy from waste organic matter in the form of electricity.