3,189 research outputs found
Specialized Hidden Markov Model Databases for Microbial Genomics
As hidden Markov models (HMMs) become increasingly more important in the
analysis of biological sequences, so too have databases of HMMs expanded in size,
number and importance. While the standard paradigm a short while ago was the
analysis of one or a few sequences at a time, it has now become standard procedure
to submit an entire microbial genome. In the future, it will be common to submit large
groups of completed genomes to run simultaneously against a dozen public databases
and any number of internally developed targets. This paper looks at some of the
readily available HMM (or HMM-like) algorithms and several publicly available
HMM databases, and outlines methods by which the reader may develop custom
HMM targets
Protein subfamily assignment using the Conserved Domain Database
© 2008 Fong et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licens
AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis
Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/∼aniba/alexsys
CDD: specific functional annotation with the Conserved Domain Database
NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml, and is also part of NCBI's Entrez query and retrieval system, cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at a superfamily level, and domain annotation on proteins is flagged as either ‘specific’ (identifying molecular function with high confidence) or as ‘non-specific’ (identifying superfamily membership only)
Inferred Biomolecular Interaction Server—a web server to analyze and predict protein interacting partners and binding sites
IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html
Complete genome sequence of the broad-host-range Paenibacillus larvae phage phiIBB_Pl23
Paenibacillus larvae is a Gram-positive bacterium that causes American foulbrood, an important disease in apiculture. We report
the first complete genome sequence of a P. larvae phage, phiIBB_Pl23, isolated from a hive in northern Portugal. This
phage belongs to the family Siphoviridae.A. O. and L.D.R.M. acknowledge the FCT (Fundacao para a ciencia e a tecnologia) grants SFRH/BPD/69356/2010 and SFRH/BD/66166/2009, respectively
ARDB—Antibiotic Resistance Genes Database
The treatment of infections is increasingly compromised by the ability of bacteria to develop resistance to antibiotics through mutations or through the acquisition of resistance genes. Antibiotic resistance genes also have the potential to be used for bio-terror purposes through genetically modified organisms. In order to facilitate the identification and characterization of these genes, we have created a manually curated database—the Antibiotic Resistance Genes Database (ARDB)—unifying most of the publicly available information on antibiotic resistance. Each gene and resistance type is annotated with rich information, including resistance profile, mechanism of action, ontology, COG and CDD annotations, as well as external links to sequence and protein databases. Our database also supports sequence similarity searches and implements an initial version of a tool for characterizing common mutations that confer antibiotic resistance. The information we provide can be used as compendium of antibiotic resistance factors as well as to identify the resistance genes of newly sequenced genes, genomes, or metagenomes. Currently, ARDB contains resistance information for 13 293 genes, 377 types, 257 antibiotics, 632 genomes, 933 species and 124 genera. ARDB is available at http://ardb.cbcb.umd.edu/
SA-Mot: a web server for the identification of motifs of interest extracted from protein loops
The detection of functional motifs is an important step for the determination of protein functions. We present here a new web server SA-Mot (Structural Alphabet Motif) for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but it extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, the users can easily locate loop regions that are important for the protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr
CDD: a Conserved Domain Database for the functional annotation of proteins
NCBI’s Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
- …