3,185 research outputs found

Specialized Hidden Markov Model Databases for Microbial Genomics

Author: Altschul
Bateman
Eddy
Eddy
Gollery
Gough
Grundy
Krogh
Letunic
Marchler-Bauer
Marchler-Bauer
Martin Gollery
Tatusov
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2003

As hidden Markov models (HMMs) become increasingly important in the analysis of biological sequences, so too have databases of HMMs expanded in size, number and importance. While the standard paradigm a short while ago was the analysis of one or a few sequences at a time, it has now become standard procedure to submit an entire microbial genome. In the future, it will be common to submit large groups of completed genomes to run simultaneously against a dozen public databases and any number of internally developed targets. This paper looks at some of the readily available HMM (or HMM-like) algorithms and several publicly available HMM databases, and outlines methods by which the reader may develop custom HMM targets.

Crossref

Directory of Open Access Journals

PubMed Central

Protein subfamily assignment using the Conserved Domain Database

Author: Fong Jessica H
Marchler-Bauer Aron
Publication venue: BioMed Central
Publication date: 01/01/2008

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis

Author: Aniba Mohamed Radhouene
Marchler-Bauer Aron
Poch Olivier
Thompson Julie Dawn
Publication venue: Oxford University Press
Publication date: 01/01/2010

Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/~aniba/alexsys

CiteSeerX

HAL-Inserm

PubMed Central

CDD: specific functional annotation with the Conserved Domain Database

Author: A. Marchler-Bauer
A. Tasneem
Altschul
C. A. Liebert
C. DeWeese-Scott
C. J. Lanczycki
C. Liu
D. I. Hurwitz
D. Zhang
F. Chitsaz
F. Lu
G. H. Marchler
Geer
J. B. Anderson
J. D. Jackson
J. H. Fong
J. S. Song
L. Y. Geer
Letunic
M. Gwadz
M. K. Derbyshire
M. Mullokandov
Marchler-Bauer
Marchler-Bauer
Marchler-Bauer
N. R. Gonzales
N. Thanki
N. Zhang
R. A. Yamashita
R. C. Geer
S. H. Bryant
S. He
S. Lu
Tatusov
Z. Ke
Publication venue: Oxford University Press

NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml, and is also part of NCBI's Entrez query and retrieval system, cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at a superfamily level, and domain annotation on proteins is flagged as either ‘specific’ (identifying molecular function with high confidence) or as ‘non-specific’ (identifying superfamily membership only).

Crossref

PubMed Central

Complete genome sequence of the broad-host-range Paenibacillus larvae phage phiIBB_Pl23

Author: Altschul
Aziz
Christie
Genersch
Käll
Käll
Laslett
Marchler-Bauer
Naville
Oliveira
Oliveira
Park
Pouillot
Sabour
Schattner
Sierro
Smith
Thorpe
Zuker
Publication venue: American Society for Microbiology
Publication date: 01/01/2013

Paenibacillus larvae is a Gram-positive bacterium that causes American foulbrood, an important disease in apiculture. We report the first complete genome sequence of a P. larvae phage, phiIBB_Pl23, isolated from a hive in northern Portugal. This phage belongs to the family Siphoviridae. A.O. and L.D.R.M. acknowledge the FCT (Fundação para a Ciência e a Tecnologia) grants SFRH/BPD/69356/2010 and SFRH/BD/66166/2009, respectively.

CiteSeerX

Universidade do Minho: RepositoriUM

Crossref

PubMed Central

Inferred Biomolecular Interaction Server—a web server to analyze and predict protein interacting partners and binding sites

Author: Anna R. Panchenko
Aron Marchler-Bauer
Atwell
Benjamin A. Shoemaker
Bork
Brylinski
Campbell
Chen
Chen
Dachuan Zhang
Gerlt
Gibrat
Giot
Hegyi
Hernandez
Huang
Jessica H. Fong
Jones
Krissinel
Landgraf
Laurie
Li
Manoj Tyagi
Marchler-Bauer
Marchler-Bauer
Matthews
Pazos
Qin
Ratna R. Thangudu
Rentzsch
Shoemaker
Slonim
Snyder
Stein
Stephen H. Bryant
Sussman
Talavera
Teichmann
Thomas Madej
Wang
Wang
Wang
Yu
Publication venue: Oxford University Press

IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html

Crossref

PubMed Central

ARDB—Antibiotic Resistance Genes Database

Author: Alekshun
Altschul
B. Liu
Bancroft
Barber
Bilgin
Delaney
Gandhi
M. Pop
Marchler-Bauer
Ross
Ruiz
Scaria
Selvey
Publication venue: Oxford University Press

The treatment of infections is increasingly compromised by the ability of bacteria to develop resistance to antibiotics through mutations or through the acquisition of resistance genes. Antibiotic resistance genes also have the potential to be used for bio-terror purposes through genetically modified organisms. In order to facilitate the identification and characterization of these genes, we have created a manually curated database, the Antibiotic Resistance Genes Database (ARDB), unifying most of the publicly available information on antibiotic resistance. Each gene and resistance type is annotated with rich information, including resistance profile, mechanism of action, ontology, COG and CDD annotations, as well as external links to sequence and protein databases. Our database also supports sequence similarity searches and implements an initial version of a tool for characterizing common mutations that confer antibiotic resistance. The information we provide can be used as a compendium of antibiotic resistance factors as well as to identify the resistance genes of newly sequenced genes, genomes, or metagenomes. Currently, ARDB contains resistance information for 13 293 genes, 377 types, 257 antibiotics, 632 genomes, 933 species and 124 genera. ARDB is available at http://ardb.cbcb.umd.edu/

Crossref

PubMed Central

SA-Mot: a web server for the identification of motifs of interest extracted from protein loops

Author: Adrien Saladin
Anne-Claude Camproux
Ausiello
Baussand
Berman
Bordner
Camproux
Colette Geneix
Crooks
Guyon
Halperin
Holm
Hulo
Julien Maupetit
Karlin
Leslie Regad
Marchler-Bauer
Marchler-Bauer
Martin
Martin
Maupetit
Maupetit
Murzin
Nuel
Nuel
Polacco
Pugalenthi
Regad
Regad
Regad
Regad
Shoemaker
Via
Publication venue: Oxford University Press

The detection of functional motifs is an important step in the determination of protein functions. We present here a new web server, SA-Mot (Structural Alphabet Motif), for the extraction and location of structural motifs of interest from protein loops. Contrary to other methods, SA-Mot does not focus only on functional motifs, but extracts recurrent and conserved structural motifs involved in structural redundancy of loops. SA-Mot uses the structural word notion to extract all structural motifs from uni-dimensional sequences corresponding to loop structures. Then, SA-Mot provides a description of these structural motifs using statistics computed in the loop data set and in SCOP superfamily, sequence and structural parameters. SA-Mot results correspond to an interactive table listing all structural motifs extracted from a target structure and their associated descriptors. Using this information, users can easily locate loop regions that are important for protein folding and function. The SA-Mot web server is available at http://sa-mot.mti.univ-paris-diderot.fr

Crossref

PubMed Central

CDD: a Conserved Domain Database for the functional annotation of proteins

Author: A. Marchler-Bauer
C. DeWeese-Scott
C. J. Lanczycki
C. L. Robertson
C. Zheng
D. I. Hurwitz
D. Zhang
F. Chitsaz
F. Lu
Fong
G. H. Marchler
J. B. Anderson
J. D. Jackson
J. H. Fong
J. S. Song
L. Y. Geer
Letunic
M. Gwadz
M. K. Derbyshire
M. Mullokandov
M. V. Omelchenko
Marchler-Bauer
Marchler-Bauer
N. R. Gonzales
N. Thanki
N. Zhang
R. A. Yamashita
R. C. Geer
S. H. Bryant
S. Lu
Tatusov
Z. Ke
Publication venue: Oxford University Press

NCBI’s Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml

Crossref

PubMed Central