eScholarship - University of California

Considering scores between unrelated proteins in the search database improves profile comparison

Author: AA Schaffer
DT Jones
G Yona
J Soding
L Rychlewski
M Frenkel-Morgenstern
M Madera
Nick V Grishin
R Sadreyev
RI Sadreyev
Ruslan I Sadreyev
S Karlin
S Pietrokovski
S Shi
SF Altschul
Y Qi
Y Wang
Y Zhang
YK Yu
Yong Wang
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Profile-based comparison of multiple sequence alignments is a powerful methodology for the detection of remote protein sequence similarity, which is essential for the inference and analysis of protein structure, function, and evolution. Accurate estimation of the statistical significance of detected profile similarities is essential for further development of this methodology. Here we analyze a novel approach to estimating the statistical significance of profile similarity: the explicit consideration of background score distributions for each database template (subject). Results: Using a simple scheme to combine and analytically approximate query- and subject-based distributions, we show that (i) inclusion of background distributions for the subjects increases the quality of homology detection; (ii) this increase is higher when the distributions are based on the scores to all known non-homologs of the subject rather than on a small calibration subset of the database representatives; and (iii) these distributions of subject scores against all known non-homologs make the dominant contribution to the improved performance: adding the calibration distribution of the query has a negligible additional effect. Conclusion: The construction of distributions based on the complete sets of non-homologs for each subject is particularly relevant in the setting of structure prediction, where the database consists of proteins with solved 3D structure (PDB, SCOP, CATH, etc.) and structural relationships between proteins are therefore known. These results point to a potential new direction in the development of more powerful methods for remote homology detection.
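
As a rough illustration of scoring a hit against a subject-specific background, the sketch below computes a z-score and an empirical tail probability from the scores of a subject against its known non-homologs. This is not the analytical approximation described in the abstract; it is a simpler empirical stand-in, and the function name and the sample scores are hypothetical.

```python
from statistics import mean, stdev


def subject_background_significance(score, background_scores):
    """Return (z_score, empirical_p) of `score` against a subject's background.

    `background_scores` are assumed to be scores between the subject and
    its known non-homologs (hypothetical input, not data from the paper).
    """
    mu = mean(background_scores)
    sigma = stdev(background_scores)  # sample standard deviation
    z_score = (score - mu) / sigma
    # Add-one correction keeps the empirical p-value strictly positive.
    n_at_least = sum(1 for s in background_scores if s >= score)
    empirical_p = (n_at_least + 1) / (len(background_scores) + 1)
    return z_score, empirical_p


z, p = subject_background_significance(22.0, [10.0, 12.0, 14.0, 16.0, 18.0])
```

With a large set of non-homolog scores per subject, the empirical tail becomes finer-grained, which is one intuition for why complete non-homolog sets might help more than a small calibration subset.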

Springer - Publisher Connector

Minimal Absent Words in Prokaryotic and Eukaryotic Genomes

Author: A Gentles
AJ Pinho
Armando J. Pinho
C Acquisti
C Simillion
Carlos A. C. Bastos
Christian Schönbach
D Gusfield
E Margulies
G Hampikian
I Ulitsky
J Herold
João M. O. S. Rodrigues
M Burrows
MI Abouelhoda
Paulo J. S. G. Ferreira
R Sokal
S Karlin
S Pietrokovski
Sara P. Garcia
T Kasai
V Brendel
Publication venue: Public Library of Science
Publication date: 31/01/2011

Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeon, thirteen bacteria and eight eukaryotes). We investigate whether the mutational biases that may explain the deficit of the shortest absent words in vertebrates also extend to other absent words, namely minimal absent words, and to other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform throughout different sets of minimal absent words. We further investigate the hypothesis of the inheritance of minimal absent words through common ancestry, using the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.
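
For readers unfamiliar with the term, a minimal absent word is a word that does not occur in a sequence although all of its proper factors do. The brute-force sketch below uses the standard equivalent check (the word is absent, but its longest proper prefix and suffix are present). It is meant only for toy inputs; the function name is hypothetical, and genome-scale work relies on suffix-array or suffix-tree methods instead.

```python
from itertools import product


def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Enumerate minimal absent words of length 1..max_len by brute force."""
    # All substrings of seq up to max_len; the empty word counts as present.
    present = {""}
    for i in range(len(seq)):
        for j in range(i + 1, min(i + max_len, len(seq)) + 1):
            present.add(seq[i:j])

    maws = []
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            # Absent word whose longest proper prefix and suffix both occur:
            # every shorter factor is contained in one of those two.
            if (word not in present
                    and word[1:] in present
                    and word[:-1] in present):
                maws.append(word)
    return maws


print(minimal_absent_words("AACA", 3, alphabet="AC"))
# prints ['CC', 'AAA', 'CAA', 'CAC']
```

Note that "ACC" is absent from "AACA" but is not minimal, because its factor "CC" is already absent.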

In Vivo Characterization of the Homing Endonuclease within the polB Gene in the Halophilic Archaeon Haloferax volcanii

Author: A Burt
A Large
Adi Barzel
Adit Naor
Arthur J. Lustig
BL Stoddard
C Aagaard
C Norais
F Paques
FB Perler
G Bitan-Banin
JP Gogarten
M Scalley-Kim
N Agmon
R. Thane Papke
Rona Lazary
S Delmas
S Kurokawa
S Pietrokovski
SW Cline
T Allers
T Miyake
Uri Gophna
Publication venue: Public Library of Science
Publication date: 20/01/2011

Inteins are parasitic genetic elements, analogous to introns, that excise themselves at the protein level by self-splicing, allowing the formation of functional, non-disrupted proteins. Many inteins contain a homing endonuclease (HEN) gene, and rely on its activity for horizontal propagation. In the halophilic archaeon Haloferax volcanii, the gene encoding DNA polymerase B (polB) contains an intein with an annotated but uncharacterized HEN. Here we examine the activity of the polB HEN in vivo, within its natural archaeal host. We show that this HEN is highly active, and able to insert the intein into both a chromosomal target and an extra-chromosomal plasmid target, by gene conversion. We also demonstrate that the frequency of its incorporation depends on the length of the flanking homologous sequences around the target site, reflecting its dependence on the homologous recombination machinery. Although several evolutionary models predict that the presence of an intein involves a change in the fitness of the host organism, our results show that a strain deleted for the intein sequence shows no significant changes in growth rate compared to the wild type.

FISim: A new similarity measure between transcription factor binding sites based on the fuzzy integral

Author: A Sandelin
Armando Blanco
BJ Wilson
BP Gomez
Carlos Cano
DE Schones
Fernando Garcia
Francisco J Lopez
G Pavesi
HD Das MK
HJ Zimmerman
IG Choi
J Keller
J Torchia
JA Hanley
JD Hughes
KA Becker
L Kaufman
L Zadeh
M Dutertre
M Sugeno
M Tompa
P D'haeseleer
R Osada
S Gupta
S Mahony
S Pietrokovski
S Roepcke
SJ Van Laere
T Sørlie
T Wang
TL Bailey
UJ Pape
V Matys
XS Liu
Y Huang
Y Pan
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Regulatory motifs describe sets of related transcription factor binding sites (TFBSs) and can be represented as position frequency matrices (PFMs). De novo identification of TFBSs is a crucial problem in computational biology, which includes the issue of comparing putative motifs with one another and with motifs that are already known. The relative importance of each nucleotide within a given position in the PFMs should be considered in order to compute PFM similarities. Furthermore, biological data are inherently noisy and imprecise. Fuzzy set theory is particularly suitable for modeling imprecise data, whereas fuzzy integrals are highly appropriate for representing the interaction among different information sources. Results: We propose FISim, a new similarity measure between PFMs, based on the fuzzy integral of the distance of the nucleotides with respect to the information content of the positions. Unlike existing methods, FISim is designed to consider the higher contribution of better conserved positions to the binding affinity. FISim provides excellent results when dealing with sets of randomly generated motifs, and outperforms the remaining methods when handling real datasets of related motifs. Furthermore, we propose a new cluster methodology based on kernel theory together with FISim to obtain groups of related motifs potentially bound by the same TFs, providing more robust results than existing approaches. Conclusion: FISim corrects a design flaw of the most popular methods, whose measures favour similarity of low information content positions. We use our measure to successfully identify motifs that describe binding sites for the same TF and to solve real-life problems. In this study the reliability of fuzzy technology for motif comparison tasks is proven. This work has been carried out as part of projects P08-TIC-4299 of J. A., Sevilla and TIN2006-13177 of DGICT, Madrid.
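
The "information content of the positions" mentioned above can be illustrated with the standard per-column formula for a PFM, assuming a uniform nucleotide background: IC = log2(4) + sum over bases of p * log2(p). The sketch below is not FISim itself and says nothing about how the fuzzy integral uses these values; the function name and the example counts are hypothetical.

```python
import math


def information_content(pfm_columns):
    """Per-position information content (bits) of a position frequency matrix.

    `pfm_columns` is a list of columns, each a list of counts for A, C, G, T.
    Assumes a uniform background, so a fully conserved column scores 2 bits.
    """
    contents = []
    for column in pfm_columns:
        total = sum(column)
        ic = math.log2(len(column))  # maximum entropy of a 4-letter column
        for count in column:
            p = count / total
            if p > 0:  # 0 * log2(0) is taken as 0
                ic += p * math.log2(p)
        contents.append(ic)
    return contents


# Hypothetical three-position motif: conserved, two-way split, uninformative.
example = [[10, 0, 0, 0], [5, 5, 0, 0], [1, 1, 1, 1]]
print(information_content(example))
```

Well-conserved positions score close to 2 bits and uninformative ones close to 0, which is the sense in which a similarity measure can weight better conserved positions more heavily.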

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Repositorio Institucional Universidad de Granada

Positional clustering improves computational binding site detection and identifies novel cis-regulatory sites in mammalian GABA(A) receptor subunit genes

Author: Aerts
Anand
Ballas
Blackwood
Boris E. Shakhnovich
Bosman
Brooks-Kayal
Bussemaker
Charles DeLisi
Daniel S. Roberts
Dawson
Dolan
Friberg
Frith
Gray
Harbison
Iyer
Kaplan
Kerr
Kirkness
Kuo
Lawrence
Lee
Lewin
Li
Liu
Macisaac
MacIsaac
Madhani
Morozov
Niehrs
Pellegrini
Perier
Pietrokovski
Purves
Reddy
Roberts
Roth
Saffer
Shelley J. Russek
Siegel
Steiger
Stormo
Swendeman
Temple
Therrien
Thiagalingam
Thijs
Timothy E. Reddy
Tompa
Treiman
Wall
Wasserman
Winderickx
Wingender
Wu
Publication venue: Oxford University Press
Publication date: 03/01/2007

Understanding transcription factor (TF) mediated control of gene expression remains a major challenge at the interface of computational and experimental biology. Computational techniques predicting TF-binding site specificity are frequently unreliable. On the other hand, comprehensive experimental validation is difficult and time-consuming. We introduce a simple strategy that dramatically improves robustness and accuracy of computational binding site prediction. First, we evaluate the rate of recurrence of computational TFBS predictions by commonly used sampling procedures. We find that the vast majority of results are biologically meaningless. However, clustering results based on nucleotide position improves predictive power. Additionally, we find that positional clustering increases robustness to long or imperfectly selected input sequences. Positional clustering can also be used as a mechanism to integrate results from multiple sampling approaches for improvements in accuracy over each one alone. Finally, we predict and validate regulatory sequences partially responsible for transcriptional control of the mammalian type A γ-aminobutyric acid receptor (GABA(A)R) subunit genes. Positional clustering is useful for improving computational binding site predictions, with potential application to improving our understanding of mammalian gene expression. In particular, predicted regulatory mechanisms in the mammalian GABA(A)R subunit gene family may open new avenues of research towards understanding this pharmacologically important neurotransmitter receptor system.

Boston University Institutional Repository (OpenBU)

Homing endonuclease I-TevIII: dimerization as a means to a double-strand break

Author: Athanasiadis
Bakhrat
Belfort
Bell-Pedersen
Bryk
Chevalier
Darnell
Dean
Derbyshire
Dorie Smith
Doudeva
Drouin
Eddy
Edgell
Ferguson
Flick
Goodrich-Blair
Gorbalenya
Grishin
John T. Dansereau
Justin B. Robbins
Keeble
Kleanthous
Krishna
Ku
Landthaler
Loizos
Marlene Belfort
Matsuura
Matthew J. Stanger
Michelle Stapleton
Mosig
Mueller
Newman
Oakley
Ochman
Pedersen-Lane
Pietrokovski
Pommer
Quirk
Shen
Shub
Stauffer
Stewart
Stoddard
Studier
Van Roey
Victoria Derbyshire
Wu
Publication venue: Oxford University Press
Publication date: 08/02/2007

Homing endonucleases are unusual enzymes, capable of recognizing lengthy DNA sequences and cleaving site-specifically within genomes. Many homing endonucleases are encoded within group I introns, and such enzymes promote the mobility reactions of these introns. Phage T4 has three group I introns, within the td, nrdB and nrdD genes. The td and nrdD introns are mobile, whereas the nrdB intron is not. Phage RB3 is a close relative of T4 and has a lengthier nrdB intron. Here, we describe I-TevIII, the H–N–H endonuclease encoded by the RB3 nrdB intron. In contrast to previous reports, we demonstrate that this intron is mobile, and that this mobility is dependent on I-TevIII, which generates 2-nt 3′ extensions. The enzyme has a distinct catalytic domain, which contains the H–N–H motif, and a DNA-binding domain, which contains two zinc fingers required for interaction with the DNA substrate. Most importantly, I-TevIII, unlike the H–N–H endonucleases described so far, makes a double-strand break on the DNA homing site by acting as a dimer. Through deletion analysis, the dimerization interface was mapped to the DNA-binding domain. The unusual propensity of I-TevIII to dimerize to achieve cleavage of both DNA strands underscores the versatility of the H–N–H enzyme family.

Efficacy of bone substitute material in preserving volume when placing a maxillary immediate complete denture: study protocol for the PANORAMIX randomized controlled trial

A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval

Author: A Gelli
A Morozov
A Sandelin
A Siepel
AK Jain
AP Gasch
C Csank
C Harbison
D Che
D Gordon
D Karolchik
D Martin
EP Xing
Ernest Fraenkel
G Stormo
G Thijs
G Yona
H Madhani
Hanah Margalit
IG Choi
J Hughes
J Lin
J Rutherford
J Schaber
J Zeitlinger
J Zhu
JL DeRisi
K MacIsaac
K Sjolander
M Bulyk
M Courel
M DeGroot
M Harris
M Kellis
MB Eisen
N Friedman
Naomi Habib
Nir Friedman
P Benos
PT Spellman
R Osada
S Aerts
S Chou
S Gupta
S Mahony
S Pietrokovski
S Roepcke
T Bailey
T Kaplan
T Wang
TL Bailey
Tommy Kaplan
V Matys
W Day
X Liu
X Xie
Y Barash
Y Wang
Publication venue: Public Library of Science
Publication date: 01/02/2008

Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as the semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors.

CiteSeerX

Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison

Author: A Dembo
A Kryshtafovych
A Zemla
A Šali
AA Schaffer
AG Murzin
AY Mitrophanov
G Yona
H Cheng
J Söding
JC Wootton
JM Chandonia
L Holm
L Rychlewski
Mindaugas Margelevičius
MO Dayhoff
N Siew
PZ Kozbial
R Arratia
R Bundschuh
R Kolodny
R Sadreyev
RC Edgar
RI Sadreyev
RL Tatusov
S Henikoff
S Karlin
S Pietrokovski
SF Altschul
TF Smith
TT Lee
Y Qi
Y Wang
Y Zhang
Česlovas Venclovas
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Detection of common evolutionary origin (homology) is a primary means of inferring protein structure and function. At present, comparison of protein families represented as sequence profiles is arguably the most effective homology detection strategy. However, finding the best way to represent evolutionary information of a protein sequence family in the profile, to compare profiles and to estimate the biological significance of such comparisons, remains an active area of research. Results: Here, we present a new homology detection method based on sequence profile-profile comparison. The method has a number of new features including position-dependent gap penalties and a global score system. Position-dependent gap penalties provide a more biologically relevant way to represent and align protein families as sequence profiles. The global score system enables an analytical solution of the statistical parameters needed to estimate the statistical significance of profile-profile similarities. The new method, together with other state-of-the-art profile-based methods (HHsearch, COMPASS and PSI-BLAST), is benchmarked in all-against-all comparison of a challenging set of SCOP domains that share at most 20% sequence identity. For benchmarking, we use a model-based evaluation framework that is free of a reference ("gold standard"). Evaluation results show that at the level of protein domains our method compares favorably to all other tested methods. We also provide examples of the new method outperforming structure-based similarity detection and alignment. The implementation of the new method both as a standalone software package and as a web server is available at http://www.ibt.lt/bioinformatics/coma. Conclusion: Due to a number of developments, the new profile-profile comparison method shows an improved ability to match distantly related protein domains. Therefore, the method should be useful for annotation and homology modeling of uncharacterized proteins.

Springer - Publisher Connector