
3 research outputs found

The effectiveness of position- and composition-specific gap costs for protein similarity searches

Authors: A. Stojmirovic, Barrett, Benner, Chandonia, Chang, E. M. Gertz, Eddy, Finn, Gotoh, Gough, Gribskov, Hajian-Tilaki, Hanley, Henikoff, Hughey, Krogh, Madera, Murzin, Pascarella, Qiu, Reese, S. F. Altschul, Schaffer, Smith, Vinga, Wistrand, Wrabl, Y.-K. Yu, Yu
Publication venue: Oxford University Press (OUP)
Publication date: 16/01/2008

The flexibility in gap cost enjoyed by Hidden Markov Models (HMMs) is expected to afford them better retrieval accuracy than position-specific scoring matrices (PSSMs). We attempt to quantify the effect of more general gap parameters by separately examining the influence of position- and composition-specific gap scores, as well as by comparing the retrieval accuracy of the PSSMs constructed using an iterative procedure to that of the HMMs provided by Pfam and SUPERFAMILY, curated ensembles of multiple alignments. We found that position-specific gap penalties have an advantage over uniform gap costs. We did not explore optimizing distinct uniform gap costs for each query. For Pfam, PSSMs iteratively constructed from seeds based on HMM consensus sequences perform equivalently to HMMs that were adjusted to have constant gap transition probabilities, albeit with much greater variance. We observed no effect of composition-specific gap costs on retrieval performance. Comment: 17 pages, 4 figures, 2 tables
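The difference between uniform and position-specific gap costs can be illustrated with a toy local alignment of a profile against a sequence. The sketch below is not the procedure used in the paper: it assumes linear (not affine) gap costs, a made-up four-position PSSM, and a hypothetical function name `local_profile_score`. Uniform gap costs are simply the special case in which every entry of `gap_costs` is the same.

```python
def local_profile_score(pssm, seq, gap_costs):
    """Best local alignment score of a profile against a sequence.

    pssm:      list of dicts, one per profile position, residue -> score
               (residues missing from a dict score -1; a toy assumption).
    gap_costs: one non-negative linear gap cost per profile position.
    """
    n, m = len(pssm), len(seq)
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        g = gap_costs[i - 1]  # cost of a gap taken at profile position i
        for j in range(1, m + 1):
            match = prev[j - 1] + pssm[i - 1].get(seq[j - 1], -1)
            skip_profile = prev[j] - g     # profile position i left unaligned
            skip_residue = cur[j - 1] - g  # residue j inserted at position i
            cur[j] = max(0, match, skip_profile, skip_residue)
            best = max(best, cur[j])
        prev = cur
    return best


# Toy profile that prefers A, C, D, E at positions 1..4.
toy_pssm = [{"A": 5}, {"C": 5}, {"D": 5}, {"E": 5}]

uniform = local_profile_score(toy_pssm, "ACXDE", [3, 3, 3, 3])
position_specific = local_profile_score(toy_pssm, "ACXDE", [3, 1, 3, 3])
print(uniform, position_specific)  # 17 19
```

In this toy case the extra residue X falls after profile position 2, so lowering the gap cost at that position alone raises the score of the same alignment from 17 to 19.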

arXiv.org e-Print Archive

Crossref

PubMed Central

Identification, Characterization, and Life Cycle of Intein-Associated Homing Endonucleases

Author: Skydel, Joshua J.
Publication venue: OpenCommons@UConn
Publication date: 08/06/2016

Inteins are molecular parasites that have been identified in unicellular organisms from the three domains of life. The intein self-excises following translation of the host gene, and therefore incurs a fitness cost for its carrier. The symbiotic state of the intein to its host is dependent on the presence or absence of a homing endonuclease domain, which facilitates horizontal transfer of the molecule. Identification of this domain provides information on the evolutionary history of the intein, as well as patterns of horizontal gene transfer in microbial communities. I have therefore developed Hidden Markov Models (HMMs) to identify homing endonuclease domains in biological sequence data. Following validation, the HMMs were used to assign symbiotic states to inteins found in the haloarchaea. This search method expands upon previous approaches to characterizing inteins, and provides molecular evidence for the presence of homing endonuclease domains. I have also created an agent-based model for the competition between intein states in a simulated microbial population. The model incorporates spatial interactions, measured efficiencies of gene transfer, and environmental perturbations to determine the conditions under which inteins spread. These simulations determined that inteins actively spread in a population that is in stationary growth phase, while carriers are outcompeted during exponential phases of growth. My computational analysis provides a new method for assessing the symbiotic state of inteins, as well as a platform for exploring the life cycle of inteins under a variety of environmental scenarios.

DigitalCommons@UConn

OpenCommons at University of Connecticut

More Than 1,001 Problems with Protein Domain Databases: Transmembrane Regions, Signal Peptides and the Issue of Sequence Homology

Authors: A Andreeva, A Bahr, A Bateman, A Bernsel, A Kihara, A Klug, A Marchler-Bauer, A Stojmirovic, AA Schaffer, AE Todd, AG Murzin, AL Cuff, AM Schnoes, AM Settles, B Eisenhaber, B Scheres, C Bru, C Sander, C Xu, CA Ouzounis, CH Wu, CP Ponting, D Devos, D Ivanov, D Wilson, DA Uwanogho, DE de Oliveira, DL Burgess, E Portugaly, EL Sonnhammer, F Eisenhaber, Frank Eisenhaber, G Schneider, GC Clark, GE Tusnady, H Ashida, H Johansson, H Mi, H Nielsen, HS Ooi, I Letunic, IL Alberts, J Abendroth, J Gough, J Kota, J Ren, J Schultz, JC McNulty, JC Pizarro, JC Wootton, JD Bendtsen, JD Selengut, JG Henikoff, JH Weiner, JH Zar, JI Shin, JK Tie, L Aravind, L Kall, L Sun, L Zhang, LF Ciufo, LJ Smith, M Cserzo, M Fukuda, M Gruber, M Hedman, M Ikeda, MH Saier Jr, MR Yen, N Hulo, N Kageyama-Yahara, O Leon, P Bork, P Tompa, PH Krebsbach, Philip E. Bourne, R Albrecht, R Durbin, R Janssen, R Watanabe, RD Finn, RF Doolittle, RR Copley, RW Hooft, S Henikoff, S Iuchi, S Ohnishi, S Veretnik, SA Weston, Sebastian Maurer-Stroh, SF Altschul, SJ Sammut, SR Eddy, SS Krishna, T Nakai, TA Holland, TK Attwood, V Anantharaman, V Brendel, VV Lunin, W Li, W Verelst, Wing-Cheong Wong, WR Gilks
Publication venue: Public Library of Science
Publication date: 01/01/2010

Large-scale genome sequencing gained general importance for life science because functional annotation of otherwise experimentally uncharacterized sequences is made possible by the theory of biomolecular sequence homology. Historically, the paradigm of similarity of protein sequences implying common structure, function and ancestry was generalized based on studies of globular domains. Having the same fold imposes strict conditions over the packing in the hydrophobic core requiring similarity of hydrophobic patterns. The implications of sequence similarity among non-globular protein segments have not been studied to the same extent; nevertheless, homology considerations are silently extended for them. This appears especially detrimental in the case of transmembrane helices (TMs) and signal peptides (SPs) where sequence similarity is necessarily a consequence of physical requirements rather than common ancestry. Thus, matching of SPs/TMs creates the illusion of matching hydrophobic cores. Therefore, inclusion of SPs/TMs into domain models can give rise to wrong annotations. More than 1001 domains among the 10,340 models of Pfam release 23 and 18 domains of SMART version 6 (out of 809) contain SP/TM regions. As expected, fragment-mode HMM searches generate promiscuous hits limited to solely the SP/TM part among clearly unrelated proteins. More worryingly, we show explicit examples that the scores of clearly false-positive hits, even in global-mode searches, can be elevated into the significance range just by matching the hydrophobic runs. In the PIR iProClass database v3.74 using conservative criteria, we find that at least between 2.1% and 13.6% of its annotated Pfam hits appear unjustified for a set of validated domain models. Thus, false-positive domain hits enforced by SP/TM regions can lead to dramatic annotation errors where the hit has nothing in common with the problematic domain model except the SP/TM region itself. 
We suggest a workflow of flagging problematic hits arising from SP/TM-containing models for critical reconsideration by annotation users.
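The abstract does not spell out the flagging workflow, so the following is only one simple way such a check might look, not the authors' method. It assumes each hit records which model positions were aligned, that the SP/TM segments of the model are known as non-overlapping intervals, and that a hit is flagged when most of its aligned model positions fall inside those segments. The names `Hit`, `overlap_fraction`, `flag_hits`, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    protein_id: str
    model_start: int  # first aligned model position (1-based, inclusive)
    model_end: int    # last aligned model position (1-based, inclusive)


def overlap_fraction(hit, sp_tm_regions):
    """Fraction of the hit's aligned model positions inside SP/TM regions.

    sp_tm_regions: non-overlapping (start, end) model intervals, inclusive.
    """
    aligned = hit.model_end - hit.model_start + 1
    covered = 0
    for start, end in sp_tm_regions:
        lo = max(hit.model_start, start)
        hi = min(hit.model_end, end)
        if hi >= lo:
            covered += hi - lo + 1
    return covered / aligned


def flag_hits(hits, sp_tm_regions, threshold=0.8):
    """Return IDs of hits that rest mostly on the SP/TM part of the model."""
    return [
        h.protein_id
        for h in hits
        if overlap_fraction(h, sp_tm_regions) >= threshold
    ]


# Hypothetical model whose first 25 positions form a signal peptide.
regions = [(1, 25)]
hits = [Hit("P1", 3, 24), Hit("P2", 1, 200)]
print(flag_hits(hits, regions))  # ['P1']
```

Here P1 aligns only to the signal-peptide part of the model and is flagged for review, while P2 covers the whole model and is not.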

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS