189 research outputs found

Generalized affine gap costs for protein sequence alignment

Author: Stephen F Altschul
Publication date: 01/01/1998

Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap and a further length-dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints of both statistical significance and alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment. Proteins 32:88-96, 1998. © 1998 Wiley-Liss, Inc.
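The affine gap costs described above charge a gap of length k a total of a + b·k, where a is the penalty for the gap's existence and b the per-residue penalty. The sketch below shows how such costs enter local alignment through Gotoh's three-matrix recurrence. It is a minimal illustration, assuming a hypothetical match/mismatch score in place of a substitution matrix such as BLOSUM62; it implements only ordinary affine gaps, not the generalized costs proposed in the paper, and the names `affine_gap_cost` and `local_align_score` are illustrative.

```python
def affine_gap_cost(k: int, a: int, b: int) -> int:
    """Cost of one gap of length k >= 1: existence penalty a plus b per residue."""
    return a + b * k


def local_align_score(s: str, t: str, a: int = 11, b: int = 1,
                      match: int = 5, mismatch: int = -4) -> int:
    """Best local alignment score of s and t under affine gap costs a + b*k.

    Gotoh-style recurrences (hypothetical match/mismatch scores, no
    substitution matrix):
      H[i][j]  best score of a local alignment ending at cell (i, j)
      E[i][j]  best score ending with a gap in s (consuming t[j-1])
      F[i][j]  best score ending with a gap in t (consuming s[i-1])
    """
    neg_inf = float("-inf")
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    E = [[neg_inf] * cols for _ in range(rows)]
    F = [[neg_inf] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap (cost b) or open a new one (cost a + b).
            E[i][j] = max(E[i][j - 1] - b, H[i][j - 1] - a - b)
            F[i][j] = max(F[i - 1][j] - b, H[i - 1][j] - a - b)
            pair = match if s[i - 1] == t[j - 1] else mismatch
            # Local alignment: scores never drop below zero (start afresh).
            H[i][j] = max(0, H[i - 1][j - 1] + pair, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

Keeping separate matrices for alignments that end in a gap is what makes affine costs cheap: extending an already open gap costs only b, so the run time stays proportional to the product of the two sequence lengths.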

CiteSeerX

Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches

Author: Alejandro A. Schäffer
Altschul
Bailey
Berger
Brenner
Chandonia
Dembo
E. Michael Gertz
Eddy
Elston
Endres
Fisher
Green
Gribskov
Gumbel
Henikoff
Kann
Karlin
Karplus
Lupas
McDonnell
Mott
Murzin
Pearson
Richa Agarwala
Robinson
Rost
Schäffer
Sharon
Smith
Stephen F. Altschul
Sueoka
Wan
Wheeler
Wolf
Wootton
Yi-Kuo Yu
Yu
Publication venue: Oxford University Press
Publication date: 01/01/2006

Protein sequence database search programs may be evaluated both for their retrieval accuracy (the ability to separate meaningful from chance similarities) and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.
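Statistical assessments of this kind are usually reported as E-values. For local alignment scores that follow an extreme value distribution, the Karlin-Altschul formula gives the expected number of chance alignments scoring at least S as E = K·m·n·e^(-λS), where m and n are the query and database lengths and λ and K depend on the scoring system. The sketch below assumes λ and K are already known; it omits the edge-effect (effective length) corrections that BLAST applies and does not implement the compositional measure described in the abstract. The function names are hypothetical.

```python
import math


def evalue(score: float, m: float, n: float, lam: float, K: float) -> float:
    """Expected number of chance local alignments scoring >= score (Karlin-Altschul)."""
    return K * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, K: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(K)) / math.log(2)


def pvalue(e: float) -> float:
    """Probability of at least one chance hit with E-value e: P = 1 - exp(-E)."""
    # expm1 keeps precision when e is tiny.
    return -math.expm1(-e)
```

With m = 300, n = 10^8 and illustrative parameters such as λ = 0.267 and K = 0.041 (assumed values, not taken from the paper), each additional bit of score halves E, which is why bit scores are comparable across scoring systems.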

CiteSeerX

Crossref

PubMed Central

PSI-BLAST pseudocounts and the minimum description length principle

Author: Alejandro A. Schäffer
Altschul
Bailey
Boeckmann
Brenner
Brown
Chandonia
Cover
Dayhoff
E. Michael Gertz
Eddy
Fisher
Gerstein
Gotoh
Gribskov
Grünwald
Henikoff
Karlin
Krogh
Lawrence
Murzin
Nishida
Richa Agarwala
Sander
Schwartz
Schäffer
Sibbald
Sjölander
Smith
Stephen F. Altschul
Tatusov
Thompson
Wheeler
Yi-Kuo Yu
Publication venue: Oxford University Press

Position-specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, which are added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a purely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
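The role of pseudocounts can be illustrated with a simplified column score. If a column holds counts c_i over N sequences and pseudocounts are distributed according to frequencies g_i, a common form of the estimated target frequency is Q_i = (c_i + β·g_i)/(N + β), and the score is log2(Q_i/p_i) for background frequencies p_i. This minimal sketch assumes uniform background frequencies and sets g_i = p_i; PSI-BLAST instead derives its pseudocount frequencies from substitution-matrix target frequencies and weights sequences, and the new method in the abstract varies the number of pseudocounts by column. The function name `column_scores` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_scores(column, beta, background=None):
    """Log-odds scores (bits) for one alignment column with beta pseudocounts.

    Simplifications (assumptions): pseudocounts follow the background
    frequencies, and every sequence counts once (no sequence weighting).
    """
    if background is None:
        # Hypothetical uniform background over the 20 standard amino acids.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(column)
    n = len(column)
    scores = {}
    for aa in AMINO_ACIDS:
        # Q_i = (c_i + beta * g_i) / (N + beta), with g_i = p_i here.
        q = (counts[aa] + beta * background[aa]) / (n + beta)
        scores[aa] = math.log2(q / background[aa])
    return scores
```

Running it on a column such as "LLLLLLLLLI" with β = 1 and with β = 10 shows the effect the abstract describes: fewer pseudocounts give the conserved residue a higher score and unobserved residues a lower one, that is, more contrast.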

Crossref

PubMed Central

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

Author: AA Schäffer
AL Delcher
Alejandro A Schäffer
B Brejová
B Hao
BG Barrell
DJ States
E Birney
E Boy-Marcotte
E Halperin
E Michael Gertz
EM Gertz
F Damak
F Zinoni
G Macino
H Peltola
IG Young
J Hein
JC Wootton
L Knecht
M Gribskov
MS Boguski
MS Gelfand
O Gotoh
P Steneberg
R Durbin
Richa Agarwala
S Henikoff
S Kurtz
SA Chervitz
SC Low
SF Altschul
Stephen F Altschul
TF Smith
W Gish
WJ Kent
WR Pearson
X Guan
X Huang
Yi-Kuo Yu
YK Yu
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms.
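Aligning a protein to a nucleotide database translated in all six frames starts from three forward and three reverse-complement reading frames. The sketch below shows only that translation step, assuming the standard genetic code (NCBI translation table 1) and plain A/C/G/T input; codons containing other characters become `X`. It is not TBLASTN's implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def translate(dna: str, offset: int) -> str:
    """Translate complete codons starting at offset 0, 1 or 2; unknown codons -> 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(offset, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Map frames +1, +2, +3 (forward) and -1, -2, -3 (reverse complement) to peptides."""
    rc = reverse_complement(dna)
    frames = {k + 1: translate(dna, k) for k in range(3)}
    frames.update({-(k + 1): translate(rc, k) for k in range(3)})
    return frames
```

For "ATGGCCTAA", frame +1 reads "MA*" and frame -1, taken from the reverse complement "TTAGGCCAT", reads "LGH". Other genetic codes would only need a different codon table.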

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Comparative analysis of ammonia monooxygenase (amoA) genes in the water column and sediment-water interface of two lakes and the Baltic Sea

Author: Aakra
Altschul
Avrahami
Beman
Bernhard
Cavari
Christofi
Coci
De Bie
Francis
Garland
Head
Hooper
Horz
Hughes
Ivanova
Jetten
Jiang
Johannes F. Imhoff
Johnstone
Junier
Juretschko
Karl-Paul Witzel
Kim
Koops
Kowalchuk
Könneke
Mahmood
McCaig
McTavish
Molina
Nicolaisen
Nold
Norton
O'Mullan
Ok-Sun Kim
Phillips
Pilar Junier
Prosser
Purkhold
Robarts
Rotthauwe
Schloss
Speksnijder
Stephen
Strous
Treusch
Tuomainen
Ward
Webster
Publication venue: Wiley
Publication date: 01/01/2008

The functional gene amoA was used to compare the diversity of ammonia-oxidizing bacteria (AOB) in the water column and sediment-water interface of two freshwater lakes, Plusssee and Schöhsee, and the Baltic Sea. Nested amplifications were used to increase the sensitivity of amoA detection and to amplify a 789-bp fragment from which clone libraries were prepared. Most of the sequences were only distantly related to any of the cultured AOB and are considered to represent new clusters of AOB within the Nitrosomonas/Nitrosospira group. Almost all sequences from the water column of the Baltic Sea and from 1-m depth in Schöhsee were related to Nitrosospira clusters 0 and 2, respectively. The majority of sequences from Plusssee and Schöhsee were associated with sequences from Chesapeake Bay, from a previous study of Plusssee and from rice roots in Nitrosospira-like cluster A, which lacks sequences from the Baltic Sea. Two groups of sequences from Baltic Sea sediment were related to clonal sequences from other brackish/marine habitats in the purely environmental Nitrosospira-like cluster B and the Nitrosomonas-like cluster. This confirms previous results from 16S rRNA gene libraries that indicated the existence of hitherto uncultivated AOB in lake and Baltic Sea samples and showed a differential distribution of AOB along the water column and sediment of these environments.

OceanRep

Infoscience - École polytechnique fédérale de Lausanne

Crossref

MPG.PuRe

Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

Author: A Bairoch
A Christoffels
A Gurevich
A Kozomara
A McKenna
A Mitchell
A Morgulis
A Pradhan
A Reiner
A Rodriguez-Mari
A Stamatakis
A Yates
AI Makunin
AJ Enright
AL Price
Alan Christoffels
Aleksey Komissarov
Alexey Tupikin
Amy Hin Yan Tong
Andrey A. Yurchenko
AR Quinlan
B Langmead
B Star
C Berthelot
C Camacho
C Holt
C Wang
Chen-Shan Chin
CS Chin
D Brawand
D Ellinghaus
DA Benson
Darrell Green
DC Hardie
Dean R. Jerry
DH Alexander
Doreen Lau
DR Kelley
DRS-K C. Jerry
E Casacuberta
E. TG Staristina
EW Myers
F Abascal
F Chen
F Yang
FC Jones
FJ Krsticevic
Fritz J. Sedlazeck
G Abrusan
G Benson
G Lin
G Marcais
G Parra
G Tamazian
GH Yue
Gopikrishna Gopalapillai
Gregory W. Vurture
GS Slater
GT Valente
H Li
H Saiga
Heiner Kuhl
HH Kazazian Jr.
I Braasch
Inna S. Kuznetsova
IS Kuznetsova
J Castresana
J Eid
J Huerta-Cepas
J Jurka
J Lin
James P. Drake
JG Ruby
JN Volff
Jolly M. Saju
Jonas Korlach
JS Chew
Junhui Jiang
K Howe
K Katoh
K Prufer
Kathiresan Purushothaman
KD Pruitt
KJ Hoff
KP Koepfli
KW Tzung
Lawrence S. Hon
László Orbán
M Blanchette
M Kanehisa
M Kasahara
M Kolmogorov
M Krzywinski
M Martin
M Schartl
M Tarailo-Graovac
M Tine
MA Larkin
Mario Jonas
Marsel Kabilov
Matthew Boitano
MB Stocks
MG Grabherr
Michael C. Schatz
MJ Chaisson
MR Friedlander
N Siegel
Natascha M. Thevasagayam
NM Thevasagayam
O Jaillon
O Otero
P Cingolani
P Ravi
P Schattner
P Shannon
P Xu
Paul M. Richardson
PE Warburton
Peter Van Heusden
R Kajitani
R Lorenz
R Luo
R Moore
R Pethiyagoda
R Poulter
R She
R Sreenivasan
Ramkumar Lachumanan
RD Ward
Richard Hall
RJ Roberts
S Chen
S Guindon
S Hoegg
S Koren
S Vij
S Zhou
Sai Rama Sridatta Prakki
Sarah Mwangi
SF Altschul
Shubha Vij
Si Lok
Si Yan Ngoh
Siddharth Singh
Simon Moxon
SM Kielbasa
Sridhar Sivasubbu
Stanley Kimbung Mbandi
Stephen J. O'Brien
Stephen W. Turner
T Anantharaman
Tamás Dalmay
Tansyn H. Noble
TD Wu
TF DeLuca
TH O'Hare
TLO Davis
TS Anantharaman
Tyler Garvin
U Consortium
U Grimholt
V Douard
V Ravi
Vinaya Kumar Katneni
Vinod Scaria
Vladimir Trifonov
W Xue
WC Liew
Woei Chang Liew
WS Davidson
X Huang
X Zheng
XG Wang
Xueyan Shen
Y Guiguen
Y Han
Y Hashiguchi
Y Moriya
Y Sato
Z Lai
Ø Hammer
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2016

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping, and shared synteny with closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and a scaffold N50 size over 25 Mb, spanning ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species and will serve as a new standard for fish genomics.
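The contig and scaffold N50 values quoted above follow the usual definition: sort the pieces by length in decreasing order and report the length at which the running total first reaches half of the total assembly length. A minimal sketch, with a hypothetical helper name `n50` and made-up lengths in the example:

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first piece where the pieces seen so far cover >= half the total.
        if 2 * running >= total:
            return length
```

For lengths 2 through 10 (total 54), the running total reaches 27 at the piece of length 8, so N50 = 8. The same function applies to contigs or scaffolds; only the input list changes.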

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Research Repository

Repository of the Academy's Library

University of East Anglia digital repository

NSU Works

MPG.PuRe

Specificity of DNA-binding by the FAX-1 and NHR-67 nuclear receptors of Caenorhabditis elegans is partially mediated via a subclass-specific P-box residue

Author: AE Sluder
AP Monaghan
B Rost
B Wightman
Bruce Wightman
CR Gissendanner
Danielle R Snowflack
DG Higgins
DG Kneller
DJ Mangelsdorf
E Zanaria
Eric L Smith
F Chen
F Pignoni
F Rastinejad
FM Ausubel
H Cheng
HM Reichardt
J Chen
J Zilliacus
JH Miller
JL Pitman
JW Much
K Umesono
KD Finley
Kristy Reinert
M Danielsen
M Kobayashi
M Robinson-Rechavi
M Van Gilst
M Wiens
Melissa Cronin
NB Haider
P Sengupta
Rebecca M Lombel
RT Yu
S Bertrand
S Khorasanizadeh
S Mader
SF Altschul
Sheila Clever
Stephen D DeMeo
TE Wilson
TR Strecker
U Nauber
V Laudet
W Seol
Y Shi
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The nuclear receptors of the NR2E class play important roles in pattern formation and nervous system development. Based on a phylogenetic analysis of DNA-binding domains, we define two conserved groups of orthologous NR2E genes: the NR2E1 subclass, which includes C. elegans nhr-67, Drosophila tailless and dissatisfaction, and vertebrate Tlx (NR2E2, NR2E4, NR2E1), and the NR2E3 subclass, which includes C. elegans fax-1 and vertebrate PNR (NR2E5, NR2E3). PNR and Tll nuclear receptors have been shown to bind the hexamer half-site AAGTCA, instead of the hexamer AGGTCA recognized by most other nuclear receptors, suggesting unique DNA-binding properties for NR2E class members. Results: We show that NR2E3 subclass member FAX-1, unlike NHR-67 and other NR2E1 subclass members, binds to hexamer half-sites with relaxed specificity: it will bind hexamers with the sequence ANGTCA, although it prefers a purine to a pyrimidine at the second position. We use site-directed mutagenesis to demonstrate that the difference between FAX-1 and NHR-67 binding preference is partially mediated by a conserved subclass-specific asparagine or aspartate residue at position 19 of the DNA-binding domain. This amino acid position is part of the "P box" that plays a critical role in defining binding-site specificity and has been shown to make hydrogen-bond contacts to the second position of the hexamer in co-crystal structures for other nuclear receptors. The relaxed specificity allows FAX-1 to bind a much larger repertoire of half-sites than NHR-67. While NR2E1 class proteins bind both monomeric and dimeric sites, NR2E3 class proteins bind only dimeric sites. The presence of a single strong site adjacent to a very weak site allows dimeric FAX-1 binding, further increasing the number of dimeric binding sites to which FAX-1 may bind in vivo.
Conclusion: These findings identify subclass-specific DNA-binding specificities and dimerization properties for the NR2E1 and NR2E3 subclasses. For the NR2E1 protein NHR-67, Asp-19 permits binding to AAGTCA half-sites, while Asn-19 permits binding to AGGTCA half-sites. The apparent conservation of DNA-binding properties between vertebrate and nematode NR2E receptors allows for the possibility of evolutionarily conserved regulatory patterns.
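The half-site preferences above can be made concrete with a simple sequence scan. The sketch below searches both strands for a hexamer pattern in which N stands for any base, so ANGTCA covers the relaxed FAX-1 preference, AAGTCA the NR2E1-type site and AGGTCA the common nuclear receptor site. It is purely illustrative and assumes plain A/C/G/T input: it ignores the purine preference at the second position, binding affinity, and the spacing of dimeric sites. `find_half_sites` is a hypothetical helper.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_half_sites(seq: str, motif: str) -> list:
    """Return sorted (start, strand) hits of a motif over A/C/G/T/N on both strands.

    Minus-strand hits are reported as the forward-strand start of the site.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all reported.
    regex = re.compile("(?=" + motif.upper().replace("N", "[ACGT]") + ")")
    k = len(motif)
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    rc = revcomp(seq)
    hits += [(len(seq) - m.start() - k, "-") for m in regex.finditer(rc)]
    return sorted(hits)
```

In a toy sequence containing AAGTCA on the forward strand and TGACCT (the reverse complement of AGGTCA), the relaxed ANGTCA pattern finds both sites, while AAGTCA and AGGTCA each find only one.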

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Superinfection exclusion and the long-term survival of honey bees in Varroa-infested colonies

Author: AC Baker
AC Highfield
AS Lauring
C Yue
CE Thompson
D DeJong
D Sims
David Dixon
DC Schroeder
Declan C Schroeder
DJT Sumpter
DM Tscherne
E Domingo
EV Ryabov
F Mondet
FLW Ratnieks
G Di Prisco
G Lanzi
Gideon J Mordecai
Ian M Jones
J Moore
JR de Miranda
JR Ongus
KA Frazer
Laura E Brettell
N Gallai
N Zioni
O Berényi
RF Lee
RN Salaman
S Lole
SF Altschul
SJ Labrie
SJ Martin
Stephen J Martin
T Fujiyuki
TD Seeley
U Strauss
W Hunter
X Yang
Y-M Lee
YL Conte
Publication venue: Springer Science and Business Media LLC
Publication date: 27/10/2015

Over the past 50 years, many millions of European honey bee (Apis mellifera) colonies have died as the ectoparasitic mite Varroa destructor has spread around the world. Subsequent studies have indicated that the mite's association with a group of RNA viral pathogens (Deformed Wing Virus, DWV) correlates with colony death. Here, we propose that a phenomenon known as superinfection exclusion explains how certain A. mellifera populations have survived despite Varroa infestation and high DWV loads. Next-generation sequencing has shown that a non-lethal DWV variant, 'type B', has become established in these colonies and that the lethal 'type A' DWV variant fails to persist in the bee population. We propose that this novel stable host-pathogen relationship prevents the accumulation of lethal variants, suggesting that this interaction could be exploited to develop an effective treatment that minimises colony losses in the future.
The ISME Journal advance online publication, 27 October 2015; doi:10.1038/ismej.2015.186

Central Archive at the University of Reading

University of Salford Institutional Repository

Crossref

Plymouth Marine Science Electronic Archive (PlyMSEA)

PubMed Central

Western Sydney ResearchDirect

Arsenic resistance in the archaeon "Ferroplasma acidarmanus" : new insights into the structure and evolution of the ars genes

Author: Altschul
Bhattacharjee
Butcher
Carlin
Cervantes
Chen
Dagnac
Diorio
Edwards
Gladysheva
Jillian F. Banfield
Klaue
Li
McGuire
Ng
Philip L. Bond
Rensing
Rosen
Saha
Shi
Silver
Smith
Stephen C. Peters
Thomas M. Gihring
Wei
Wu
Zhou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2003

Arsenic resistance in the acidophilic iron-oxidizing archaeon "Ferroplasma acidarmanus" was investigated. F. acidarmanus is native to arsenic-rich environments, and culturing experiments confirm a high level of resistance to both arsenite and arsenate. Analyses of the complete genome revealed protein-encoding regions related to known arsenic-resistance genes. Genes encoding ArsR (arsenite-sensitive regulator) and ArsB (arsenite-efflux pump) homologues were found on a single operon. A gene encoding an ArsA relative (anion-translocating ATPase), located apart from the arsRB operon, was also identified. Arsenate-resistance genes encoding proteins homologous to the arsenate reductase ArsC and the phosphate-specific transporter Pst were not found, indicating that additional, unknown arsenic-resistance genes exist for arsenate tolerance. Phylogenetic analyses of ArsA-related proteins suggest separate evolutionary lines for these proteins and offer new insights into the formation of the arsA gene. The ArsB-homologous protein of F. acidarmanus had a high degree of similarity to known ArsB proteins. An evolutionary analysis of ArsB homologues across a number of species indicated a clear relationship in close agreement with 16S rRNA evolutionary lines. These results support a hypothesis of arsenic resistance developing early in the evolution of life.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/42444/1/s00792-002-0303-6.pd

Crossref

University of East Anglia digital repository

Deep Blue Documents at the University of Michigan

University of Queensland eSpace