
2,232 research outputs found

Distances and classification of amino acids for different protein secondary structures

Author: J. Garnier
Li-mei Zhang
O. Weiss
P. Stolorz
S. Henikoff
Shan Guan
U. Hobohm
W. Kabsch
Wei-Mou Zheng
Xin Liu
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2003

Window profiles of amino acids in protein sequences are taken as a description of the amino acid environment. The relative entropy or Kullback-Leibler distance derived from profiles is used as a measure of dissimilarity for comparison of amino acids and secondary structure conformations. Distance matrices of amino acid pairs at different conformations are obtained, which display a non-negligible dependence of amino acid similarity on conformations. Based on the conformation-specific distances, a clustering analysis of amino acids is conducted. Comment: 15 pages, 8 figures
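
As a rough illustration of the relative entropy (Kullback-Leibler distance) named in this abstract, the sketch below compares two toy window profiles. The function names, the two-letter alphabet, the numbers, and the choice to sum the distance over window positions are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def kl_distance(p, q):
    """Relative entropy D(p || q) in bits between two discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def profile_distance(profile_a, profile_b):
    """Sum per-position KL distances over a window.

    Summing over positions is an assumed way to combine them, not the paper's stated method.
    """
    return sum(kl_distance(p, q) for p, q in zip(profile_a, profile_b))

# Hypothetical profiles: two window positions over a toy two-letter alphabet.
profile_a = [[0.5, 0.5], [1.0, 0.0]]
profile_b = [[0.5, 0.5], [0.5, 0.5]]

print(profile_distance(profile_a, profile_b))
```

Note that the relative entropy is not symmetric: swapping the two profiles here would put a zero probability in the denominator, so real profiles would typically need pseudocounts or a symmetrized form.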

arXiv.org e-Print Archive

CiteSeerX

Crossref

High-throughput discovery of rare human nucleotide polymorphisms by Ecotilling

Author: Bowers Elisabeth
Comai Luca
Greene Elizabeth A.
Henikoff Steven
Till Bradley J.
Zerr Troy
Publication venue: Oxford University Press
Publication date: 01/01/2006

Human individuals differ from one another at only ∼0.1% of nucleotide positions, but these single nucleotide differences account for most heritable phenotypic variation. Large-scale efforts to discover and genotype human variation have been limited to common polymorphisms. However, these efforts overlook rare nucleotide changes that may contribute to phenotypic diversity and genetic disorders, including cancer. Thus, there is an increasing need for high-throughput methods to robustly detect rare nucleotide differences. Toward this end, we have adapted the mismatch discovery method known as Ecotilling for the discovery of human single nucleotide polymorphisms. To increase throughput and reduce costs, we developed a universal primer strategy and implemented algorithms for automated band detection. Ecotilling was validated by screening 90 human DNA samples for nucleotide changes in 5 gene targets and by comparing results to public resequencing data. To increase throughput for discovery of rare alleles, we pooled samples 8-fold and found Ecotilling to be efficient relative to resequencing, with a false negative rate of 5% and a false discovery rate of 4%. We identified 28 new rare alleles, including some that are predicted to damage protein function. The detection of rare damaging mutations has implications for models of human disease

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

A methodology for determining amino-acid substitution matrices from set covers

Author: A. Bahr
A.D. McLachlan
D.F. Feng
G. Vogt
G.H. Gonnet
J. Setubal
J.D. Blake
J.K.M. Rao
M. Gribskov
M.F. Sagot
R.B. Russell
R.E. Green
R.F. Smith
S. Henikoff
S.A. Benner
T. Müller
T.P. Li
W.S.J. Valdar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/04/2005

We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration

arXiv.org e-Print Archive

Crossref

Simplified amino acid alphabets based on deviation of conditional probability from random background

Author: A. Godzik
A.G. Murzin
C.E. Schafmeister
D.S. Riddle
Di Liu
H.S. Chan
J. Wang
Ji Qi
K.W. Plaxco
L.R. Murphy
M. Munson
S. Henikoff
S. Miyazawa
S.E. Brenner
S.F. Altschul
Wei-Mou Zheng
Xin Liu
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2002

The primitive data for deducing the Miyazawa-Jernigan contact energy or BLOSUM score matrix consist of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of such conditional probability from the random background, a scheme for reducing the amino acid alphabet is proposed. An evident discrepancy is observed between the reduced alphabets obtained from the raw Miyazawa-Jernigan and BLOSUM residue pair counts. Taking the homologous sequence database SCOP40 as a test set, we detect homology with the obtained coarse-grained substitution matrices. It is verified that the reduced alphabets preserve well the information contained in the original 20-letter alphabet. Comment: 9 pages, 3 figures

arXiv.org e-Print Archive

Crossref

CERN Document Server

Discovery of chemically induced mutations in rice by TILLING

Author: Colowit Peter
Comai Luca
Cooper Jennifer
Greene Elizabeth A
Henikoff Steven
Tai Thomas H
Till Bradley J
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Rice is both a food source for a majority of the world's population and an important model system. Available functional genomics resources include targeted insertion mutagenesis and transgenic tools. While these can be powerful, a non-transgenic, unbiased targeted mutagenesis method that can generate a range of allele types would add considerably to the analysis of the rice genome. TILLING (Targeting Induced Local Lesions in Genomes), a general reverse genetic technique that combines traditional mutagenesis with high throughput methods for mutation discovery, is such a method. RESULTS: To apply TILLING to rice, we developed two mutagenized rice populations. One population was developed by treatment with the chemical mutagen ethyl methanesulphonate (EMS), and the other with a combination of sodium azide plus methyl-nitrosourea (Az-MNU). To find induced mutations, target regions of 0.7–1.5 kilobases were PCR amplified using gene specific primers labeled with fluorescent dyes. Heteroduplexes were formed through denaturation and annealing of PCR products, mismatches digested with a crude preparation of CEL I nuclease and cleaved fragments visualized using denaturing polyacrylamide gel electrophoresis. In 10 target genes screened, we identified 27 nucleotide changes in the EMS-treated population and 30 in the Az-MNU population. CONCLUSION: We estimate that the density of induced mutations is two- to threefold higher than previously reported rice populations (about 1/300 kb). By comparison to other plants used in public TILLING services, we conclude that the populations described here would be suitable for use in a large scale TILLING project

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

VENN, a tool for titrating sequence conservation onto protein structures

Author: Altschul
Berman
Didier
Fujii
Henikoff
Holm
J. Vyas
Kopp
Landau
M. R. Gryk
M. R. Schiller
Pettersen
Pieper
Schwede
Wu
Publication venue: Oxford University Press
Publication date: 01/10/2009

Residue conservation is an important, established method for inferring protein function, modularity and specificity. It is important to recognize that it is the 3D spatial orientation of residues that drives sequence conservation. Considering this, we have built a new computational tool, VENN that allows researchers to interactively and graphically titrate sequence homology onto surface representations of protein structures. Our proposed titration strategies reveal critical details that are not readily identified using other existing tools. Analyses of a bZIP transcription factor and receptor recognition of Fibroblast Growth Factor using VENN revealed key specificity determinants. Weblink: http://sbtools.uchc.edu/venn/

Crossref

PubMed Central

University of Nevada, Las Vegas Repository

Optimal neighborhood indexing for protein similarity search

Author: D Lipman
D Lipman
DG Brown
Dominique Lavenier
Gregory Kucherov
J Henikoff
JL Hennessy
L Murphy
L Noé
Laurent Noé
M Crochemore
M Li
M Roytberg
Mathieu Giraud
MP Styczynski
N Cannata
P Peterlongo
Pierre Peterlongo
R Edgar
S Altschul
S Henikoff
S Karlin
T Li
Van Hoa Nguyen
VH Nguyen
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional information to limit the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. Results: The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood reduces the memory involved in the process by 35%, without sacrificing either the quality of results or the computational time. Second, our approach led us to develop a new kind of substitution score matrices and their associated e-value parameters. In contrast to usual matrices, these matrices are rectangular since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and we provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum. Conclusions: We propose a practical index size reduction of the neighborhood data that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL-Rennes 1

Towards Reliable Automatic Protein Structure Alignment

Author: A. Caprara
A. Zemla
A.G. Murzin
A.S. Konagurthu
C.A. Rohl
C.B. Do
G. Lancia
H.M. Berman
I.N. Shindyalov
J. Shi
J. Xu
J.F. Gibrat
K. Mizuguchi
L. Kinch
L. Xie
M. Comin
M. Levitt
M. Moakher
M. Sadowski
N.M. Daniels
N.N. Alexandrov
S. Henikoff
S. Subbiah
S.B. Needleman
S.B. Pandit
S.R. Eddy
W. Pirovano
Y. Yang
Y. Ye
Y. Zhang
Publication date: 01/01/2013

A variety of methods have been proposed for structure similarity calculation, which are called structure alignment or superposition. One major shortcoming in current structure alignment algorithms is in their inherent design, which is based on local structure similarity. In this work, we propose a method to incorporate global information in obtaining optimal alignments and superpositions. Our method, when applied to optimizing the TM-score and the GDT score, produces significantly better results than current state-of-the-art protein structure alignment tools. Specifically, if the highest TM-score found by TMalign is lower than 0.6 and the highest TM-score found by one of the tested methods is higher than 0.5, there is a probability of 42% that TMalign failed to find TM-scores higher than 0.5, while the same probability is reduced to 2% if our method is used. This could significantly improve the accuracy of fold detection if the cutoff TM-score of 0.5 is used. In addition, existing structure alignment algorithms focus on structure similarity alone and simply ignore other important similarities, such as sequence similarity. Our approach has the capacity to incorporate multiple similarities into the scoring function. Results show that sequence similarity aids in finding high quality protein structure alignments that are more consistent with eye-examined alignments in HOMSTRAD. Even when structure similarity itself fails to find alignments with any consistency with eye-examined alignments, our method remains capable of finding alignments highly similar to, or even identical to, eye-examined alignments. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

arXiv.org e-Print Archive

Crossref

The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment

Author: Altschul
Altschul
Altschul
Bundschuh
Collins
Gotoh
Henikoff
J. L. Spouge
Karlin
Mott
Needleman
Robinson
S. Sheetlin
Smith
Storey
Waterman
Y. Park
Yu
Publication venue: Oxford University Press
Publication date: 06/09/2005

The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) all use time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster, if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend to determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the errors required (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor.
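
To make the roles of the two Gumbel parameters concrete, here is a minimal sketch that assumes the commonly used form P(S ≥ x) = 1 − exp(−k·m·n·e^(−λx)) for sequences of lengths m and n. The abstract does not state this formula, and the parameter values below are illustrative placeholders, not estimates from the paper; only the sequence length of 140 echoes the abstract.

```python
import math

def expected_hits(score, m, n, lam, k):
    """E-value: expected number of chance alignments scoring at least `score`."""
    return k * m * n * math.exp(-lam * score)

def gumbel_p_value(score, m, n, lam, k):
    """P(S >= score) = 1 - exp(-E), computed with expm1 for accuracy when E is small."""
    return -math.expm1(-expected_hits(score, m, n, lam, k))

# Illustrative placeholder parameters (assumed, not from the paper).
LAM, K = 0.3, 0.05
M = N = 140

for s in (20, 40, 60):
    print(s, expected_hits(s, M, N, LAM, K), gumbel_p_value(s, M, N, LAM, K))
```

Because k enters the exponent only as a multiplicative factor while λ multiplies the score, a relative error in λ affects the p-value far more than the same relative error in k, which is consistent with the tighter error tolerance the abstract quotes for λ.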

Crossref

PubMed Central

Candida albicans repetitive elements display epigenetic diversity and plasticity

Author: A Ellahi
A Pidoux
A Selmecki
aF Straight
BD Strahl
C Ketel
C Li
C Trapnell
Ca Froyd
CJ Merrick
D Kadosh
DE Gottschling
GD Shankaranarayana
H Chibana
J Haran
J Huang
J Nakayama
J Pérez-Martín
J Wendland
JC Tanny
JS Smith
Ka Morano
KR Hansen
L Vasiljeva
LH Freitas-Junior
LN Rusche
M Bryk
M Bühler
M Dubarry
M Paschini
M Van het Hoog
MA Pfaller
MD De Backer
MJ McEachern
N Saksouk
PA Dumesic
PR Lephart
R Kaur
RB Wilson
RJ Bennett
S Greiss
S Henikoff
S Imai
S Kueng
S Rea
SI Iwaguchi
T Jones
T Kobayashi
T Kouzarides
VM Bruno
W Shou
WS Chu
X Bi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/03/2016

Transcriptionally silent heterochromatin is associated with repetitive DNA. It is poorly understood whether and how heterochromatin differs between different organisms and whether its structure can be remodelled in response to environmental signals. Here, we address this question by analysing the chromatin state associated with DNA repeats in the human fungal pathogen Candida albicans. Our analyses indicate that, contrary to model systems, each type of repetitive element is assembled into a distinct chromatin state. Classical Sir2-dependent hypoacetylated and hypomethylated chromatin is associated with the rDNA locus while telomeric regions are assembled into a weak heterochromatin that is only mildly hypoacetylated and hypomethylated. Major Repeat Sequences, a class of tandem repeats, are assembled into an intermediate chromatin state bearing features of both euchromatin and heterochromatin. Marker gene silencing assays and genome-wide RNA sequencing reveal that C. albicans heterochromatin represses expression of repeat-associated coding and non-coding RNAs. We find that telomeric heterochromatin is dynamic and remodelled upon an environmental change. Weak heterochromatin is associated with telomeres at 30 °C, while robust heterochromatin is assembled over these regions at 39 °C, a temperature mimicking moderate fever in the host. Thus in C. albicans, differential chromatin states control gene expression, and epigenetic plasticity is linked to adaptation.

Crossref

PubMed Central

Kent Academic Repository