5,703 research outputs found

An Alternative Model of Amino Acid Replacement

Author: Altschul
Altschul
Bennet
Brenner
Brenner
Bruno
Crooks
Crooks
Felsenstein
Goldman
Goldman
Gonnet
Henikoff
Henikoff
Jones
Koshi
Marti-Renom
Muller
Murzin
Müller
Park
Schneider
Sjolander
Smith
Thorne
Topham
Yona
Zachariah
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2004

The observed correlations between pairs of homologous protein sequences are typically explained in terms of a Markovian dynamic of amino acid substitution. This model assumes that every location on the protein sequence has the same background distribution of amino acids, an assumption that is incompatible with the observed heterogeneity of protein amino acid profiles and with the success of profile multiple sequence alignment. We propose an alternative model of amino acid replacement during protein evolution based upon the assumption that the variation of the amino acid background distribution from one residue to the next is sufficient to explain the observed sequence correlations of homologs. The resulting dynamical model of independent replacements drawn from heterogeneous backgrounds is simple and consistent, and provides a unified homology match score for sequence-sequence, sequence-profile and profile-profile alignment.
Comment: Minor improvements. Added figure and reference.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Epigenetics & chromatin: Interactions and processes

Author: Grosveld F.G. (Frank)
Henikoff S. (Steven)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

On 11 to 13 March 2013, BioMed Central will be hosting its inaugural conference, Epigenetics & Chromatin: Interactions and Processes, at Harvard Medical School, Cambridge, MA, USA. Epigenetics & Chromatin has now launched a special article series based on the general themes of the conference

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

EUR Research Repository

PubMed Central

Erasmus University Digital Repository

Pairwise alignment incorporating dipeptide covariation

Author: Altschul
Altschul
Altschul
Altschul
Bailey
Bishop
Brenner
Cline
Crooks
DOOLITTLE
Frith
Fukami-Kobayashi
G. E. Crooks
Goldman
Gonnet
Henikoff
Henikoff
Jung
Karplus
Lin
Muller
Murzin
Park
Pearson
R. E. Green
RODIONOV
S. E. Brenner
Sander
Smith
Thorne
Thorne
Thorne
Topham
Weiss
Zachariah
Publication venue: 'Oxford University Press (OUP)'
Publication date: 28/07/2005

Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.

arXiv.org e-Print Archive

Crossref

Methylation-Sensitive Expression of a DNA Demethylase Gene Serves As an Epigenetic Rheostat

Author: Gehring Mary
Henikoff Steven
Pignatta Daniela
Williams Ben P.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/12/2014

Genomes must balance active suppression of transposable elements (TEs) with the need to maintain gene expression. In Arabidopsis, euchromatic TEs are targeted by RNA-directed DNA methylation (RdDM). Conversely, active DNA demethylation prevents accumulation of methylation at genes proximal to these TEs. It is unknown how a cellular balance between methylation and demethylation activities is achieved. Here we show that both RdDM and DNA demethylation are highly active at a TE proximal to the major DNA demethylase gene ROS1. Unexpectedly, and in contrast to most other genomic targets, expression of ROS1 is promoted by DNA methylation and antagonized by DNA demethylation. We demonstrate that inducing methylation in the ROS1 proximal region is sufficient to restore ROS1 expression in an RdDM mutant. Additionally, methylation-sensitive expression of ROS1 is conserved in other species, suggesting it is adaptive. We propose that the ROS1 locus functions as an epigenetic rheostat, tuning the level of demethylase activity in response to methylation alterations, thus ensuring epigenomic stability.
Funding: Pew Charitable Trusts (Biomedical Scholars Award); Alexander and Margaret Stewart Trust (Scholars Award)

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Distances and classification of amino acids for different protein secondary structures

Author: J. Garnier
Li-mei Zhang
O. Weiss
O. Weiss
P. Stolorz
S. Henikoff
Shan Guan
U. Hobohm
W. Kabsch
Wei-Mou Zheng
Xin Liu
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2003

Window profiles of amino acids in protein sequences are taken as a description of the amino acid environment. The relative entropy, or Kullback-Leibler distance, derived from profiles is used as a measure of dissimilarity for comparing amino acids and secondary structure conformations. Distance matrices of amino acid pairs at different conformations are obtained, which display a non-negligible dependence of amino acid similarity on conformation. Based on the conformation-specific distances, a clustering analysis of amino acids is conducted.
Comment: 15 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref
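
The abstract of "Distances and classification of amino acids for different protein secondary structures" above uses relative entropy between profiles as a dissimilarity measure. The following is a minimal sketch of that quantity for two discrete amino acid distributions. The function names and the symmetrized variant are illustrative assumptions; the abstract does not say how, or whether, the authors symmetrize the measure.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits between two discrete distributions.

    p and q are sequences of probabilities over the same alphabet
    (for example, 20 amino acid frequencies). Terms with p_i == 0
    contribute nothing; a zero in q where p is positive gives infinity.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf
            total += p_i * math.log2(p_i / q_i)
    return total


def symmetric_kl(p, q):
    """Average of D(p || q) and D(q || p); an assumed way to get a symmetric score."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

Because the relative entropy is not symmetric and does not satisfy the triangle inequality, it is a dissimilarity rather than a true metric, which is why a symmetrized form is sketched alongside it here.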

How should novelty be valued in science?

Author: Alberts
Baker
Berget
Collins
Cook
Doyle
Errington
Fortin
Friedman
Gallo
Godfrey-Smith
Hall
Henikoff
Higginson
Hull
Kitcher
Kuhn
Lakatos
Lander
Laudan
Lauer
Lee
McClintock
Merton
Nüsslein-Volhard
Popper
Smith
Stent
Strevens
Venter
Watson
Wightman
Publication venue: Digital Commons@Becker
Publication date: 01/01/2017

Scientists are under increasing pressure to do "novel" research. Here I explore whether there are risks to overemphasizing novelty when deciding what constitutes good science. I review studies from the philosophy of science to help understand how important an explicit emphasis on novelty might be for scientific progress. I also review studies from the sociology of science to anticipate how emphasizing novelty might impact the structure and function of the scientific community. I conclude that placing too much value on novelty could have counterproductive effects on both the rate of progress in science and the organization of the scientific community. I finish by recommending that our current emphasis on novelty be replaced by a renewed emphasis on predictive power as a characteristic of good science.

Crossref

Digital Commons@Becker

Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

Author: Altschul
Andreatta
Colaert
Crooks
Fujii
Henikoff
Henikoff
Hobohm
Kullback
Martin Christen Frølund Thomsen
Morten Nielsen
Nielsen
Porter
Rammensee
Schneider
Shannon
Vacic
Workman
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims to resolve these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects of amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

Crossref

PubMed Central

Online Research Database In Technology
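
The Seq2Logo abstract above mentions pseudo counts and the information content shown in a sequence logo. The sketch below illustrates those two ideas for a single alignment column. It is not Seq2Logo's method: it assumes a flat additive pseudocount and a uniform background, and every name in it (`column_frequencies`, `information_content`, `AMINO_ACIDS`) is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=1.0):
    """Estimate amino acid frequencies for one alignment column.

    A flat pseudocount is added to every residue type (an assumed,
    simplified scheme) so that sparsely observed columns do not
    yield zero frequencies. Gap and unknown characters are ignored.
    """
    counts = Counter(column)
    observed = sum(counts[aa] for aa in AMINO_ACIDS)
    total = observed + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        raise ValueError("no observations and no pseudocount")
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def information_content(freqs):
    """Shannon information content in bits against a uniform background.

    Equals log2(alphabet size) minus the column entropy, so a fully
    conserved column scores log2(20) and a uniform column scores 0.
    """
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0.0)
    return math.log2(len(freqs)) - entropy
```

With a positive pseudocount, a column containing only a handful of sequences is pulled toward the uniform distribution, which lowers its information content and keeps a few observations from looking like strong conservation.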

Convolutional LSTM Networks for Subcellular Localization of Proteins

Author: A Graves
A Höglund
A Prlić
C Magnan
G Dahl
HY Xiong
LJP Maaten Van Der
M Schuster
MCF Thomsen
O Emanuelsson
P Baldi
P Lena Di
S Briesemeister
S Henikoff
S Hochreiter
SF Altschul
T Blum
T Goldberg
T Petersen
Y Bengio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins given only the protein sequence with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve the performance by introducing convolutional filters and experiment with an attention mechanism which lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanisms and show how they can be used to extract biologically relevant knowledge from the LSTM networks.

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Online Research Database In Technology

High-resolution mapping of transcription factor binding sites on native chromatin

Author: Ahmad Kami
Henikoff Steven
Kasinathan Sivakanthan
Orsi Guillermo A.
Zentner Gabriel E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/09/2014

Sequence-specific DNA-binding proteins including transcription factors (TFs) are key determinants of gene regulation and chromatin architecture. Formaldehyde cross-linking and sonication followed by Chromatin ImmunoPrecipitation (X-ChIP) is widely used for profiling of TF binding, but is limited by low resolution and poor specificity and sensitivity. We present a simple protocol that starts with micrococcal nuclease-digested uncross-linked chromatin and is followed by affinity purification of TFs and paired-end sequencing. The resulting ORGANIC (Occupied Regions of Genomes from Affinity-purified Naturally Isolated Chromatin) profiles of Saccharomyces cerevisiae Abf1 and Reb1 provide highly accurate base-pair resolution maps that are not biased toward accessible chromatin, and do not require input normalization. We also demonstrate the high specificity of our method when applied to larger genomes by profiling Drosophila melanogaster GAGA Factor and Pipsqueak. Our results suggest that ORGANIC profiling is a widely applicable high-resolution method for sensitive and specific profiling of direct protein-DNA interactions

Harvard University - DASH

Multiple sequence alignment based on set covers

Author: A. Bahr
B. Manthey
B. Morgenstern
B. Morgenstern
C. Notredame
D. Gusfield
G. Vogt
J.D. Thompson
K. Katoh
O. Gotoh
P. Zhao
R.E. Green
R.F. Smith
S. Henikoff
T. Müller
T.P. Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches

arXiv.org e-Print Archive

CiteSeerX

Crossref