Epigenetics & chromatin: Interactions and processes

Author: Grosveld F.G. (Frank)
Henikoff S. (Steven)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

From 11 to 13 March 2013, BioMed Central will host its inaugural conference, Epigenetics & Chromatin: Interactions and Processes, at Harvard Medical School, Cambridge, MA, USA. Epigenetics & Chromatin has now launched a special article series based on the general themes of the conference.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

EUR Research Repository

PubMed Central

Erasmus University Digital Repository

Pairwise alignment incorporating dipeptide covariation

Author: Altschul
Altschul
Altschul
Altschul
Bailey
Bishop
Brenner
Cline
Crooks
DOOLITTLE
Frith
Fukami-Kobayashi
G. E. Crooks
Goldman
Gonnet
Henikoff
Henikoff
Jung
Karplus
Lin
Muller
Murzin
Park
Pearson
R. E. Green
RODIONOV
S. E. Brenner
Sander
Smith
Thorne
Thorne
Thorne
Topham
Weiss
Zachariah
Publication venue: 'Oxford University Press (OUP)'
Publication date: 28/07/2005

Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.

arXiv.org e-Print Archive

Crossref

Distances and classification of amino acids for different protein secondary structures

Author: J. Garnier
Li-mei Zhang
O. Weiss
O. Weiss
P. Stolorz
S. Henikoff
Shan Guan
U. Hobohm
W. Kabsch
Wei-Mou Zheng
Xin Liu
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2003

Window profiles of amino acids in protein sequences are taken as a description of the amino acid environment. The relative entropy, or Kullback-Leibler distance, derived from the profiles is used as a measure of dissimilarity for comparing amino acids and secondary structure conformations. Distance matrices of amino acid pairs at different conformations are obtained, and they display a non-negligible dependence of amino acid similarity on conformation. Based on the conformation-specific distances, a clustering analysis of amino acids is conducted. Comment: 15 pages, 8 figures
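As a rough illustration of the dissimilarity measure named in this abstract, the sketch below computes the relative entropy (Kullback-Leibler distance) between two discrete distributions, such as two amino acid frequency profiles. The function name `kl_divergence` and the toy inputs are hypothetical, and the abstract does not say how the authors build their window profiles or whether they symmetrize the distance, so this shows only the textbook definition.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits between two discrete distributions.

    p and q are sequences of probabilities over the same alphabet
    (for example, the 20 amino acids). Terms with p[i] == 0 contribute 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if qi == 0.0:
            raise ValueError("q must be positive wherever p is positive")
        total += pi * math.log2(pi / qi)
    return total


# Toy two-letter alphabet (hypothetical data, not from the paper).
identical = kl_divergence([0.5, 0.5], [0.5, 0.5])
skewed = kl_divergence([1.0, 0.0], [0.5, 0.5])
```

Note that the measure is asymmetric: `kl_divergence(p, q)` generally differs from `kl_divergence(q, p)`, so a symmetrized form is often used when a distance matrix is needed.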

arXiv.org e-Print Archive

CiteSeerX

Crossref

Convolutional LSTM Networks for Subcellular Localization of Proteins

Author: A Graves
A Höglund
A Prlić
C Magnan
G Dahl
HY Xiong
LJP Maaten Van Der
M Schuster
MCF Thomsen
O Emanuelsson
P Baldi
P Lena Di
S Briesemeister
S Henikoff
S Hochreiter
SF Altschul
T Blum
T Goldberg
T Petersen
Y Bengio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins given only the protein sequence with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve the performance by introducing convolutional filters, and we experiment with an attention mechanism that lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanisms and show how they can be used to extract biologically relevant knowledge from the LSTM networks.

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Online Research Database In Technology

Simplified amino acid alphabets based on deviation of conditional probability from random background

Author: A. Godzik
A.G. Murzin
C.E. Schafmeister
D.S. Riddle
Di Liu
H.S. Chan
J. Wang
Ji Qi
K.W. Plaxco
L.R. Murphy
M. Munson
S. Henikoff
S. Miyazawa
S.E. Brenner
S.F. Altschul
S.F. Altschul
Wei-Mou Zheng
Xin Liu
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2002

The primitive data for deducing the Miyazawa-Jernigan contact energy or BLOSUM score matrix consist of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of this conditional probability from the random background, a scheme for reducing the amino acid alphabet is proposed. A clear discrepancy is observed between the reduced alphabets obtained from the raw Miyazawa-Jernigan and BLOSUM residue pair counts. Taking the homologous sequence database SCOP40 as a test set, we detect homology with the obtained coarse-grained substitution matrices. It is verified that the reduced alphabets preserve well the information contained in the original 20-letter alphabet. Comment: 9 pages, 3 figures

arXiv.org e-Print Archive

Crossref

CERN Document Server

Multiple sequence alignment based on set covers

Author: A. Bahr
B. Manthey
B. Morgenstern
B. Morgenstern
C. Notredame
D. Gusfield
G. Vogt
J.D. Thompson
K. Katoh
O. Gotoh
P. Zhao
R.E. Green
R.F. Smith
S. Henikoff
T. Müller
T.P. Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.

arXiv.org e-Print Archive

CiteSeerX

Crossref

A methodology for determining amino-acid substitution matrices from set covers

Author: A. Bahr
A.D. McLachlan
D.F. Feng
G. Vogt
G.H. Gonnet
J. Setubal
J.D. Blake
J.K.M. Rao
M. Gribskov
M.F. Sagot
R.B. Russell
R.E. Green
R.F. Smith
S. Henikoff
S.A. Benner
T. Müller
T.P. Li
W.S.J. Valdar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/04/2005

We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration.

arXiv.org e-Print Archive

Crossref

Candida albicans repetitive elements display epigenetic diversity and plasticity

Author: A Ellahi
A Pidoux
A Selmecki
aF Straight
BD Strahl
C Ketel
C Li
C Trapnell
Ca Froyd
CJ Merrick
D Kadosh
DE Gottschling
GD Shankaranarayana
H Chibana
H Chibana
J Haran
J Huang
J Huang
J Nakayama
J Pérez-Martín
J Wendland
JC Tanny
JS Smith
Ka Morano
KR Hansen
L Vasiljeva
LH Freitas-Junior
LN Rusche
M Bryk
M Bühler
M Dubarry
M Paschini
M Van het Hoog
MA Pfaller
MD De Backer
MJ McEachern
N Saksouk
PA Dumesic
PR Lephart
PR Lephart
R Kaur
RB Wilson
RJ Bennett
S Greiss
S Henikoff
S Imai
S Kueng
S Rea
SI Iwaguchi
T Jones
T Kobayashi
T Kobayashi
T Kouzarides
VM Bruno
W Shou
WS Chu
X Bi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/03/2016

Transcriptionally silent heterochromatin is associated with repetitive DNA. It is poorly understood whether and how heterochromatin differs between different organisms and whether its structure can be remodelled in response to environmental signals. Here, we address this question by analysing the chromatin state associated with DNA repeats in the human fungal pathogen Candida albicans. Our analyses indicate that, in contrast to model systems, each type of repetitive element is assembled into a distinct chromatin state. Classical Sir2-dependent hypoacetylated and hypomethylated chromatin is associated with the rDNA locus, while telomeric regions are assembled into a weak heterochromatin that is only mildly hypoacetylated and hypomethylated. Major Repeat Sequences, a class of tandem repeats, are assembled into an intermediate chromatin state bearing features of both euchromatin and heterochromatin. Marker gene silencing assays and genome-wide RNA sequencing reveal that C. albicans heterochromatin represses expression of repeat-associated coding and non-coding RNAs. We find that telomeric heterochromatin is dynamic and remodelled upon an environmental change. Weak heterochromatin is associated with telomeres at 30 °C, while robust heterochromatin is assembled over these regions at 39 °C, a temperature mimicking moderate fever in the host. Thus, in C. albicans, differential chromatin states control gene expression, and epigenetic plasticity is linked to adaptation.

Crossref

PubMed Central

Kent Academic Repository

Towards Reliable Automatic Protein Structure Alignment

Author: A. Caprara
A. Zemla
A.G. Murzin
A.S. Konagurthu
C.A. Rohl
C.B. Do
G. Lancia
H.M. Berman
I.N. Shindyalov
J. Shi
J. Xu
J.F. Gibrat
K. Mizuguchi
L. Kinch
L. Xie
M. Comin
M. Levitt
M. Moakher
M. Sadowski
N.M. Daniels
N.N. Alexandrov
S. Henikoff
S. Subbiah
S.B. Needleman
S.B. Pandit
S.R. Eddy
W. Pirovano
Y. Yang
Y. Ye
Y. Zhang
Y. Zhang
Y. Zhang
Publication venue
Publication date: 01/01/2013

A variety of methods have been proposed for structure similarity calculation, which are called structure alignment or superposition. One major shortcoming of current structure alignment algorithms lies in their inherent design, which is based on local structure similarity. In this work, we propose a method to incorporate global information in obtaining optimal alignments and superpositions. Our method, when applied to optimizing the TM-score and the GDT score, produces significantly better results than current state-of-the-art protein structure alignment tools. Specifically, if the highest TM-score found by TMalign is lower than 0.6 and the highest TM-score found by one of the tested methods is higher than 0.5, there is a probability of 42% that TMalign failed to find TM-scores higher than 0.5, while the same probability is reduced to 2% if our method is used. This could significantly improve the accuracy of fold detection if the cutoff TM-score of 0.5 is used. In addition, existing structure alignment algorithms focus on structure similarity alone and simply ignore other important similarities, such as sequence similarity. Our approach has the capacity to incorporate multiple similarities into the scoring function. Results show that sequence similarity aids in finding high-quality protein structure alignments that are more consistent with eye-examined alignments in HOMSTRAD. Even when structure similarity itself fails to find alignments with any consistency with eye-examined alignments, our method remains capable of finding alignments highly similar to, or even identical to, eye-examined alignments. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)
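For readers unfamiliar with the TM-score thresholds quoted in this abstract, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. It is not the authors' optimization method: the real TM-score is the maximum of this quantity over all superpositions, and the function names `tm_d0` and `tm_score_for_superposition` are hypothetical.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 (in angstroms)."""
    if target_length <= 15:
        raise ValueError("formula is only defined here for target_length > 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0.0:
        raise ValueError("target_length too short for a positive d0")
    return d0


def tm_score_for_superposition(distances, target_length):
    """TM-score contribution of one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target (normalizing) protein.
    """
    if len(distances) > target_length:
        raise ValueError("cannot align more pairs than target residues")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical example: a 100-residue target with every residue perfectly superposed.
perfect = tm_score_for_superposition([0.0] * 100, 100)
# Only half of the residues aligned, all perfectly superposed.
half = tm_score_for_superposition([0.0] * 50, 100)
```

Because each aligned pair contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, which is what makes a fixed cutoff such as 0.5 comparable across proteins of different sizes.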

arXiv.org e-Print Archive

Crossref

Legislative Development, The Attorney Accountability Act: A Case Study of the Complexities of Incentive-Based Legal Reform

Author: Henikoff Jamie S.
Peppet Scott R.
Publication venue: Colorado Law Scholarly Commons
Publication date: 01/01/1996

Colorado Law