An Alternative Model of Amino Acid Replacement
The observed correlations between pairs of homologous protein sequences are
typically explained in terms of a Markovian dynamic of amino acid substitution.
This model assumes that every location on the protein sequence has the same
background distribution of amino acids, an assumption that is incompatible with
the observed heterogeneity of protein amino acid profiles and with the success
of profile multiple sequence alignment. We propose an alternative model of
amino acid replacement during protein evolution based upon the assumption that
the variation of the amino acid background distribution from one residue to the
next is sufficient to explain the observed sequence correlations of homologs.
The resulting dynamical model of independent replacements drawn from
heterogeneous backgrounds is simple and consistent, and provides a unified
homology match score for sequence-sequence, sequence-profile and
profile-profile alignment. Comment: Minor improvements. Added figure and reference.
Epigenetics & chromatin: Interactions and processes
On 11 to 13 March 2013, BioMed Central will be hosting its inaugural conference, Epigenetics & Chromatin: Interactions and Processes, at Harvard Medical School, Cambridge, MA, USA. Epigenetics & Chromatin has now launched a special article series based on the general themes of the conference.
Pairwise alignment incorporating dipeptide covariation
Motivation: Standard algorithms for pairwise protein sequence alignment make
the simplifying assumption that amino acid substitutions at neighboring sites
are uncorrelated. This assumption allows implementation of fast algorithms for
pairwise sequence alignment, but it ignores information that could conceivably
increase the power of remote homolog detection. We examine the validity of this
assumption by constructing extended substitution matrices that encapsulate the
observed correlations between neighboring sites, by developing an efficient and
rigorous algorithm for pairwise protein sequence alignment that incorporates
these local substitution correlations, and by assessing the ability of this
algorithm to detect remote homologies. Results: Our analysis indicates that
local correlations between substitutions are not strong on average.
Furthermore, incorporating local substitution correlations into pairwise
alignment did not lead to a statistically significant improvement in remote
homology detection. Therefore, the standard assumption that individual residues
within protein sequences evolve independently of neighboring positions appears
to be an efficient and appropriate approximation.
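The site-independence assumption discussed in this abstract can be illustrated with a minimal sketch of a standard pairwise alignment score. This is not the authors' dipeptide-aware algorithm: it is a plain global alignment (Needleman-Wunsch style) with a linear gap penalty, and `toy_score` and the gap value are hypothetical choices made only for illustration. The point it shows is that each aligned pair contributes to the total independently of its neighbors.

```python
def global_alignment_score(a, b, score, gap=-4):
    """Optimal global alignment score with a linear gap penalty.

    Each aligned residue pair adds score(x, y) on its own, which is the
    'neighboring sites are uncorrelated' assumption in its simplest form.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            curr.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # gap in b
                curr[j - 1] + gap,                        # gap in a
            ))
        prev = curr
    return prev[-1]


def toy_score(x, y):
    # Hypothetical scoring: a real application would use a substitution matrix.
    return 2 if x == y else -1


identical = global_alignment_score("HEAG", "HEAG", toy_score)
one_deletion = global_alignment_score("HEAG", "HAG", toy_score)
```

An extended matrix of the kind the abstract describes would replace `score(x, y)` with a term that also depends on the adjacent aligned pair, which is what makes the dynamic programming more involved.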
Methylation-Sensitive Expression of a DNA Demethylase Gene Serves As an Epigenetic Rheostat
Genomes must balance active suppression of transposable elements (TEs) with the need to maintain gene expression. In Arabidopsis, euchromatic TEs are targeted by RNA-directed DNA methylation (RdDM). Conversely, active DNA demethylation prevents accumulation of methylation at genes proximal to these TEs. It is unknown how a cellular balance between methylation and demethylation activities is achieved. Here we show that both RdDM and DNA demethylation are highly active at a TE proximal to the major DNA demethylase gene ROS1. Unexpectedly, and in contrast to most other genomic targets, expression of ROS1 is promoted by DNA methylation and antagonized by DNA demethylation. We demonstrate that inducing methylation in the ROS1 proximal region is sufficient to restore ROS1 expression in an RdDM mutant. Additionally, methylation-sensitive expression of ROS1 is conserved in other species, suggesting it is adaptive. We propose that the ROS1 locus functions as an epigenetic rheostat, tuning the level of demethylase activity in response to methylation alterations, thus ensuring epigenomic stability. Funding: Pew Charitable Trusts (Biomedical Scholars Award); Alexander and Margaret Stewart Trust (Scholars Award).
Distances and classification of amino acids for different protein secondary structures
Window profiles of amino acids in protein sequences are taken as a
description of the amino acid environment. The relative entropy or
Kullback-Leibler distance derived from profiles is used as a measure of
dissimilarity for comparison of amino acids and secondary structure
conformations. Distance matrices of amino acid pairs at different conformations
are obtained, which display a non-negligible dependence of amino acid
similarity on conformations. Based on the conformation-specific distances,
a clustering analysis of the amino acids is conducted. Comment: 15 pages, 8 figures
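As a rough illustration of the relative entropy (Kullback-Leibler distance) mentioned in this abstract, the sketch below computes it between two discrete distributions. The toy four-letter profiles and the function name `kl_divergence` are assumptions for illustration only; the abstract does not say how the window profiles are built or whether the distance is symmetrized.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits.

    Assumes p and q are probability vectors over the same alphabet and that
    q is nonzero wherever p is nonzero.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical profiles over a 4-letter alphabet (real profiles would cover
# the 20 amino acids at each window position).
profile_a = [0.5, 0.25, 0.125, 0.125]
profile_b = [0.25, 0.25, 0.25, 0.25]

d_ab = kl_divergence(profile_a, profile_b)
d_ba = kl_divergence(profile_b, profile_a)
```

In general the relative entropy is not symmetric, so a dissimilarity matrix suitable for clustering would typically use a symmetrized form such as the average of `d_ab` and `d_ba`; whether the paper does this is not stated here.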
How should novelty be valued in science?
Scientists are under increasing pressure to do "novel" research. Here I explore whether there are risks to overemphasizing novelty when deciding what constitutes good science. I review studies from the philosophy of science to help understand how important an explicit emphasis on novelty might be for scientific progress. I also review studies from the sociology of science to anticipate how emphasizing novelty might impact the structure and function of the scientific community. I conclude that placing too much value on novelty could have counterproductive effects on both the rate of progress in science and the organization of the scientific community. I finish by recommending that our current emphasis on novelty be replaced by a renewed emphasis on predictive power as a characteristic of good science.
Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion
Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims to resolve these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects of amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).
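To make the notions of information content and pseudo counts concrete, here is a minimal sketch that computes a Shannon-style information content per alignment column. It is not Seq2Logo's procedure: the uniform additive pseudo count, the absence of sequence weighting, the toy peptides, and all function names are assumptions for illustration, and the input is assumed to be gap-free.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=0.0):
    """Amino acid frequencies in one column, with a uniform additive pseudo count."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}


def information_content(freqs):
    """Bits of information: log2(alphabet size) minus the Shannon entropy."""
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return math.log2(len(freqs)) - entropy


# Hypothetical aligned peptides of equal length.
peptides = ["ALAKV", "ALGKV", "ALSKL"]
columns = list(zip(*peptides))

bits = [information_content(column_frequencies(col)) for col in columns]
smoothed_first = information_content(column_frequencies(columns[0], pseudocount=1.0))
```

With only three sequences, the fully conserved first column reaches the maximum of log2(20) bits, which overstates the evidence; the pseudo count pulls that value down, which is the kind of correction for few observations the abstract refers to.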
Convolutional LSTM Networks for Subcellular Localization of Proteins
Machine learning is widely used to analyze biological sequence data.
Non-sequential models such as SVMs or feed-forward neural networks are often
used although they have no natural way of handling sequences of varying length.
Recurrent neural networks such as the long short-term memory (LSTM) model, on
the other hand, are designed to handle sequences. In this study we demonstrate
that LSTM networks predict the subcellular location of proteins given only the
protein sequence with high accuracy (0.902), outperforming current
state-of-the-art algorithms. We further improve the performance by introducing convolutional
filters and experiment with an attention mechanism which lets the LSTM focus on
specific parts of the protein. Lastly, we introduce new visualizations of both
the convolutional filters and the attention mechanisms and show how they can be
used to extract biologically relevant knowledge from the LSTM networks.
High-resolution mapping of transcription factor binding sites on native chromatin
Sequence-specific DNA-binding proteins including transcription factors (TFs) are key determinants of gene regulation and chromatin architecture. Formaldehyde cross-linking and sonication followed by Chromatin ImmunoPrecipitation (X-ChIP) is widely used for profiling of TF binding, but is limited by low resolution and poor specificity and sensitivity. We present a simple protocol that starts with micrococcal nuclease-digested uncross-linked chromatin and is followed by affinity purification of TFs and paired-end sequencing. The resulting ORGANIC (Occupied Regions of Genomes from Affinity-purified Naturally Isolated Chromatin) profiles of Saccharomyces cerevisiae Abf1 and Reb1 provide highly accurate base-pair resolution maps that are not biased toward accessible chromatin, and do not require input normalization. We also demonstrate the high specificity of our method when applied to larger genomes by profiling Drosophila melanogaster GAGA Factor and Pipsqueak. Our results suggest that ORGANIC profiling is a widely applicable high-resolution method for sensitive and specific profiling of direct protein-DNA interactions.
Multiple sequence alignment based on set covers
We introduce a new heuristic for the multiple alignment of a set of
sequences. The heuristic is based on a set cover of the residue alphabet of the
sequences, and also on the determination of a significant set of blocks
comprising subsequences of the sequences to be aligned. These blocks are
obtained with the aid of a new data structure, called a suffix-set tree, which
is constructed from the input sequences with the guidance of the
residue-alphabet set cover and generalizes the well-known suffix tree of the
sequence set. We provide performance results on selected BAliBASE amino-acid
sequences and compare them with those yielded by some prominent approaches.
