    An Alternative Model of Amino Acid Replacement

    The observed correlations between pairs of homologous protein sequences are typically explained in terms of a Markovian dynamic of amino acid substitution. This model assumes that every location on the protein sequence has the same background distribution of amino acids, an assumption that is incompatible with the observed heterogeneity of protein amino acid profiles and with the success of profile multiple sequence alignment. We propose an alternative model of amino acid replacement during protein evolution based upon the assumption that the variation of the amino acid background distribution from one residue to the next is sufficient to explain the observed sequence correlations of homologs. The resulting dynamical model of independent replacements drawn from heterogeneous backgrounds is simple and consistent, and provides a unified homology match score for sequence-sequence, sequence-profile and profile-profile alignment. Comment: Minor improvements. Added figure and reference.

    Epigenetics & chromatin: Interactions and processes

    From 11 to 13 March 2013, BioMed Central will be hosting its inaugural conference, Epigenetics & Chromatin: Interactions and Processes, at Harvard Medical School, Cambridge, MA, USA. Epigenetics & Chromatin has now launched a special article series based on the general themes of the conference.

    Pairwise alignment incorporating dipeptide covariation

    Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.
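
    To illustrate why the independence assumption keeps alignment fast, the following is a minimal sketch of a standard global alignment score computed by dynamic programming. It is not the extended algorithm the abstract describes. The function names, the toy match/mismatch scoring and the linear gap penalty are all assumptions made for illustration; a real aligner would use a substitution matrix and typically affine gaps.

```python
def toy_score(x, y):
    """Hypothetical per-site score: depends only on the two aligned residues."""
    return 2 if x == y else -1


def global_alignment_score(a, b, score=toy_score, gap=-4):
    """Best global alignment score of sequences a and b.

    Because each aligned pair is scored independently of its neighbors,
    the optimum for a prefix pair depends only on three smaller prefix
    pairs, giving an O(len(a) * len(b)) recurrence.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a against the empty prefix of b.
        for j in range(1, len(b) + 1):
            curr.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # gap in b
                curr[j - 1] + gap,                        # gap in a
            ))
        prev = curr
    return prev[-1]
```

    Scoring neighboring sites jointly, as the abstract proposes, would require each cell to carry information about the preceding aligned pair, which is why the independent-site version is the simpler and cheaper baseline.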

    Methylation-Sensitive Expression of a DNA Demethylase Gene Serves As an Epigenetic Rheostat

    Genomes must balance active suppression of transposable elements (TEs) with the need to maintain gene expression. In Arabidopsis, euchromatic TEs are targeted by RNA-directed DNA methylation (RdDM). Conversely, active DNA demethylation prevents accumulation of methylation at genes proximal to these TEs. It is unknown how a cellular balance between methylation and demethylation activities is achieved. Here we show that both RdDM and DNA demethylation are highly active at a TE proximal to the major DNA demethylase gene ROS1. Unexpectedly, and in contrast to most other genomic targets, expression of ROS1 is promoted by DNA methylation and antagonized by DNA demethylation. We demonstrate that inducing methylation in the ROS1 proximal region is sufficient to restore ROS1 expression in an RdDM mutant. Additionally, methylation-sensitive expression of ROS1 is conserved in other species, suggesting it is adaptive. We propose that the ROS1 locus functions as an epigenetic rheostat, tuning the level of demethylase activity in response to methylation alterations, thus ensuring epigenomic stability. Funding: Pew Charitable Trusts (Biomedical Scholars Award); Alexander and Margaret Stewart Trust (Scholars Award).

    Distances and classification of amino acids for different protein secondary structures

    Window profiles of amino acids in protein sequences are taken as a description of the amino acid environment. The relative entropy or Kullback-Leibler distance derived from profiles is used as a measure of dissimilarity for comparison of amino acids and secondary structure conformations. Distance matrices of amino acid pairs at different conformations are obtained, which display a non-negligible dependence of amino acid similarity on conformations. Based on the conformation-specific distances, a clustering analysis for amino acids is conducted. Comment: 15 pages, 8 figures.
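
    As a rough illustration of the dissimilarity measure named above, the sketch below computes the relative entropy between two discrete profiles. The function names, the small smoothing constant and the symmetrized variant are assumptions for illustration; the abstract does not specify how the paper derives its distance from the window profiles.

```python
import math


def kl_divergence(p, q, pseudocount=1e-9):
    """Relative entropy D(p || q) in bits between two discrete distributions.

    A small pseudocount (an assumption here) avoids log(0) and division
    by zero; both inputs are renormalized after smoothing.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    p_s = [x + pseudocount for x in p]
    q_s = [x + pseudocount for x in q]
    zp, zq = sum(p_s), sum(q_s)
    return sum(
        (a / zp) * math.log2((a / zp) / (b / zq))
        for a, b in zip(p_s, q_s)
    )


def symmetric_kl(p, q):
    """Symmetrized relative entropy, one common way to obtain a
    distance-like quantity (hypothetical choice, not taken from the paper)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

    Relative entropy is not symmetric, so some symmetrization of this kind is typically needed before the values can fill a distance matrix for clustering.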

    How should novelty be valued in science?

    Scientists are under increasing pressure to do "novel" research. Here I explore whether there are risks to overemphasizing novelty when deciding what constitutes good science. I review studies from the philosophy of science to help understand how important an explicit emphasis on novelty might be for scientific progress. I also review studies from the sociology of science to anticipate how emphasizing novelty might impact the structure and function of the scientific community. I conclude that placing too much value on novelty could have counterproductive effects on both the rate of progress in science and the organization of the scientific community. I finish by recommending that our current emphasis on novelty be replaced by a renewed emphasis on predictive power as a characteristic of good science.

    Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims to resolve these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects of amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).
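
    For readers unfamiliar with how logo heights come from an alignment, here is a minimal sketch of a classic Shannon-style logo computation. It is not Seq2Logo's implementation: the function names are hypothetical, the uniform additive pseudocount is a simplifying assumption, and sequence weighting and the depletion-aware logo types described above are omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=1.0):
    """Smoothed amino acid frequencies for one alignment column.

    A uniform pseudocount is added to every residue type (an assumption);
    gap characters and unknown symbols are ignored.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def information_content(freqs):
    """Information content in bits: maximum entropy minus observed entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy


def logo_heights(alignment, pseudocount=1.0):
    """Per-position letter heights: frequency times column information content."""
    heights = []
    for column in zip(*alignment):  # iterate over alignment columns
        freqs = column_frequencies(column, pseudocount)
        ic = information_content(freqs)
        heights.append({aa: p * ic for aa, p in freqs.items()})
    return heights
```

    With few sequences, the pseudocount pulls every column toward the uniform distribution and lowers the stack height, which is the correction for a low number of observations in its simplest form.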

    Convolutional LSTM Networks for Subcellular Localization of Proteins

    Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins given only the protein sequence with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve the performance by introducing convolutional filters and experiment with an attention mechanism which lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanisms and show how they can be used to extract biologically relevant knowledge from the LSTM networks.

    Multiple sequence alignment based on set covers

    We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.