    Protein sectors: statistical coupling analysis versus conservation

    Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that has been used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. Comment: 36 pages, 17 figures
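
The idea of "spectral analysis of a matrix combining correlation with conservation" can be illustrated with a toy sketch. This is not the published SCA weighting: the binary (most-common-residue) encoding, the background frequency `q = 0.05`, the use of the relative entropy itself as the weight, and the function name `conservation_weighted_coupling` are all assumptions made for illustration.

```python
import numpy as np

def conservation_weighted_coupling(alignment):
    """Toy conservation-weighted coupling matrix for equal-length sequences.

    Returns (conservation per position, weighted matrix, eigenvalues descending).
    Assumes at least two alignment columns.
    """
    seqs = np.array([list(s) for s in alignment])   # shape (n_seqs, n_pos)
    n_seqs, n_pos = seqs.shape

    # Binary approximation: 1 where a sequence carries the column's most common residue.
    binary = np.zeros((n_seqs, n_pos))
    for j in range(n_pos):
        residues, counts = np.unique(seqs[:, j], return_counts=True)
        binary[:, j] = seqs[:, j] == residues[np.argmax(counts)]

    # Conservation: relative entropy of the observed frequency against an
    # assumed background frequency q (illustrative value, not from the paper).
    q = 0.05
    f = np.clip(binary.mean(axis=0), 1e-9, 1 - 1e-9)
    conservation = f * np.log(f / q) + (1 - f) * np.log((1 - f) / (1 - q))

    # Covariance between positions, weighted by the conservation of both positions.
    cov = np.cov(binary, rowvar=False, bias=True)
    weighted = np.abs(np.outer(conservation, conservation) * cov)

    # Spectral analysis: eigenvalues of the symmetric weighted matrix, largest first.
    eigenvalues = np.linalg.eigvalsh(weighted)[::-1]
    return conservation, weighted, eigenvalues
```

In this sketch the conservation weights multiply every entry of the correlation matrix, so highly conserved positions can dominate the leading eigenvalue. That is consistent with the abstract's point about single-sector proteins, although the sketch does not reproduce the paper's analysis.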

    Graph-to-Sequence Learning using Gated Graph Neural Networks

    Many NLP applications can be framed as a graph-to-sequence learning problem. Previous work proposing neural architectures for this setting obtained promising results compared to grammar-based approaches but still relies on linearisation heuristics and/or standard recurrent networks to achieve the best performance. In this work, we propose a new model that encodes the full structural information contained in the graph. Our architecture couples the recently proposed Gated Graph Neural Networks with an input transformation that allows nodes and edges to have their own hidden representations, while tackling the parameter explosion problem present in previous work. Experimental results show that our model outperforms strong baselines in generation from AMR graphs and syntax-based neural machine translation. Comment: ACL 201
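
One way to give edges their own hidden representations is to turn every labelled edge into an extra node, so that an encoder only ever sees unlabelled connections. The sketch below shows that transformation only. Whether it matches the paper's input transformation exactly is an assumption, and the function name `edges_to_nodes` and the `label#index` naming scheme are hypothetical.

```python
def edges_to_nodes(nodes, labelled_edges):
    """Convert (source, label, target) edges into extra nodes.

    Each labelled edge becomes a new node placed between its source and
    target, so the result has only unlabelled edges.
    """
    new_nodes = list(nodes)
    new_edges = []
    for index, (source, label, target) in enumerate(labelled_edges):
        # The index keeps repeated labels distinct (hypothetical naming scheme).
        edge_node = f"{label}#{index}"
        new_nodes.append(edge_node)
        new_edges.append((source, edge_node))
        new_edges.append((edge_node, target))
    return new_nodes, new_edges
```

Because labels become nodes, the number of edge-type-specific parameters no longer has to grow with the size of the label vocabulary, which is one plausible reading of how the parameter explosion is avoided.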

    Quantitative test of the barrier nucleosome model for statistical positioning of nucleosomes up- and downstream of transcription start sites

    The positions of nucleosomes in eukaryotic genomes determine which parts of the DNA sequence are readily accessible for regulatory proteins and which are not. Genome-wide maps of nucleosome positions have revealed a salient pattern around transcription start sites, involving a nucleosome-free region (NFR) flanked by a pronounced periodic pattern in the average nucleosome density. While the periodic pattern clearly reflects well-positioned nucleosomes, the positioning mechanism is less clear. A recent experimental study by Mavrich et al. argued that the pattern observed in S. cerevisiae is qualitatively consistent with a "barrier nucleosome model", in which the oscillatory pattern is created by the statistical positioning mechanism of Kornberg and Stryer. On the other hand, there is clear evidence for intrinsic sequence preferences of nucleosomes, and it is unclear to what extent these sequence preferences affect the observed pattern. To test the barrier nucleosome model, we quantitatively analyze yeast nucleosome positioning data both up- and downstream from NFRs. Our analysis is based on the Tonks model of statistical physics which quantifies the interplay between the excluded-volume interaction of nucleosomes and their positional entropy. We find that although the typical patterns on the two sides of the NFR are different, they are both quantitatively described by the same physical model, with the same parameters, but different boundary conditions. The inferred boundary conditions suggest that the first nucleosome downstream from the NFR (the +1 nucleosome) is typically directly positioned while the first nucleosome upstream is statistically positioned via a nucleosome-repelling DNA region. These boundary conditions, which can be locally encoded into the genome sequence, significantly shape the statistical distribution of nucleosomes over a range of up to ~1000 bp to each side. Comment: includes supporting material
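
Statistical positioning next to a barrier can be illustrated with the textbook Tonks gas of hard rods. For rods of length a at bulk density ρ, the reduced pressure is βP = ρ / (1 − ρa), and the density at distance x from a fixed barrier particle is a sum of shifted gamma distributions, one per neighbor. The sketch below assumes that simplest boundary condition only and does not model the boundary conditions inferred in the paper. The function name `tonks_density` and the example values (147 bp footprint, 165 bp repeat length) are illustrative assumptions, not the paper's fitted parameters.

```python
import math

def tonks_density(x, rod_length, bulk_density, max_k=50):
    """Density of hard-rod centers at distance x from a fixed barrier rod at 0.

    Sums the position distributions of the 1st, 2nd, ... neighbor. The k-th
    neighbor sits at k * rod_length plus k independent exponential gaps.
    Intended for moderate x (at most max_k neighbors are summed).
    """
    # Reduced pressure of the Tonks gas; requires bulk_density * rod_length < 1.
    beta_p = bulk_density / (1.0 - bulk_density * rod_length)
    total = 0.0
    for k in range(1, max_k + 1):
        gap = x - k * rod_length        # free space shared by the k gaps
        if gap < 0:
            break                       # the k-th neighbor cannot be this close
        total += (beta_p ** k) * (gap ** (k - 1)) / math.factorial(k - 1) \
                 * math.exp(-beta_p * gap)
    return total
```

Evaluating the function on a grid, for example `[tonks_density(x, 147.0, 1.0 / 165.0) for x in range(0, 1500, 5)]`, gives an oscillating profile that decays toward the bulk density with distance from the barrier, which is the qualitative pattern the abstract describes.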

    Properties of the Remnant Clockwise Disk of Young Stars in the Galactic Center

    We present new kinematic measurements and modeling of a sample of 116 young stars in the central parsec of the Galaxy in order to investigate the properties of the young stellar disk. The measurements were derived from a combination of speckle and laser guide star adaptive optics imaging and integral field spectroscopy from the Keck telescopes. Compared to earlier disk studies, the most important kinematic measurement improvement is in the precision of the accelerations in the plane of the sky, which have a factor of six smaller uncertainties (~10 uas/yr/yr). We have also added the first radial velocity measurements for 8 young stars, increasing the sample at the largest radii (6"-12") by 25%. We derive the ensemble properties of the observed stars using Monte Carlo simulations of mock data. There is one highly significant kinematic feature (~20 sigma), corresponding to the well-known clockwise disk, and no significant feature is detected at the location of the previously claimed counterclockwise disk. The true disk fraction is estimated to be ~20%, a factor of ~2.5 lower than previous claims, suggesting that we may be observing the remnant of what used to be a more densely populated stellar disk. The similarity in the kinematic properties of the B stars and the O/WR stars suggests a common star formation event. The intrinsic eccentricity distribution of the disk stars is unimodal, with an average value of ⟨e⟩ = 0.27 +/- 0.07, which we show can be achieved through dynamical relaxation in an initially circular disk with a moderately top-heavy mass function. Comment: 65 pages, 22 figures, 8 tables, submitted to Ap

    Utilizing Protein Structure to Identify Non-Random Somatic Mutations

    Motivation: Human cancer is caused by the accumulation of somatic mutations in tumor suppressors and oncogenes within the genome. In the case of oncogenes, recent theory suggests that there are only a few key "driver" mutations responsible for tumorigenesis. As there have been significant pharmacological successes in developing drugs that treat cancers that carry these driver mutations, several methods that rely on mutational clustering have been developed to identify them. However, these methods consider proteins as a single strand without taking their spatial structures into account. We propose a new methodology that incorporates protein tertiary structure in order to increase our power when identifying mutation clustering. Results: We have developed a novel algorithm, iPAC: identification of Protein Amino acid Clustering, for the identification of non-random somatic mutations in proteins that takes into account the three-dimensional protein structure. By using the tertiary information, we are able both to detect novel clusters in proteins that are known to exhibit mutation clustering and to identify clusters in proteins without evidence of clustering based on existing methods. For example, by combining the data in the Protein Data Bank (PDB) and the Catalogue of Somatic Mutations in Cancer, our algorithm identifies new mutational clusters in well-known cancer proteins such as KRAS and PI3KCa. Further, by utilizing the tertiary structure, our algorithm also identifies clusters in EGFR, EIF2AK2, and other proteins that are not identified by current methodology.

    Two polymorphisms facilitate differences in plasticity between two chicken major histocompatibility complex class I proteins

    Major histocompatibility complex class I molecules (MHC I) present peptides to cytotoxic T-cells at the surface of almost all nucleated cells. The function of MHC I molecules is to select high affinity peptides from a large intracellular pool and they are assisted in this process by co-factor molecules, notably tapasin. In contrast to mammals, MHC homozygous chickens express a single MHC I gene locus, termed BF2, which is hypothesised to have co-evolved with the highly polymorphic tapasin within stable haplotypes. The BF2 molecules of the B15 and B19 haplotypes have recently been shown to differ in their interactions with tapasin and in their peptide selection properties. This study investigated whether these observations might be explained by differences in the protein plasticity that is encoded into the MHC I structure by primary sequence polymorphisms. Furthermore, we aimed to demonstrate the utility of a complementary modelling approach to the understanding of complex experimental data. Combining mechanistic molecular dynamics simulations and the primary sequence based technique of statistical coupling analysis, we show how two of the eight polymorphisms between BF2*15:01 and BF2*19:01 facilitate differences in plasticity. We show that BF2*15:01 is intrinsically more plastic than BF2*19:01, exploring more conformations in the absence of peptide. We identify a protein sector of contiguous residues connecting the membrane-bound α3 domain and the heavy chain peptide binding site. This sector contains two of the eight polymorphic residues. One is residue 22 in the peptide binding domain and the other, residue 220, is in the α3 domain, a putative tapasin binding site. These observations are in correspondence with the experimentally observed functional differences of these molecules and suggest a mechanism for how modulation of MHC I plasticity by tapasin catalyses peptide selection allosterically.