Protein sectors: statistical coupling analysis versus conservation
Statistical coupling analysis (SCA) is a method for analyzing multiple
sequence alignments that was used to identify groups of coevolving residues
termed "sectors". The method applies spectral analysis to a matrix obtained by
combining correlation information with sequence conservation. It has been
asserted that the protein sectors identified by SCA are functionally
significant, with different sectors controlling different biochemical
properties of the protein. Here we reconsider the available experimental data
and note that it involves almost exclusively proteins with a single sector. We
show that in this case sequence conservation is the dominating factor in SCA,
and can alone be used to make statistically equivalent functional predictions.
Therefore, we suggest shifting the experimental focus to proteins for which SCA
identifies several sectors. Correlations in protein alignments, which have been
shown to be informative in a number of independent studies, would then be less
dominated by sequence conservation.
Comment: 36 pages, 17 figures
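The conservation-only baseline discussed in this abstract can be illustrated with a minimal sketch. It is not the SCA procedure and not the authors' code: it ranks alignment columns by the relative entropy of their residue frequencies against a background distribution. The uniform background, the toy alignment, and the function names `column_conservation` and `rank_positions` are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(alignment, background=None):
    """Relative entropy (in nats) of each column's residue frequencies
    with respect to a background distribution.

    Assumption: a uniform background over the 20 amino acids is used
    when none is supplied. Gaps and unknown symbols are ignored.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    scores = []
    for j in range(len(alignment[0])):
        counts = Counter(seq[j] for seq in alignment if seq[j] in background)
        total = sum(counts.values())
        if total == 0:
            scores.append(0.0)  # column contains only gaps
            continue
        score = 0.0
        for aa, count in counts.items():
            freq = count / total
            score += freq * math.log(freq / background[aa])
        scores.append(score)
    return scores


def rank_positions(alignment):
    """Column indices sorted from most to least conserved."""
    scores = column_conservation(alignment)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Hypothetical toy alignment: four sequences, four columns.
toy_alignment = ["ACDE", "ACDF", "ACEG", "ATEH"]
scores = column_conservation(toy_alignment)
ranking = rank_positions(toy_alignment)
```

A fully conserved column scores log(20) under the uniform background, and the score falls as the column becomes more variable.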
Graph-to-Sequence Learning using Gated Graph Neural Networks
Many NLP applications can be framed as a graph-to-sequence learning problem.
Previous work proposing neural architectures for this setting obtained promising
results compared to grammar-based approaches but still relies on linearisation
heuristics and/or standard recurrent networks to achieve the best performance.
In this work, we propose a new model that encodes the full structural
information contained in the graph. Our architecture couples the recently
proposed Gated Graph Neural Networks with an input transformation that allows
nodes and edges to have their own hidden representations, while tackling the
parameter explosion problem present in previous work. Experimental results show
that our model outperforms strong baselines in generation from AMR graphs and
syntax-based neural machine translation.
Comment: ACL 201
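One way to give edges their own hidden representations is to turn each labelled edge into a node of its own, so that a graph encoder treats nodes and edge labels uniformly. The sketch below shows such a conversion. Whether it matches the input transformation used in this work is an assumption, and the function name `edges_to_nodes` and the example graph are hypothetical.

```python
def edges_to_nodes(nodes, labeled_edges):
    """Convert a labelled directed graph into an unlabelled one.

    Each edge (source, label, target) becomes a fresh node connected as
    source -> edge_node -> target. Edge nodes get a unique suffix so that
    two edges with the same label do not collapse into one node.
    """
    new_nodes = list(nodes)
    new_edges = []
    for index, (source, label, target) in enumerate(labeled_edges):
        edge_node = f"{label}#{index}"
        new_nodes.append(edge_node)
        new_edges.append((source, edge_node))
        new_edges.append((edge_node, target))
    return new_nodes, new_edges


# Hypothetical AMR-style graph for "the boy wants to believe".
amr_nodes = ["want-01", "boy", "believe-01"]
amr_edges = [
    ("want-01", "ARG0", "boy"),
    ("want-01", "ARG1", "believe-01"),
    ("believe-01", "ARG0", "boy"),
]
graph_nodes, graph_edges = edges_to_nodes(amr_nodes, amr_edges)
```

After the conversion, the encoder no longer needs a separate parameter set per edge label, because labels live in the node vocabulary.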
Quantitative test of the barrier nucleosome model for statistical positioning of nucleosomes up- and downstream of transcription start sites
The positions of nucleosomes in eukaryotic genomes determine which parts of
the DNA sequence are readily accessible for regulatory proteins and which are
not. Genome-wide maps of nucleosome positions have revealed a salient pattern
around transcription start sites, involving a nucleosome-free region (NFR)
flanked by a pronounced periodic pattern in the average nucleosome density.
While the periodic pattern clearly reflects well-positioned nucleosomes, the
positioning mechanism is less clear. A recent experimental study by Mavrich et
al. argued that the pattern observed in S. cerevisiae is qualitatively
consistent with a 'barrier nucleosome model', in which the oscillatory pattern
is created by the statistical positioning mechanism of Kornberg and Stryer. On
the other hand, there is clear evidence for intrinsic sequence preferences of
nucleosomes, and it is unclear to what extent these sequence preferences affect
the observed pattern. To test the barrier nucleosome model, we quantitatively
analyze yeast nucleosome positioning data both up- and downstream from NFRs.
Our analysis is based on the Tonks model of statistical physics which
quantifies the interplay between the excluded-volume interaction of nucleosomes
and their positional entropy. We find that although the typical patterns on the
two sides of the NFR are different, they are both quantitatively described by
the same physical model, with the same parameters, but different boundary
conditions. The inferred boundary conditions suggest that the first nucleosome
downstream from the NFR (the +1 nucleosome) is typically directly positioned
while the first nucleosome upstream is statistically positioned via a
nucleosome-repelling DNA region. These boundary conditions, which can be
locally encoded into the genome sequence, significantly shape the statistical
distribution of nucleosomes over a range of up to ~1000 bp to each side.
Comment: includes supporting material
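Statistical positioning against a barrier can be illustrated with a lattice analogue of the Tonks gas: hard rods of fixed length on a one-dimensional lattice with hard walls at both ends. This is a minimal sketch, not the continuum analysis of the paper. The lattice size, rod length, and fugacity below are hypothetical values, and the function names are introduced here for illustration.

```python
def start_probabilities(length, rod, fugacity):
    """Probability that a rod starts at each lattice site.

    z[i] is the partition function of hard rods on i consecutive sites:
    the last site is either empty or holds the end of a rod.
    """
    z = [1.0] * (length + 1)
    for i in range(1, length + 1):
        z[i] = z[i - 1]
        if i >= rod:
            z[i] += fugacity * z[i - rod]
    total = z[length]
    # A rod starting at site i splits the lattice into two independent parts:
    # i sites to its left and (length - i - rod) sites to its right.
    return [
        z[i] * fugacity * z[length - i - rod] / total
        for i in range(length - rod + 1)
    ]


def occupancy(length, rod, fugacity):
    """Probability that each site is covered by some rod."""
    covered = [0.0] * length
    for start, prob in enumerate(start_probabilities(length, rod, fugacity)):
        for site in range(start, start + rod):
            covered[site] += prob
    return covered


# Hypothetical toy parameters: 60 sites, rods of length 5, fugacity 3.
profile = occupancy(60, 5, 3.0)
```

With a uniform fugacity, the only source of structure in `profile` is the pair of walls, which is the statistical positioning effect in its simplest form.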
Properties of the Remnant Clockwise Disk of Young Stars in the Galactic Center
We present new kinematic measurements and modeling of a sample of 116 young
stars in the central parsec of the Galaxy in order to investigate the
properties of the young stellar disk. The measurements were derived from a
combination of speckle and laser guide star adaptive optics imaging and
integral field spectroscopy from the Keck telescopes. Compared to earlier disk
studies, the most important kinematic measurement improvement is in the
precision of the accelerations in the plane of the sky, which have a factor of
six smaller uncertainties (~10 uas/yr/yr). We have also added the first radial
velocity measurements for 8 young stars, increasing the sample at the largest
radii (6"-12") by 25%. We derive the ensemble properties of the observed stars
using Monte-Carlo simulations of mock data. There is one highly significant
kinematic feature (~20 sigma), corresponding to the well-known clockwise disk,
and no significant feature is detected at the location of the previously
claimed counterclockwise disk. The true disk fraction is estimated to be ~20%,
a factor of ~2.5 lower than previous claims, suggesting that we may be
observing the remnant of what used to be a more densely populated stellar disk.
The similarity in the kinematic properties of the B stars and the O/WR stars
suggests a common star formation event. The intrinsic eccentricity distribution
of the disk stars is unimodal, with an average value of <e> = 0.27 +/- 0.07,
which we show can be achieved through dynamical relaxation in an initially
circular disk with a moderately top-heavy mass function.
Comment: 65 pages, 22 figures, 8 tables, submitted to Ap
Utilizing Protein Structure to Identify Non-Random Somatic Mutations
Motivation: Human cancer is caused by the accumulation of somatic mutations
in tumor suppressors and oncogenes within the genome. In the case of oncogenes,
recent theory suggests that there are only a few key "driver" mutations
responsible for tumorigenesis. As there have been significant pharmacological
successes in developing drugs that treat cancers that carry these driver
mutations, several methods that rely on mutational clustering have been
developed to identify them. However, these methods consider proteins as a
single strand without taking their spatial structures into account. We propose
a new methodology that incorporates protein tertiary structure in order to
increase our power when identifying mutation clustering.
Results: We have developed a novel algorithm, iPAC: identification of Protein
Amino acid Clustering, for the identification of non-random somatic mutations
in proteins that takes into account the three dimensional protein structure. By
using the tertiary information, we are able to detect both novel clusters in
proteins that are known to exhibit mutation clustering as well as identify
clusters in proteins without evidence of clustering based on existing methods.
For example, by combining the data in the Protein Data Bank (PDB) and the
Catalogue of Somatic Mutations in Cancer, our algorithm identifies new
mutational clusters in well known cancer proteins such as KRAS and PI3KCa.
Further, by utilizing the tertiary structure, our algorithm also identifies
clusters in EGFR, EIF2AK2, and other proteins that are not identified by
current methodology.
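The motivation for using tertiary structure can be shown with a small sketch: mutated residues that are far apart in the linear sequence may still sit close together in the folded protein. The code below is not the iPAC algorithm. It only lists such pairs from 3D coordinates, and the coordinates, distance cutoff, sequence-separation threshold, and function name are all hypothetical.

```python
import math
from itertools import combinations


def spatially_close_pairs(coords, mutated, max_dist=8.0, min_seq_sep=10):
    """Pairs of mutated residues that are distant in sequence but near in space.

    coords maps residue number -> (x, y, z) in angstroms, for example
    C-alpha positions. Both thresholds are illustrative assumptions.
    """
    pairs = []
    for i, j in combinations(sorted(mutated), 2):
        far_in_sequence = (j - i) >= min_seq_sep
        near_in_space = math.dist(coords[i], coords[j]) <= max_dist
        if far_in_sequence and near_in_space:
            pairs.append((i, j))
    return pairs


# Made-up coordinates for four mutated residues (not taken from the PDB).
toy_coords = {
    12: (0.0, 0.0, 0.0),
    13: (3.8, 0.0, 0.0),
    61: (5.0, 2.0, 1.0),
    146: (30.0, 10.0, 4.0),
}
close_pairs = spatially_close_pairs(toy_coords, toy_coords.keys())
```

A method that treats the protein as a single strand would see residues 12 and 61 as 49 positions apart, while in this toy structure they lie within a few angstroms of each other.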
Two polymorphisms facilitate differences in plasticity between two chicken major histocompatibility complex class I proteins
Major histocompatibility complex class I molecules (MHC I) present peptides to cytotoxic T-cells at the surface of almost all nucleated cells. The function of MHC I molecules is to select high affinity peptides from a large intracellular pool and they are assisted in this process by co-factor molecules, notably tapasin. In contrast to mammals, MHC homozygous chickens express a single MHC I gene locus, termed BF2, which is hypothesised to have co-evolved with the highly polymorphic tapasin within stable haplotypes. The BF2 molecules of the B15 and B19 haplotypes have recently been shown to differ in their interactions with tapasin and in their peptide selection properties. This study investigated whether these observations might be explained by differences in the protein plasticity that is encoded into the MHC I structure by primary sequence polymorphisms. Furthermore, we aimed to demonstrate the utility of a complementary modelling approach to the understanding of complex experimental data. Combining mechanistic molecular dynamics simulations and the primary-sequence-based technique of statistical coupling analysis, we show how two of the eight polymorphisms between BF2*15:01 and BF2*19:01 facilitate differences in plasticity. We show that BF2*15:01 is intrinsically more plastic than BF2*19:01, exploring more conformations in the absence of peptide. We identify a protein sector of contiguous residues connecting the membrane-bound α3 domain and the heavy chain peptide binding site. This sector contains two of the eight polymorphic residues. One is residue 22 in the peptide binding domain and the other, residue 220, is in the α3 domain, a putative tapasin binding site. These observations are in correspondence with the experimentally observed functional differences of these molecules and suggest a mechanism for how modulation of MHC I plasticity by tapasin catalyses peptide selection allosterically.