42 research outputs found
Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements
Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency with which each amino acid is replaced by another and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn ↔ Asp, Phe ↔ Tyr, Lys ↔ Arg, Gln ↔ Glu, Ile ↔ Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr as structurally idiosyncratic residues. We also found a high average correlation ($\overline{R} = 0.85$) between thirty amino acid mutability scales and the mutational inertia ($I_X$), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally efficient principles: (a) amino acid interchangeability privileges secondary structural similarity, and (b) the mutability of an amino acid depends directly on its biosynthetic energy cost and inversely on its frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)
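As a rough illustration of what a distance between two Ramachandran distributions can look like, the sketch below bins (phi, psi) dihedral angles into normalized 2D histograms and compares them with the Hellinger distance. The abstract does not say which distance the authors used, so the Hellinger distance, the 10-degree bins, the function names, and the synthetic angles are all assumptions made for this example.

```python
import numpy as np

def wrap_degrees(angles):
    """Map angles in degrees onto the interval [-180, 180)."""
    return (np.asarray(angles, dtype=float) + 180.0) % 360.0 - 180.0

def ramachandran_histogram(phi, psi, bins=36):
    """Normalized 2D histogram of (phi, psi) angles; the bin count is an assumed choice."""
    edges = np.linspace(-180.0, 180.0, bins + 1)
    counts, _, _ = np.histogram2d(wrap_degrees(phi), wrap_degrees(psi), bins=[edges, edges])
    total = counts.sum()
    if total == 0:
        raise ValueError("no (phi, psi) observations to bin")
    return counts / total

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Synthetic stand-ins for two residues: one helix-like, one sheet-like (not real data).
rng = np.random.default_rng(0)
helix_like = ramachandran_histogram(rng.normal(-60, 15, 2000), rng.normal(-45, 15, 2000))
sheet_like = ramachandran_histogram(rng.normal(-120, 15, 2000), rng.normal(130, 15, 2000))
print(hellinger_distance(helix_like, sheet_like))
```

Under this assumed measure, residues whose angle distributions overlap heavily score near 0, and residues that occupy different regions of the map score near 1.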
The evolution of the huntingtin-associated protein 40 (HAP40) in conjunction with huntingtin
Background: The huntingtin-associated protein 40 (HAP40) abundantly interacts with huntingtin (HTT), the protein that is altered in Huntington's disease (HD). Therefore, we analysed the evolution of HAP40 and its interaction with HTT. Results: We found that in amniotes HAP40 is encoded by a single-exon gene, whereas in all other organisms it is expressed from multi-exon genes. HAP40 co-occurs with HTT in unikonts, including filastereans such as Capsaspora owczarzaki and the amoebozoan Dictyostelium discoideum, but both proteins are absent from fungi. Outside unikonts, a few species, such as the free-living amoeboflagellate Naegleria gruberi, contain putative HTT and HAP40 orthologs. Biochemically we show that the interaction between HTT and HAP40 extends to fish, and bioinformatic analyses provide evidence for evolutionary conservation of this interaction. The closest homologue of HAP40 in current protein databases is the family of soluble N-ethylmaleimide-sensitive factor attachment proteins (SNAPs). Conclusion: Our results indicate that the transition from a multi-exon to a single-exon gene appears to have taken place by retroposition during the divergence of amphibians and amniotes, followed by the loss of the parental multi-exon gene. Furthermore, it appears that the two proteins probably originated at the root of eukaryotes. Conservation of the interaction between HAP40 and HTT and their likely coevolution strongly indicate functional importance of this interaction.
Clades of huge phages from across Earth's ecosystems.
Bacteriophages typically have small genomes¹ and depend on their bacterial hosts for replication². Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems.
Iterative SE(3)-Transformers
When manipulating three-dimensional data, it is possible to ensure that
rotational and translational symmetries are respected by applying so-called
SE(3)-equivariant models. Protein structure prediction is a prominent example
of a task which displays these symmetries. Recent work in this area has
successfully made use of an SE(3)-equivariant model, applying an iterative
SE(3)-equivariant attention mechanism. Motivated by this application, we
implement an iterative version of the SE(3)-Transformer, an SE(3)-equivariant
attention-based model for graph data. We address the additional complications
which arise when applying the SE(3)-Transformer in an iterative fashion,
compare the iterative and single-pass versions on a toy problem, and consider
why an iterative model may be beneficial in some problem settings. We make the
code for our implementation available to the community.
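To make the symmetry property concrete, the sketch below numerically checks SE(3)-equivariance for a toy iterative update on a point cloud: rotating and translating the input and then iterating should match iterating and then applying the same rotation and translation. The update rule is a deliberately simple stand-in, not the SE(3)-Transformer or the authors' implementation, and every function name, the step size, and the iteration count are assumptions for illustration.

```python
import numpy as np

def equivariant_step(x, step=0.1):
    """Toy update: move each point along distance-weighted offsets to the other points."""
    diff = x[None, :, :] - x[:, None, :]       # diff[i, j] = x_j - x_i (rotates with the input)
    dist = np.linalg.norm(diff, axis=-1)       # pairwise distances are SE(3)-invariant
    weights = np.exp(-dist)                    # weights depend only on invariants
    np.fill_diagonal(weights, 0.0)             # no self-interaction
    weights /= weights.sum(axis=1, keepdims=True)
    return x + step * np.einsum("ij,ijk->ik", weights, diff)

def iterate(x, n_iters=3):
    """Apply the toy update several times, feeding each output back in as input."""
    for _ in range(n_iters):
        x = equivariant_step(x)
    return x

def random_rotation(rng):
    """Random proper rotation matrix (determinant +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0
    return q

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))
rotation, translation = random_rotation(rng), rng.normal(size=3)

transformed_then_iterated = iterate(points @ rotation.T + translation)
iterated_then_transformed = iterate(points) @ rotation.T + translation
print(np.allclose(transformed_then_iterated, iterated_then_transformed))
```

Because each step is equivariant, the composition of several steps is equivariant as well, which is the property an iterative model of this kind relies on.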
Large-scale invasion of unicellular eukaryotic genomes by integrating DNA viruses
Eukaryotic genomes contain a variety of endogenous viral elements (EVEs), which are mostly derived from RNA and ssDNA viruses that are no longer functional and are considered to be “genomic fossils.” Genomic surveys of EVEs, however, are strongly biased toward animals and plants, whereas protists, which represent the majority of eukaryotic diversity, remain poorly represented. Here, we show that protist genomes harbor tens to thousands of diverse, ~14 to 40 kbp long dsDNA viruses. These EVEs, composed of virophages, Polinton-like viruses, and related entities, have remained hitherto hidden owing to poor sequence conservation between virus groups and their repetitive nature that precluded accurate short-read assembly. We show that long-read sequencing technology is ideal for resolving virus insertions. Many protist EVEs appear intact, and most encode integrases, which suggests that they have actively colonized hosts across the tree of eukaryotes. We also found evidence for gene expression in host transcriptomes and that closely related virophage and Polinton-like virus genomes are abundant in viral metagenomes, indicating that many EVEs are probably functional viruses.