Pairwise alignment incorporating dipeptide covariation
Motivation: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations, and by assessing the ability of this algorithm to detect remote homologies. Results: Our analysis indicates that local correlations between substitutions are not strong on average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.
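One way to picture "alignment that incorporates local substitution correlations" is a global dynamic-programming alignment in which a match column earns an extra term whenever the previous column was also a match, scored on the two dipeptides involved. The sketch below assumes that additive form; `sub`, `pair_bonus` and the linear `gap` cost are hypothetical inputs, not the extended matrices or the algorithm from the paper.

```python
from typing import Callable


def align_score(
    a: str,
    b: str,
    sub: Callable[[str, str], float],
    pair_bonus: Callable[[str, str], float],
    gap: float = -4.0,
) -> float:
    """Global alignment score with a hypothetical dipeptide correction.

    A match column scores sub(x, y); if the previous column was also a match,
    it additionally scores pair_bonus(dipeptide_in_a, dipeptide_in_b).
    Gaps cost a linear `gap` per residue.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # M[i][j]: best score of a[:i] vs b[:j] ending in a match column.
    # G[i][j]: best score of a[:i] vs b[:j] ending in a gap column.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    G = [[neg] * (m + 1) for _ in range(n + 1)]
    G[0][0] = 0.0  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev_match = M[i - 1][j - 1]
                if prev_match > neg:
                    # Two consecutive match columns: add the dipeptide term.
                    prev_match += pair_bonus(a[i - 2:i], b[j - 2:j])
                M[i][j] = sub(a[i - 1], b[j - 1]) + max(G[i - 1][j - 1], prev_match)
            if (i, j) != (0, 0):
                best = neg
                if i > 0:  # residue of a against a gap
                    best = max(best, max(M[i - 1][j], G[i - 1][j]) + gap)
                if j > 0:  # residue of b against a gap
                    best = max(best, max(M[i][j - 1], G[i][j - 1]) + gap)
                G[i][j] = best
    return max(M[n][m], G[n][m])
```

With `pair_bonus` returning 0 everywhere, this reduces to ordinary Needleman–Wunsch with linear gaps, which is the independence assumption the abstract tests; any nonzero bonus is where neighbor correlations would enter.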
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering, as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.
Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution.
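The loop described here (learn from all measured variants, then pick untested sequences predicted to be improved) can be sketched with a deliberately simple additive model: each (position, residue) pair gets an effect equal to the mean fitness deviation of the measured variants that carry it. This crude averaging model is a stand-in chosen for brevity, not a method the review prescribes; the function names and the candidate list are hypothetical.

```python
from statistics import mean


def fit_additive(variants: dict[str, float]) -> tuple[float, dict]:
    """Fit a crude additive model from measured variants (seq -> fitness).

    Each (position, residue) effect is the mean deviation from the overall
    mean fitness among measured sequences carrying that residue.
    """
    base = mean(variants.values())
    by_site: dict[tuple[int, str], list[float]] = {}
    for seq, y in variants.items():
        for pos, aa in enumerate(seq):
            by_site.setdefault((pos, aa), []).append(y - base)
    effects = {key: mean(vals) for key, vals in by_site.items()}
    return base, effects


def predict(model: tuple[float, dict], seq: str) -> float:
    base, effects = model
    # Residues never observed at a position contribute 0 (no information).
    return base + sum(effects.get((pos, aa), 0.0) for pos, aa in enumerate(seq))


def propose(variants: dict[str, float], candidates: list[str], k: int) -> list[str]:
    """Return the k untested candidates with the highest predicted fitness."""
    model = fit_additive(variants)
    untested = [s for s in candidates if s not in variants]
    return sorted(untested, key=lambda s: predict(model, s), reverse=True)[:k]
```

For example, after measuring "AA" (1.0), "AB" (2.0) and "BA" (2.0), the model combines the two beneficial substitutions and ranks the untested double mutant "BB" first, which is the kind of recombination of information across variants the abstract refers to. Real applications would swap in a regularized regression or another learned model.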
Novel phylogenetic algorithm to monitor human tropism in Egyptian H5N1-HPAIV reveals evolution toward efficient human-to-human transmission
Years of endemic infections with highly pathogenic avian influenza (HPAI) A subtype H5N1 virus in poultry and high numbers of infections in humans provide ample opportunity in Egypt for H5N1-HPAIV to develop pandemic potential. To better understand the viral determinants that facilitate human infections with Egyptian H5N1-HPAIV, we developed a new phylogenetic algorithm based on a new distance measure derived from the informational spectrum method (ISM). This approach, which describes functional aspects of the evolution of hemagglutinin subunit 1 (HA1), revealed a growing group, G2, of H5N1-HPAIV in Egypt after 2009 that acquired new informational spectrum (IS) properties suggestive of increased human tropism and pandemic potential. While in 2006 all viruses in Egypt belonged to the G1 group, by 2011 these viruses had been almost entirely replaced by G2 viruses. All of the G2 viruses displayed four characteristic mutations (D43N, S120(D,N), (S,L)129Δ and I151T), three of which were previously reported to increase binding to the human receptor. Already in 2006–2008, G2 viruses were significantly (p<0.02) more often found in humans than expected from their overall prevalence, and this further increased in 2009–2011 (p<0.007). Our approach also identified viruses that acquired additional mutations that we predict will further enhance their human tropism. The extensive evolution of Egyptian H5N1-HPAIV toward a preferential human tropism underlines an urgent need to closely monitor these viruses with respect to molecular determinants of virulence.
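The informational spectrum method encodes each residue by its electron-ion interaction potential (EIIP) and takes the Fourier power spectrum of the resulting numeric signal. The sketch below assumes the EIIP values as they are commonly tabulated (worth checking against the original table), evaluates the spectrum on a fixed frequency grid so sequences of different length (for example with the 129 deletion) are comparable, and uses plain Euclidean distance between normalized spectra as a stand-in. That distance is an assumption for illustration, not the ISM-derived measure the authors developed.

```python
import cmath

# EIIP value per amino acid, as commonly tabulated for the ISM.
# Reproduced for illustration only; verify against the original source.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def informational_spectrum(seq: str, bins: int = 64) -> list[float]:
    """Normalized power spectrum of the EIIP-encoded sequence.

    Frequencies are sampled on a fixed grid in (0, 0.5] cycles per residue,
    so sequences of different length yield vectors of the same size.
    """
    x = [EIIP[aa] for aa in seq]
    mu = sum(x) / len(x)
    x = [v - mu for v in x]  # drop the mean (zero-frequency) component
    spec = []
    for b in range(bins):
        f = (b + 1) / (2 * bins)
        X = sum(v * cmath.exp(-2j * cmath.pi * f * t) for t, v in enumerate(x))
        spec.append(abs(X) ** 2)
    total = sum(spec) or 1.0  # guard against a constant signal
    return [s / total for s in spec]


def spectral_distance(s1: str, s2: str) -> float:
    """Stand-in distance: Euclidean distance between normalized spectra."""
    p, q = informational_spectrum(s1), informational_spectrum(s2)
    return sum((u - v) ** 2 for u, v in zip(p, q)) ** 0.5
```

A pairwise matrix of such distances over HA1 sequences could then be fed to any distance-based tree method; the abstract's point is that a spectrum-based distance tracks functional properties rather than raw sequence identity.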
Molecular phylogenetics: principles and practice
Phylogenies are important for addressing various biological questions, such as relationships among species or genes, the origin and spread of viral infections, and the demographic changes and migration patterns of species. Advances in sequencing technologies have taken phylogenetic analysis to new heights. Phylogenies have permeated nearly every branch of biology, and the plethora of phylogenetic methods and software packages now available may seem daunting to an experimental biologist. Here, we review the major methods of phylogenetic analysis, including parsimony, distance, likelihood and Bayesian methods. We discuss their strengths and weaknesses and provide guidance for their use.
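Of the methods listed, parsimony is the easiest to show in a few lines. The sketch below implements Fitch's algorithm, which counts the minimum number of character changes a rooted binary tree requires at each site; the nested-tuple tree representation and the toy alignment are assumptions for illustration.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def fitch(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch's algorithm for one site: (candidate state set, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:
        return common, cost_l + cost_r
    # No shared state: take the union and charge one change.
    return set_l | set_r, cost_l + cost_r + 1


def parsimony_score(tree: Tree, alignment: dict[str, str]) -> int:
    """Total minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
        for i in range(length)
    )
```

Under parsimony, the preferred tree is the one with the lowest score: for the toy alignment a = "AA", b = "AA", c = "GG", d = "GG", the tree ((a,b),(c,d)) needs 2 changes while ((a,c),(b,d)) needs 4.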
Adaptive evolution is substantially impeded by Hill–Robertson interference in Drosophila
Hill–Robertson interference (HRi) is expected to reduce the efficiency of natural selection when two or more linked selected sites do not segregate freely, but no attempt has yet been made to quantify the overall impact of HRi on the rate of adaptive evolution for any given genome. In this work, we estimate how much HRi impedes the rate of adaptive evolution in the coding genome of Drosophila melanogaster. We compiled a data set of 6,141 autosomal protein-coding genes from Drosophila, from which polymorphism levels in D. melanogaster and divergence out to D. yakuba were estimated. The rate of adaptive evolution was calculated using a derivative of the McDonald–Kreitman test that controls for slightly deleterious mutations. We find that the rate of adaptive amino acid substitution at a given position of the genome is positively correlated with both the rate of recombination and the mutation rate, and negatively correlated with the gene density of the region. These correlations are robust to controlling for each other, for synonymous codon bias and for gene functions related to immune response and testes. We show that HRi diminishes the rate of adaptive evolution by approximately 27%. Interestingly, genes with low mutation rates embedded in gene-poor regions lose approximately 17% of their adaptive substitutions, whereas genes with high mutation rates embedded in gene-rich regions lose approximately 60%. We conclude that HRi hampers the rate of adaptive evolution in Drosophila and that variation in recombination, mutation and gene density along the genome affects the magnitude of the HRi effect.
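The starting point for McDonald–Kreitman-style estimates is the proportion of adaptive nonsynonymous substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps), built from nonsynonymous and synonymous divergence (Dn, Ds) and polymorphism (Pn, Ps) counts. The paper uses a derivative that controls for slightly deleterious mutations; the sketch below shows only the basic estimator plus one common adjustment (discarding low-frequency polymorphisms before counting). The cutoff approach is an assumption for illustration, not necessarily the paper's correction.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Basic MK estimate of the fraction of adaptive nonsynonymous substitutions."""
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def count_common(freqs: list[float], cutoff: float) -> int:
    """Count polymorphisms whose derived-allele frequency is at or above cutoff.

    Slightly deleterious variants tend to stay at low frequency, so excluding
    rare polymorphisms (for both Pn and Ps) reduces their downward bias on alpha.
    """
    return sum(1 for f in freqs if f >= cutoff)
```

For example, Dn = 50, Ds = 100, Pn = 20 and Ps = 80 give alpha = 1 - 2000/4000 = 0.5, meaning half of the nonsynonymous substitutions would be attributed to positive selection.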
Strong Purifying Selection at Synonymous Sites in D. melanogaster
Synonymous sites are generally assumed to be subject to weak selective constraint. For this reason, they are often neglected as a possible source of important functional variation. We use site frequency spectra from deep population sequencing data to show that, contrary to this expectation, 22% of four-fold synonymous (4D) sites in D. melanogaster evolve under very strong selective constraint, while few, if any, appear to be under weak constraint. Linking polymorphism with divergence data, we further find that the fraction of synonymous sites exposed to strong purifying selection is higher for positions that show slower evolution on the Drosophila phylogeny. The function underlying the inferred strong constraint appears to be separate from splicing enhancers, nucleosome positioning, and the translational optimization that generates canonical codon bias. The fraction of synonymous sites under strong constraint within a gene correlates well with gene expression, particularly in the mid-late embryo, pupal, and adult developmental stages. Genes enriched in strongly constrained synonymous sites tend to be particularly functionally important and are often involved in key developmental pathways. Given that the widespread constraint observed on synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.
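A site frequency spectrum (SFS) tallies how many polymorphic sites have each minor-allele count in a sample of n chromosomes. The sketch below builds a folded SFS and the proportions expected under the standard neutral model, where the unfolded spectrum is proportional to 1/i; comparing the two is a generic first step, not the inference procedure the authors used to estimate the 22% figure. The input format (one minor-allele count per site, 0 for monomorphic sites) is an assumption.

```python
from collections import Counter


def folded_sfs(minor_counts: list[int], n: int) -> list[int]:
    """Number of sites with minor-allele count i, for i = 1..n//2.

    Monomorphic sites (count 0) are ignored.
    """
    c = Counter(minor_counts)
    return [c.get(i, 0) for i in range(1, n // 2 + 1)]


def neutral_folded_proportions(n: int) -> list[float]:
    """Expected folded-SFS proportions under neutrality (unfolded SFS ~ 1/i)."""
    weights = []
    for i in range(1, n // 2 + 1):
        if 2 * i == n:
            weights.append(1.0 / i)  # classes i and n - i coincide
        else:
            weights.append(1.0 / i + 1.0 / (n - i))
    total = sum(weights)
    return [w / total for w in weights]
```

For n = 4 the neutral proportions are 8/11 for singletons and 3/11 for doubletons. Sites under very strong constraint rarely carry any polymorphism at all, so they show up mainly as an excess of invariant sites rather than as a distorted spectrum shape.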
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics that does not require prior statistical assumptions about the evolutionary process. The method is based on algorithms that generate mathematical representations directly from DNA/RNA or protein sequences and then output numerical (scalar or vector) and visual (graph) characteristics. The binary-encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to obtain an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
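The downstream step described here (vector descriptors, then a distance matrix, then a tree) can be sketched with UPGMA, one of the two tree methods named. The binary vectors below are hypothetical placeholders standing in for ICF-derived descriptors, and Hamming distance is an assumed choice of metric; BOOL-AN's own descriptors and distances are not reproduced.

```python
def pairwise_hamming(vectors: dict[str, list[int]]):
    """Hamming distance between every pair of equal-length vectors."""
    names = list(vectors)
    dist = {
        frozenset((a, b)): sum(x != y for x, y in zip(vectors[a], vectors[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    return names, dist


def upgma(names: list[str], dist: dict) -> tuple:
    """UPGMA clustering; returns the topology as nested tuples."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[frozenset(pair)],
        )
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new = f"({a},{b})"
        # Size-weighted average distance from the merged cluster to the rest.
        for c in clusters:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = ((tree_a, tree_b), size_a + size_b)
    [(tree, _size)] = list(clusters.values())
    return tree
```

With a = [0,0,0,0], b = [0,0,0,1], c = [1,1,1,0] and d = [1,1,1,1], the closest pairs are merged first and the result is the topology ((a,b),(c,d)). UPGMA assumes a constant rate across lineages; neighbor-joining, the other option the abstract mentions, relaxes that assumption.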
Quantifying evolutionary constraints on B cell affinity maturation
The antibody repertoire of each individual is continuously updated by the evolutionary process of B cell receptor mutation and selection. It has recently become possible to gain detailed information about this process through high-throughput sequencing. Here, we develop modern statistical molecular evolution methods for the analysis of B cell sequence data and then apply them to a very deep short-read data set of B cell receptors. We find that the substitution process is conserved across individuals but varies significantly across gene segments. We investigate selection on B cell receptors using a novel method that side-steps the difficulties previous work encountered in differentiating between selection and motif-driven mutation; this is done through stochastic mapping and empirical Bayes estimators that compare the evolution of in-frame and out-of-frame rearrangements. We use this new method to derive a per-residue map of selection, which provides a more nuanced view of the constraints on framework and variable regions.
Comment: Previously entitled "Substitution and site-specific selection driving B cell affinity maturation is consistent across individuals"
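The key idea is that out-of-frame rearrangements are not expressed as functional receptors, so their mutations reflect the mutation process (including motif effects) without selection on receptor function. A crude per-site selection score can therefore compare substitution frequencies in in-frame versus out-of-frame sequences. The sketch below uses a log ratio with a pseudocount as a simplified stand-in for the stochastic mapping and empirical Bayes machinery in the paper; the input format (per-site substitution counts and sequence totals) is an assumption.

```python
import math


def site_selection(
    in_subs: list[int],
    in_total: int,
    out_subs: list[int],
    out_total: int,
    pseudo: float = 1.0,
) -> list[float]:
    """Per-site log ratio of substitution frequencies, in-frame vs out-of-frame.

    Negative values suggest purifying selection at that site; positive values
    suggest diversifying selection. The pseudocount is a simple smoothing
    stand-in for proper empirical Bayes shrinkage.
    """
    scores = []
    for s_in, s_out in zip(in_subs, out_subs):
        rate_in = (s_in + pseudo) / (in_total + 2 * pseudo)
        rate_out = (s_out + pseudo) / (out_total + 2 * pseudo)
        scores.append(math.log(rate_in / rate_out))
    return scores
```

A site mutated in 10 of 100 out-of-frame sequences but in none of 100 in-frame sequences scores log(1/11), flagging it as constrained, while a site mutated equally often in both scores 0.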
- …