Universality and predictability in molecular quantitative genetics
Molecular traits, such as gene expression levels or protein binding
affinities, are increasingly accessible to quantitative measurement by modern
high-throughput techniques. Such traits measure molecular functions and, from
an evolutionary point of view, are important as targets of natural selection.
We review recent developments in evolutionary theory and experiments that are
expected to become building blocks of a quantitative genetics of molecular
traits. We focus on universal evolutionary characteristics: these are largely
independent of a trait's genetic basis, which is often at least partially
unknown. We show that universal measurements can be used to infer selection on
a quantitative trait, which determines its evolutionary mode of conservation or
adaptation. Furthermore, universality is closely linked to predictability of
trait evolution across lineages. We argue that universal trait statistics
extend over a range of cellular scales and open new avenues of quantitative
evolutionary systems biology.
Adaptive evolution of molecular phenotypes
Molecular phenotypes link genomic information with organismic functions,
fitness, and evolution. Quantitative traits are complex phenotypes that depend
on multiple genomic loci. In this paper, we study the adaptive evolution of a
quantitative trait under time-dependent selection, which arises from
environmental changes or through fitness interactions with other co-evolving
phenotypes. We analyze a model of trait evolution under mutations and genetic
drift in a single-peak fitness seascape. The fitness peak performs a
constrained random walk in the trait amplitude, which determines the
time-dependent trait optimum in a given population. We derive analytical
expressions for the distribution of the time-dependent trait divergence between
populations and of the trait diversity within populations. Based on this
solution, we develop a method to infer adaptive evolution of quantitative
traits. Specifically, we show that the ratio of the average trait divergence
to the diversity is a universal function of evolutionary time, which predicts
the stabilizing strength and the driving rate of the fitness seascape. From an
information-theoretic point of view, this function measures the
macro-evolutionary entropy in a population ensemble, which determines the
predictability of the evolutionary process. Our solution also quantifies two
key characteristics of adapting populations: the cumulative fitness flux, which
measures the total amount of adaptation, and the adaptive load, which is the
fitness cost due to a population's lag behind the fitness peak.
Comment: Figures are not optimally displayed in Firefox
The size of the immune repertoire of bacteria
Some bacteria and archaea possess an immune system, based on the CRISPR-Cas
mechanism, that confers adaptive immunity against phage. In such species,
individual bacteria maintain a "cassette" of viral DNA elements called spacers
as a memory of past infections. The typical cassette contains a few dozen
spacers. Given that bacteria can have very large genomes, and since having more
spacers should confer a better memory, it is puzzling that so little genetic
space would be devoted by bacteria to their adaptive immune system. Here, we
identify a fundamental trade-off between the size of the bacterial immune
repertoire and effectiveness of response to a given threat, and show how this
trade-off imposes a limit on the optimal size of the CRISPR cassette.
Comment: 9 pages, 5 figures
Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space
Group-equivariant neural networks have emerged as a data-efficient approach
to solve classification and regression tasks, while respecting the relevant
symmetries of the data. However, little work has been done to extend this
paradigm to the unsupervised and generative domains. Here, we present
Holographic-(V)AE (H-(V)AE), a fully end-to-end SO(3)-equivariant (variational)
autoencoder in Fourier space, suitable for unsupervised learning and generation
of data distributed around a specified origin. H-(V)AE is trained to
reconstruct the spherical Fourier encoding of data, learning in the process a
latent space with a maximally informative invariant embedding alongside an
equivariant frame describing the orientation of the data. We extensively test
the performance of H-(V)AE on diverse datasets and show that its latent space
efficiently encodes the categorical features of spherical images and structural
features of protein atomic environments. Our work can further be seen as a case
study for equivariant modeling of a data distribution by reconstructing its
Fourier encoding.
H-Packer: Holographic Rotationally Equivariant Convolutional Neural Network for Protein Side-Chain Packing
Accurately modeling protein 3D structure is essential for the design of
functional proteins. An important sub-task of structure modeling is protein
side-chain packing: predicting the conformation of side-chains (rotamers) given
the protein's backbone structure and amino-acid sequence. Conventional
approaches for this task rely on expensive sampling procedures over
hand-crafted energy functions and rotamer libraries. Recently, several deep
learning methods have been developed to tackle the problem in a data-driven
way, albeit with vastly different formulations (from image-to-image translation
to directly predicting atomic coordinates). Here, we frame the problem as a
joint regression over the side-chains' true degrees of freedom: the dihedral
angles. We carefully study possible objective functions for this task,
while accounting for the underlying symmetries of the task. We propose
Holographic Packer (H-Packer), a novel two-stage algorithm for side-chain
packing built on top of two light-weight rotationally equivariant neural
networks. We evaluate our method on CASP13 and CASP14 targets. H-Packer is
computationally efficient and shows favorable performance against conventional
physics-based algorithms and is competitive against alternative deep learning
solutions.
Comment: Accepted as a conference paper at MLCB 2023. 8 pages main body, 20
pages with appendix. 10 figures
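The dihedral-angle formulation mentioned in the abstract above can be illustrated with a minimal sketch. It computes the signed dihedral angle defined by four atom positions and a periodic loss that treats angles differing by a full turn as identical. The function names `dihedral` and `periodic_angle_loss` are hypothetical, and the cosine loss is only one simple way to respect angular periodicity; it is not claimed to be the objective used by H-Packer.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four 3D points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond.
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

def periodic_angle_loss(pred, target):
    """1 - cos(pred - target): zero whenever the angles agree modulo 2*pi."""
    return 1.0 - np.cos(pred - target)

# Toy coordinates (assumed for illustration, not taken from any protein).
p0 = np.array([1.0, 0.0, 0.0])
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([0.0, 0.0, 1.0])
p3_trans = np.array([-1.0, 0.0, 1.0])
p3_cis = np.array([1.0, 0.0, 1.0])
p3_perp = np.array([0.0, 1.0, 1.0])

angle_trans = dihedral(p0, p1, p2, p3_trans)
angle_cis = dihedral(p0, p1, p2, p3_cis)
angle_perp = dihedral(p0, p1, p2, p3_perp)
```

A plain squared error on raw angles would penalize a prediction of -179 degrees against a target of +179 degrees heavily, even though the two conformations are nearly identical; the periodic loss avoids this.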
Deep generative selection models of T and B cell receptor repertoires with soNNia
Subclasses of lymphocytes carry out different functional roles and work together
to produce an immune response and lasting immunity. In addition to these
functional roles, T and B-cell lymphocytes rely on the diversity of their
receptor chains to recognize different pathogens. The lymphocyte subclasses
emerge from common ancestors generated with the same diversity of receptors
during selection processes. Here we leverage biophysical models of receptor
generation with machine learning models of selection to identify specific
sequence features characteristic of functional lymphocyte repertoires and
subrepertoires. Specifically, using only repertoire-level sequence information,
we classify CD4 and CD8 T-cells, find correlations between receptor
chains arising during selection, and identify T-cell subsets that are targets
of pathogenic epitopes. We also show examples of when simple linear classifiers
do as well as more complex machine learning methods.
MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories
Simulation-based inference enables learning the parameters of a model even
when its likelihood cannot be computed in practice. One class of methods uses
data simulated with different parameters to infer an amortized estimator for
the likelihood-to-evidence ratio, or equivalently the posterior function. We
show that this approach can be formulated in terms of mutual information
maximization between model parameters and simulated data. We use this
equivalence to reinterpret existing approaches for amortized inference and
propose two new methods that rely on lower bounds of the mutual information. We
apply our framework to the inference of parameters of stochastic processes and
chaotic dynamical systems from sampled trajectories, using artificial neural
networks for posterior prediction. Our approach provides a unified framework
that leverages the power of mutual information estimators for inference.
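The link between mutual information and the likelihood-to-evidence ratio described in the abstract above can be illustrated with a small numerical sketch. It evaluates the widely used InfoNCE lower bound on a toy Gaussian model where the exact mutual information is known. Everything here is an assumption for illustration: the abstract does not state which bounds the paper uses, and the names `infonce_lower_bound` and `gaussian_critic`, the toy model, and the hand-written critic (standing in for a trained neural network) are hypothetical.

```python
import numpy as np

def infonce_lower_bound(scores):
    """InfoNCE lower bound on mutual information.

    scores[i, j] = critic(theta_i, x_j); diagonal entries are the jointly
    sampled pairs. For each x_j, the true theta_j is contrasted against all
    other thetas (softmax over axis 0). The result never exceeds log(n).
    """
    n = scores.shape[0]
    col_max = scores.max(axis=0, keepdims=True)
    # Numerically stable log-sum-exp over the theta axis.
    log_sum_exp = col_max[0] + np.log(np.exp(scores - col_max).sum(axis=0))
    return float(np.mean(np.diag(scores) - log_sum_exp) + np.log(n))

def gaussian_critic(theta, x, sigma):
    """Log-likelihood of x given theta, up to an additive constant."""
    return -0.5 * ((x[None, :] - theta[:, None]) / sigma) ** 2

# Toy model (assumed): theta ~ N(0, 1), x = theta + sigma * noise.
rng = np.random.default_rng(0)
n, sigma = 512, 0.5
theta = rng.normal(size=n)
x = theta + sigma * rng.normal(size=n)

mi_estimate = infonce_lower_bound(gaussian_critic(theta, x, sigma))
# Exact mutual information for this linear-Gaussian model.
mi_exact = 0.5 * np.log(1.0 + 1.0 / sigma**2)
```

In this toy case the critic equals the log-likelihood, which differs from the log likelihood-to-evidence ratio only by a term depending on x alone, so the bound should land close to the exact value. In a simulation-based setting the critic would instead be learned by maximizing the bound over simulated pairs.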