Ubiquity of synonymity: almost all large binary trees are not uniquely identified by their spectra or their immanantal polynomials
There are several common ways to encode a tree as a matrix, such as the
adjacency matrix, the Laplacian matrix (that is, the infinitesimal generator of
the natural random walk), and the matrix of pairwise distances between leaves.
Such representations involve a specific labeling of the vertices or at least
the leaves, and so it is natural to attempt to identify trees by some feature
of the associated matrices that is invariant under relabeling. An obvious
candidate is the spectrum of eigenvalues (or, equivalently, the characteristic
polynomial). We show, for each of these choices of matrix, that the fraction
of binary trees with a unique spectrum goes to zero as the number of leaves
goes to infinity. We investigate numerically the rate at which this fraction
converges to zero. For the adjacency and Laplacian matrices, we show that the
a priori more informative immanantal polynomials have no greater power to
distinguish between trees.
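To make these spectral invariants concrete, here is a minimal Python sketch. It assumes trees are given as edge lists on vertices 0..n-1 (our own convention; the function names are hypothetical, not the paper's code). It computes the exact characteristic polynomial of the adjacency and Laplacian matrices with the Faddeev-LeVerrier recurrence over rational numbers, so two trees can be compared without floating-point error. Relabeling the vertices conjugates the matrix by a permutation and leaves the polynomial unchanged.

```python
from fractions import Fraction


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected tree on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def laplacian_matrix(n, edges):
    """Laplacian L = D - A, where D is the diagonal matrix of vertex degrees."""
    A = adjacency_matrix(n, edges)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]


def _matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients of det(xI - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    coeffs = [Fraction(1)]                      # leading coefficient c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    for k in range(1, n + 1):
        AM = _matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AMk = _matmul(A, M)
        # c_{n-k} = -trace(A M_k) / k
        coeffs.append(-sum(AMk[i][i] for i in range(n)) / k)
    return [int(c) for c in coeffs]  # integer matrices give integer coefficients


def cospectral(n, edges1, edges2, matrix=adjacency_matrix):
    """True if the two trees share the characteristic polynomial of `matrix`."""
    return char_poly(matrix(n, edges1)) == char_poly(matrix(n, edges2))
```

For example, the path on four vertices has adjacency polynomial x^4 - 3x^2 + 1 while the star on four vertices has x^4 - 3x^2, so they are distinguished. Brute-force comparisons like this are only practical for small trees; the result above concerns the fraction of binary trees with a unique spectrum as the number of leaves grows. The leaf-distance matrix and immanantal polynomials are not covered by this sketch.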
Predicting B Cell Receptor Substitution Profiles Using Public Repertoire Data
B cells develop high affinity receptors during the course of affinity
maturation, a cyclic process of mutation and selection. At the end of affinity
maturation, a number of cells sharing the same ancestor (i.e., in the same
"clonal family") are released from the germinal center; their amino acid
frequency profile reflects the allowed and disallowed substitutions at each
position. These clonal-family-specific frequency profiles, called "substitution
profiles", are useful for studying the course of affinity maturation as well as
for antibody engineering purposes. However, most often only a single sequence
is recovered from each clonal family in a sequencing experiment, making it
impossible to construct a clonal-family-specific substitution profile. Given
the public release of many high-quality large B cell receptor datasets, one may
ask whether it is possible to use such data in a prediction model for
clonal-family-specific substitution profiles. In this paper, we present the
method "Substitution Profiles Using Related Families" (SPURF), a penalized
tensor regression framework that integrates information from a rich assemblage
of datasets to predict the clonal-family-specific substitution profile for any
single input sequence. Using this framework, we show that substitution profiles
from similar clonal families can be leveraged together with simulated
substitution profiles and germline gene sequence information to improve
prediction. We fit this model on a large public dataset and validate the
robustness of our approach on an external dataset. Furthermore, we provide a
command-line tool in an open-source software package
(https://github.com/krdav/SPURF) implementing these ideas and providing easy
prediction using our pre-fit models.
Comment: 23 pages
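A substitution profile, as described above, is a per-position amino acid frequency profile over the members of one clonal family. The following Python sketch computes such an empirical profile from aligned amino acid sequences of equal length. The function name, the optional pseudocount, and the choice to ignore gaps and non-standard residues are our own assumptions, not part of SPURF's interface. It also shows why a single sequence is not enough: the resulting profile is a one-hot vector at every position.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_profile(family, pseudocount=0.0):
    """Per-position amino acid frequencies for one clonal family (hypothetical helper).

    `family` holds aligned amino acid sequences of equal length. Gaps and
    non-standard residues are ignored (an assumption of this sketch).
    """
    length = len(family[0])
    if any(len(seq) != length for seq in family):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in family)
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        if total == 0:
            # Only gaps at this position and no pseudocount: no information.
            profile.append({aa: 0.0 for aa in AMINO_ACIDS})
            continue
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile
```

For instance, `substitution_profile(["ASYW", "ASFW", "GSYW", "ASYW"])` gives frequencies 0.75 for `A` and 0.25 for `G` at the first position, whereas `substitution_profile(["ASYW"])` puts all mass on the observed residue at each position. Predicting a fuller profile from such a single sequence is the problem SPURF addresses.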
A Bayesian phylogenetic hidden Markov model for B cell receptor sequence analysis
The human body generates a diverse set of high-affinity antibodies, the soluble form of B cell receptors (BCRs), that bind to and neutralize invading pathogens. The natural development of BCRs must be understood in order to design vaccines for highly mutable pathogens such as influenza and HIV. BCR diversity is induced by naturally occurring combinatorial "V(D)J" rearrangement, mutation, and selection processes. Most current methods for BCR sequence analysis model these processes separately. Statistical phylogenetic methods are often used to model the mutational dynamics of BCR sequence data, but these techniques do not consider all the complexities associated with B cell diversification, such as the V(D)J rearrangement process. In particular, standard phylogenetic approaches assume the DNA bases of the progenitor (or "naive") sequence arise independently and according to the same distribution, ignoring the complexities of V(D)J rearrangement. In this paper, we introduce a novel approach to Bayesian phylogenetic inference for BCR sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique not only integrates a naive rearrangement model with a phylogenetic model for BCR sequence evolution but also naturally accounts for uncertainty in all unobserved variables, including the phylogenetic tree, via posterior distribution sampling.
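The central idea, replacing independent and identically distributed naive bases with a hidden Markov model over the naive sequence, can be illustrated with a toy forward algorithm. This Python sketch is a deliberate simplification, not the paper's model or inference procedure. It assumes the observed sequences sit at the leaves of a star tree with a common branch length `t`, that they evolve from the naive sequence under the Jukes-Cantor (JC69) substitution model, and that each hidden state `s` carries a hypothetical distribution `naive_dist[s]` over the naive base at the current site. The emission probability of an alignment column is the phylogenetic likelihood averaged over the naive base, and the forward recursion sums over hidden-state paths. In the paper the hidden states come from a naive rearrangement model, and the tree and other unobserved variables are sampled from a posterior distribution; neither is shown here.

```python
import math

BASES = "ACGT"


def jc69(a, b, t):
    """Jukes-Cantor probability that base a becomes base b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def column_likelihood(column, t):
    """P(observed leaf bases | naive base a) on a star tree, for each base a."""
    return {a: math.prod(jc69(a, obs, t) for obs in column) for a in BASES}


def phylo_hmm_likelihood(columns, t, init, trans, naive_dist):
    """Forward algorithm: total probability of the alignment over hidden-state paths.

    columns:    alignment columns, one string of leaf bases per site (non-empty list)
    init:       init[s], initial hidden-state probabilities
    trans:      trans[s][s2], hidden-state transition probabilities
    naive_dist: naive_dist[s][a], P(naive base a | hidden state s), hypothetical
    """
    states = list(init)
    alpha = None
    for column in columns:
        lik = column_likelihood(column, t)
        # Emission: average the phylogenetic likelihood over the unknown naive base.
        emit = {s: sum(naive_dist[s][a] * lik[a] for a in BASES) for s in states}
        if alpha is None:
            alpha = {s: init[s] * emit[s] for s in states}
        else:
            alpha = {s2: sum(alpha[s1] * trans[s1][s2] for s1 in states) * emit[s2]
                     for s2 in states}
    return sum(alpha.values())
```

With a single hidden state whose `naive_dist` is uniform, the model reduces to the standard i.i.d.-root phylogenetic likelihood that the abstract contrasts against. For realistic sequence lengths the forward variables underflow, so a practical version would rescale `alpha` at each site or work in log space.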
Survival analysis of DNA mutation motifs with penalized proportional hazards
Antibodies, an essential part of our immune system, develop through an
intricate process to bind a wide array of pathogens. This process involves
randomly mutating DNA sequences encoding these antibodies to find variants with
improved binding, though mutations are not distributed uniformly across
sequence sites. Immunologists observe this nonuniformity to be consistent with
"mutation motifs", which are short DNA subsequences that affect how likely a
given site is to experience a mutation. Quantifying the effect of motifs on
mutation rates is challenging: a large number of possible motifs makes this
statistical problem high dimensional, while the unobserved history of the
mutation process leads to a nontrivial missing data problem. We introduce an
ℓ1-penalized proportional hazards model to infer mutation motifs and
their effects. In order to estimate model parameters, our method uses a Monte
Carlo EM algorithm to marginalize over the unknown ordering of mutations. We
show that, on simulated data, our method outperforms current methods and
leads to more parsimonious models. The application of proportional
hazards to mutation processes is, to our knowledge, novel and formalizes the
current methods in a statistical framework that can be easily extended to
analyze the effect of other biological features on mutation rates.
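To make the proportional hazards formulation concrete, the following Python sketch evaluates an ℓ1-penalized log partial likelihood for one sequence when the order of its mutations is taken as known. It rests on several simplifying assumptions of our own: the hazard of an unmutated site depends only on the centered 3-mer motif around it through a hypothetical coefficient dictionary `theta`; motifs are recomputed as neighboring sites mutate; and sites that never mutate stay in every risk set as censored observations. Because the true order is unobserved, the Monte Carlo EM approach described above would average this kind of objective over sampled orderings; that sampler is not shown.

```python
import math


def motif_at(seq, i, flank=1):
    """Centered k-mer (k = 2*flank + 1) around site i, padded with 'N' at the ends."""
    padded = "N" * flank + seq + "N" * flank
    return padded[i : i + 2 * flank + 1]


def penalized_log_partial_likelihood(start, end, order, theta, lam):
    """l1-penalized log partial likelihood for one assumed ordering of mutations.

    start, end: sequences before and after mutation (same length)
    order:      the mutated sites, in the order they are assumed to have mutated
    theta:      motif -> log hazard multiplier (motifs not listed get 0), hypothetical
    lam:        l1 penalty weight
    """
    mutated = {i for i, (a, b) in enumerate(zip(start, end)) if a != b}
    if sorted(order) != sorted(mutated):
        raise ValueError("order must list each mutated site exactly once")
    seq = list(start)
    at_risk = set(range(len(start)))  # unmutated sites, including censored ones
    loglik = 0.0
    for site in order:
        current = "".join(seq)
        scores = {j: theta.get(motif_at(current, j), 0.0) for j in at_risk}
        # log-sum-exp over the risk set, shifted by the max for numerical stability
        top = max(scores.values())
        log_denom = top + math.log(sum(math.exp(v - top) for v in scores.values()))
        loglik += scores[site] - log_denom
        seq[site] = end[site]  # neighbors' motif contexts change after this mutation
        at_risk.remove(site)
    return loglik - lam * sum(abs(v) for v in theta.values())
```

With `theta` empty, every unmutated site is equally likely to mutate next, so the objective for `start="ACGT"`, `end="TCGA"`, `order=[0, 3]` is `-log(4) - log(3)`. The ℓ1 term shrinks most motif coefficients to exactly zero at the optimum, which is one way a penalty of this kind yields the sparser, more parsimonious models mentioned above.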
Height and Body Mass on the Mating Market: Associations With Number of Sex Partners and Extra-Pair Sex Among Heterosexual Men and Women Aged 18–65
People with traits that are attractive on the mating market are better able to pursue their preferred mating strategy. Men who are relatively tall may be preferred by women because taller height is a cue to dominance, social status, access to resources, and heritable fitness, leading them to have more mating opportunities and sex partners. We examined height, education, age, ethnicity, and body mass index (BMI) as predictors of sexual history among heterosexual men and women (N = 60,058). The linear and curvilinear associations between self-reported height and sex partner number were small for men when controlling for education, BMI, and ethnicity (linear β = .05; curvilinear β = −.03). The mean and median number of sex partners for men of different heights were: very short (9.4; 5), short (11.0; 7), average (11.7; 7), tall (12.0; 7), very tall (12.1; 7), and extremely tall (12.3; 7). Men who were “overweight” reported a higher mean and median number of sex partners than men with other body masses. The results for men suggested limited variation in reported sex partner number across most of the height continuum, but that very short men report fewer partners than other men.
Using genotype abundance to improve phylogenetic inference
Modern biological techniques enable very dense genetic sampling of unfolding
evolutionary histories, and thus frequently sample some genotypes multiple
times. This motivates strategies to incorporate genotype abundance information
in phylogenetic inference. In this paper, we synthesize a stochastic process
model with standard sequence-based phylogenetic optimality, and show that tree
estimation is substantially improved by doing so. Our method is validated with
extensive simulations and an experimental single-cell lineage tracing study of
germinal center B cell receptor affinity maturation.
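The abundance information referred to above is straightforward to extract: identical sampled sequences are collapsed into one genotype with a count. The Python sketch below, with hypothetical helper names, does this and scores a candidate tree over genotypes by its parsimony cost (total Hamming distance along parent-child edges). The parsimony score ignores the counts entirely; incorporating them through a stochastic process model is the paper's contribution and is not reproduced here.

```python
from collections import Counter


def collapse_genotypes(sequences):
    """Map each distinct sequence (genotype) to the number of times it was sampled."""
    return Counter(sequences)


def hamming(a, b):
    """Number of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def parsimony_cost(edges):
    """Total number of substitutions along (parent, child) genotype edges."""
    return sum(hamming(parent, child) for parent, child in edges)
```

For example, the samples `["AA", "AA", "AA", "AT", "TT"]` collapse to genotypes `AA` (3 copies), `AT` (1), and `TT` (1). The trees `AA -> AT -> TT` and `AT -> AA`, `AT -> TT` both have parsimony cost 2, so sequence-based parsimony alone cannot separate them; this is the kind of tie that genotype abundances can help inform.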