The efficient recognition of pathogens by the adaptive immune system relies
on the diversity of receptors displayed at the surface of immune cells. T-cell
receptor diversity results from an initial random DNA editing process, called
VDJ recombination, followed by functional selection of cells according to the
interaction of their surface receptors with self and foreign antigenic
peptides. To quantify the effect of selection on the highly variable elements
of the receptor, we apply a probabilistic maximum-likelihood approach to the
analysis of high-throughput sequence data from the β-chain of human
T-cell receptors. We quantify selection factors for V and J gene choice, and
for the length and amino-acid composition of the variable region. Our approach
is necessary to disentangle the effects of selection from biases inherent in
the recombination process. Inferred selection factors differ little between
donors, or between naive and memory repertoires. The number of sequences shared
between donors is well predicted by the model, indicating a purely stochastic
origin of such "public" sequences. We find a significant correlation between
biases induced by VDJ recombination and our inferred selection factors,
together with a reduction of diversity during selection. Both effects suggest
that natural selection acting on the recombination process has anticipated the
selection pressures experienced during somatic evolution.