Notes on Information-Theoretic Privacy
We investigate the tradeoff between privacy and utility in a situation where
both privacy and utility are measured in terms of mutual information. For the
binary case, we fully characterize this tradeoff in the case of perfect privacy and
also give an upper bound for the case where some privacy leakage is allowed. We
then introduce a new quantity that measures the amount of private
information contained in the observable data and connect it to the optimal
tradeoff between privacy and utility. Comment: The corrected version of a paper that appeared in Allerton 201
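To make the two mutual-information measures concrete, here is a minimal sketch that computes privacy leakage I(S;Y) and utility I(X;Y) for a given release channel. It assumes a setup the abstract does not spell out: a private variable S, observable data X, a released variable Y, and the Markov chain S - X - Y (Y is produced from X alone). The function names `mutual_information` and `privacy_utility` and all numbers are hypothetical; this is an illustration of the quantities, not the paper's characterization.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in bits from a 2-D joint probability table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # column vector p(a)
    pb = joint.sum(axis=0, keepdims=True)   # row vector p(b)
    prod = pa @ pb                          # outer product p(a) p(b)
    mask = joint > 0                        # 0 log 0 = 0 convention
    return float(np.sum(joint[mask] * np.log2(joint[mask] / prod[mask])))

def privacy_utility(p_sx, p_y_given_x):
    """Return (leakage I(S;Y), utility I(X;Y)), assuming the Markov chain S - X - Y.

    p_sx:        joint p(s, x), shape (|S|, |X|)
    p_y_given_x: release channel p(y | x), shape (|X|, |Y|), rows sum to 1
    """
    p_sx = np.asarray(p_sx, dtype=float)
    w = np.asarray(p_y_given_x, dtype=float)
    p_sy = p_sx @ w                  # p(s, y) = sum_x p(s, x) p(y | x)
    p_x = p_sx.sum(axis=0)           # marginal p(x)
    p_xy = p_x[:, None] * w          # p(x, y) = p(x) p(y | x)
    return mutual_information(p_sy), mutual_information(p_xy)

# Hypothetical binary example: S and X agree with probability 0.8.
p_sx = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
identity = np.eye(2)                          # release X unchanged
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])      # flip X with probability 0.1
constant = np.array([[1.0, 0.0], [1.0, 0.0]]) # always release y = 0

for name, channel in [("identity", identity), ("bsc", bsc), ("constant", constant)]:
    print(name, privacy_utility(p_sx, channel))
```

Releasing X unchanged gives maximal utility (H(X) = 1 bit here) but leaks all of I(S;X); adding noise lowers both; the constant release reaches perfect privacy at zero utility. The printed numbers depend only on the hypothetical `p_sx` chosen above.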
Exponential and power laws in public procurement markets
For the first time, we analyze a unique public procurement database,
which includes the number of bidders for a contract, the final
price, the identity of the winner and the identity of the contracting
authority for each of more than 40,000 public procurements in the Czech
Republic between 2006 and 2011, focusing on the distributional properties of
the variables of interest. We uncover several scaling laws: an exponential
law for the number of bidders, and power laws for the total revenues and
total spending of the participating companies, with the latter even following Zipf's
law for the 100 highest-spending institutions. We propose an analogy between
extensive and non-extensive systems in physics and public procurement
market situations. Through entropy maximization, this analogy yields
some interesting results and policy implications with respect to the
Maxwell-Boltzmann and Pareto distributions of the analyzed quantities. Comment: 6 pages, 3 figures
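As background for the entropy-maximization link: on x >= x_min, the exponential distribution maximizes entropy at a fixed mean (the Maxwell-Boltzmann analogue), while the Pareto power law maximizes entropy at a fixed mean of ln x. The sketch below shows standard maximum-likelihood estimators for both laws and a rank-size slope check for Zipf's law. It is an assumption that estimators of this kind match the paper's fitting procedure; the function names are hypothetical, the bidder counts (which are discrete) are treated with a continuous approximation, and the data are synthetic.

```python
import numpy as np

def fit_exponential(x, xmin=0.0):
    """MLE of the rate lambda for p(x) = lambda * exp(-lambda * (x - xmin)), x >= xmin."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    return 1.0 / np.mean(x - xmin)

def fit_power_law(x, xmin):
    """Continuous power-law MLE for p(x) proportional to x**(-alpha), x >= xmin:
    alpha = 1 + n / sum(ln(x_i / xmin))."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    return 1.0 + x.size / np.sum(np.log(x / xmin))

def zipf_slope(values, top=100):
    """Least-squares slope of log(value) against log(rank) for the `top` largest
    values; a slope near -1 is the signature of Zipf's law."""
    v = np.sort(np.asarray(values, dtype=float))[::-1][:top]
    ranks = np.arange(1, v.size + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(v), 1)
    return slope

# Synthetic data only (not the procurement database).
rng = np.random.default_rng(0)
x_exp = rng.exponential(scale=0.5, size=100_000)            # true rate 2.0
x_pl = (rng.pareto(1.5, size=100_000) + 1.0) * 2.0          # Pareto, xmin 2.0, alpha 2.5
print(fit_exponential(x_exp), fit_power_law(x_pl, xmin=2.0))
```

NumPy's `Generator.pareto(a)` draws from the Lomax (shifted) form, so adding 1 and scaling by x_min yields a classical Pareto sample with density exponent alpha = a + 1.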
Statistical inference of the generation probability of T-cell receptors from sequence repertoires
Stochastic rearrangement of germline DNA by VDJ recombination is at the
origin of immune system diversity. This process is implemented via a series of
stochastic molecular events involving gene choices and random nucleotide
insertions between, and deletions from, genes. We use large sequence
repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta
chains to infer the statistical properties of these basic biochemical events.
Since any given CDR3 sequence can be produced in multiple ways, the probability
distribution of hidden recombination events cannot be inferred directly from
the observed sequences; we therefore develop a maximum likelihood inference
method to achieve this end. To separate the properties of the molecular
rearrangement mechanism from the effects of selection, we focus on
non-productive CDR3 sequences in T-cell DNA. We infer the joint distribution of
the various generative events that occur when a new T-cell receptor gene is
created. We find a rich pattern of correlations (and their absence), providing
insight into the molecular mechanisms involved. The generative event statistics
are consistent between individuals, suggesting a universal biochemical process.
Our distribution predicts the generation probability of any specific CDR3
sequence by the primitive recombination process, allowing us to quantify the
potential diversity of the T-cell repertoire and to understand why some
sequences are shared between individuals. We argue that the use of formal
statistical inference methods, of the kind presented in this paper, will be
essential for quantitative understanding of the generation and evolution of
diversity in the adaptive immune system. Comment: 20 pages, including Appendix
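The core difficulty described above is that one observed sequence can arise from several hidden recombination scenarios, so the generation probability is a sum over scenarios and the event probabilities must be fitted by maximum likelihood over that hidden structure. A standard way to do this is expectation-maximization (EM); the minimal sketch below assumes a toy model with only two independent events (a V-gene choice and an insertion), whereas real V(D)J models also include D and J choices, deletions, and insertion composition. The gene names, segment strings, and counts are hypothetical, and this is not presented as the authors' implementation.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy "genes" and insertions.
V_GENES = {"V1": "CAS", "V2": "CASS"}
INSERTIONS = ["", "S", "SS"]

def scenarios_by_sequence():
    """Map each producible sequence to the hidden (v, ins) scenarios that produce it."""
    table = {}
    for v, ins in product(V_GENES, INSERTIONS):
        table.setdefault(V_GENES[v] + ins, []).append((v, ins))
    return table

def p_gen(seq, p_v, p_ins, table):
    """Generation probability: sum over every hidden scenario yielding seq."""
    return sum(p_v[v] * p_ins[i] for v, i in table.get(seq, []))

def em(counts, n_iter=200):
    """Fit independent event distributions p_v, p_ins by EM.

    counts maps observed sequences (all must be producible) to multiplicities.
    Returns the fitted distributions and the log-likelihood at each iteration.
    """
    table = scenarios_by_sequence()
    p_v = {v: 1.0 / len(V_GENES) for v in V_GENES}
    p_ins = {i: 1.0 / len(INSERTIONS) for i in INSERTIONS}
    n_total = sum(counts.values())
    trace = []
    for _ in range(n_iter):
        trace.append(sum(n * math.log(p_gen(s, p_v, p_ins, table))
                         for s, n in counts.items()))
        exp_v, exp_ins = Counter(), Counter()
        for seq, n in counts.items():
            # E-step: posterior weight of each scenario compatible with seq.
            weights = {sc: p_v[sc[0]] * p_ins[sc[1]] for sc in table[seq]}
            z = sum(weights.values())
            for (v, i), w in weights.items():
                exp_v[v] += n * w / z
                exp_ins[i] += n * w / z
        # M-step: normalized expected event counts.
        p_v = {v: exp_v[v] / n_total for v in V_GENES}
        p_ins = {i: exp_ins[i] / n_total for i in INSERTIONS}
    return p_v, p_ins, trace

counts = {"CAS": 300, "CASS": 380, "CASSS": 240, "CASSSS": 80}  # hypothetical
p_v, p_ins, trace = em(counts, n_iter=300)
print(p_v, p_ins)
```

Here "CASS" can come from V1 plus an inserted "S" or from V2 with no insertion, which is exactly the ambiguity that prevents reading event statistics directly off the sequences. Each EM iteration cannot decrease the log-likelihood, which the recorded `trace` makes easy to check.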
- …