Confidence Intervals for Unobserved Events
Consider a finite sample from an unknown distribution over a countable
alphabet. Unobserved events are alphabet symbols which do not appear in the
sample. Estimating the probabilities of unobserved events is a basic problem in
statistics and related fields, which was extensively studied in the context of
point estimation. In this work we introduce a novel interval estimation scheme
for unobserved events. Our proposed framework applies selective inference, as
we construct confidence intervals (CIs) for the desired set of parameters.
Interestingly, we show that the obtained CIs are dimension-free, as they do not
grow with the alphabet size. Further, we show that these CIs are (almost)
tight, in the sense that they cannot be further improved without violating the
prescribed coverage rate. We demonstrate the performance of our proposed scheme
in synthetic and real-world experiments, showing a significant improvement over
the alternatives. Finally, we apply our proposed scheme to large alphabet
modeling. We introduce a novel simultaneous CI scheme for large alphabet
distributions which outperforms currently known methods while maintaining the
prescribed coverage rate.
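
To make the setting concrete, the following minimal sketch draws a finite
sample from a synthetic Zipf-like distribution over a large alphabet and
compares the true probability mass of the unobserved symbols (the missing
mass) with the classical Good-Turing point estimate. The abstract does not
spell out the proposed CI construction, so this sketch only illustrates the
quantity being estimated, not the paper's scheme; the distribution, alphabet
size, and sample size are arbitrary choices.

# Illustration of the "unobserved events" setting, assuming a synthetic
# Zipf-like distribution. NOTE: this is NOT the paper's CI construction;
# it only shows the target quantity (the missing mass) and the classical
# Good-Turing point estimate as a baseline.
import numpy as np

rng = np.random.default_rng(0)

K, n = 10_000, 2_000                     # alphabet size, sample size
p = 1.0 / np.arange(1, K + 1)            # Zipf-like probabilities
p /= p.sum()

sample = rng.choice(K, size=n, p=p)      # finite sample from the distribution
counts = np.bincount(sample, minlength=K)

# True probability mass of unobserved events (symbols with count zero).
missing_mass = p[counts == 0].sum()

# Good-Turing point estimate: (# of symbols seen exactly once) / n.
singletons = np.sum(counts == 1)
good_turing = singletons / n

print(f"true missing mass   : {missing_mass:.4f}")
print(f"Good-Turing estimate: {good_turing:.4f}")
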
On the Universality of the Logistic Loss Function
A loss function measures the discrepancy between the true values
(observations) and their estimated fits, for a given instance of data. A loss
function is said to be proper (unbiased, Fisher consistent) if the fits are
defined over a unit simplex, and the minimizer of the expected loss is the true
underlying probability of the data. Typical examples are the zero-one loss, the
quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work
we show that for binary classification problems, the divergence associated with
smooth, proper and convex loss functions is bounded from above by the
Kullback-Leibler (KL) divergence, up to a multiplicative normalization
constant. This implies that by minimizing the log-loss (associated with the KL
divergence), we minimize an upper bound on any loss function from this set.
This property justifies the broad use of log-loss in regression,
decision trees, deep neural networks and many other applications. In addition,
we show that the KL divergence bounds from above any separable Bregman
divergence that is convex in its second argument (up to a multiplicative
normalization constant). This result introduces a new set of divergence
inequalities, similar to the well-known Pinsker inequality.
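
As a concrete instance of such an inequality, the sketch below numerically
checks the binary case for the quadratic (Brier) loss: its Bregman divergence
equals (p - q)^2, and by the classical Pinsker inequality KL(p||q) >= 2(p - q)^2
for Bernoulli distributions, so the quadratic divergence is bounded by the KL
divergence with multiplicative constant 1/2. The constants for other smooth,
proper, convex losses are the paper's contribution and are not derived here;
the helper functions are illustrative only.

# Numerical check of a Pinsker-type divergence inequality in the binary case:
# the Bregman divergence of the quadratic (Brier) loss, (p - q)^2, is bounded
# above by KL(p||q) up to the multiplicative constant 1/2.
import numpy as np

def kl_bernoulli(p, q):
    # KL divergence (in nats) between Bernoulli(p) and Bernoulli(q).
    eps = 1e-12
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def brier_divergence(p, q):
    # Bregman divergence of the quadratic (Brier) loss in the binary case.
    return (p - q) ** 2

rng = np.random.default_rng(0)
p = rng.uniform(size=100_000)
q = rng.uniform(size=100_000)

ratio = brier_divergence(p, q) / kl_bernoulli(p, q)
print(f"max (p-q)^2 / KL(p||q) observed : {ratio.max():.4f}")  # stays <= 0.5
print("bound from Pinsker's inequality : 0.5")
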