On the Bayes-optimality of F-measure maximizers
The F-measure, which was originally introduced in information retrieval,
is nowadays routinely used as a performance metric for problems such as binary
classification, multi-label classification, and structured output prediction.
Optimizing this measure is a statistically and computationally challenging
problem, since no closed-form solution exists. Adopting a decision-theoretic
perspective, this article provides a formal and experimental analysis of
different approaches for maximizing the F-measure. We start with a Bayes-risk
analysis of related loss functions, such as Hamming loss and subset zero-one
loss, showing that optimizing such losses as a surrogate of the F-measure leads
to a high worst-case regret. Subsequently, we perform a similar type of
analysis for F-measure maximizing algorithms, showing that such algorithms are
approximate, while relying on additional assumptions regarding the statistical
distribution of the binary response variables. Furthermore, we present a new
algorithm which is not only computationally efficient but also Bayes-optimal,
regardless of the underlying distribution. To this end, the algorithm requires
only a quadratic (with respect to the number of binary responses) number of
parameters of the joint distribution. We illustrate the practical performance
of all analyzed methods by means of experiments with multi-label classification
problems.
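
To make the gap between Hamming-loss minimization and F-measure maximization concrete, here is a minimal sketch on a toy joint distribution. It uses exhaustive search over all predictions, which is exponential in the number of labels and is not the efficient algorithm the abstract refers to. All function names and the toy distribution are illustrative assumptions, and the convention F = 1 when both vectors are all zeros is likewise an assumption.

```python
from itertools import product


def f_measure(y_true, y_pred):
    """F1 between two binary tuples (assumed convention: 1.0 if both are all zeros)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    denom = sum(y_true) + sum(y_pred)
    return 1.0 if denom == 0 else 2.0 * tp / denom


def expected_f(pred, dist):
    """Expected F-measure of a fixed prediction under a joint distribution."""
    return sum(prob * f_measure(y, pred) for y, prob in dist.items())


def brute_force_f_maximizer(dist, m):
    """Exhaustive search over all 2^m predictions (toy sizes only)."""
    return max(product((0, 1), repeat=m), key=lambda h: expected_f(h, dist))


def hamming_minimizer(dist, m):
    """Hamming-loss risk minimizer: threshold each marginal at 0.5."""
    marginals = [sum(p for y, p in dist.items() if y[i]) for i in range(m)]
    return tuple(int(q > 0.5) for q in marginals)


# Hypothetical toy distribution over three binary labels.
toy = {(1, 0, 0): 0.4, (0, 1, 0): 0.4, (0, 0, 0): 0.2}

f_best = brute_force_f_maximizer(toy, 3)
h_best = hamming_minimizer(toy, 3)
print(f_best, expected_f(f_best, toy))
print(h_best, expected_f(h_best, toy))
```

On this toy distribution the two predictions differ, and the Hamming-optimal prediction attains a noticeably lower expected F-measure than the exhaustive maximizer.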
The Query-commit Problem
In the query-commit problem we are given a graph where edges have distinct
probabilities of existing. It is possible to query the edges of the graph, and
if the queried edge exists then its endpoints are irrevocably matched. The goal
is to find a querying strategy which maximizes the expected size of the
matching obtained. This stochastic matching setup is motivated by applications
in kidney exchanges and online dating.
In this paper we address the query-commit problem from both theoretical and
experimental perspectives. First, we show that a simple class of edges can be
queried without compromising the optimality of the strategy. This property is
then used to obtain in polynomial time an optimal querying strategy when the
input graph is sparse. Next we turn our attention to the kidney exchange
application, focusing on instances modeled over real data from existing
exchange programs. We prove that, as the number of nodes grows, almost every
instance admits a strategy which matches almost all nodes. This result supports
the intuition that more exchanges are possible on a larger pool of
patient/donors and gives theoretical justification for unifying the existing
exchange programs. Finally, we evaluate experimentally different querying
strategies over kidney exchange instances. We show that even very simple
heuristics perform fairly well, being within 1.5% of an optimal clairvoyant
strategy that knows in advance the edges in the graph. In such a
time-sensitive application, this result motivates the use of committing
strategies.
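
The abstract does not say which simple heuristics were evaluated. As one plausible illustration of a committing strategy, the sketch below queries edges in decreasing order of existence probability and commits to every queried edge that turns out to exist. The greedy ordering, the function names, and the Monte Carlo estimate are all assumptions made for illustration, not the paper's method.

```python
import random


def greedy_query_commit(edges, rng):
    """Query edges in decreasing probability; matched endpoints are never re-queried.

    edges: list of (u, v, p), where p is the probability that edge (u, v) exists.
    Returns the list of committed (matched) edges.
    """
    matched = set()
    matching = []
    for u, v, p in sorted(edges, key=lambda e: -e[2]):
        if u in matched or v in matched:
            continue  # an endpoint is already irrevocably matched
        if rng.random() < p:  # the query reveals that the edge exists
            matched.update((u, v))
            matching.append((u, v))
    return matching


def estimate_expected_size(edges, trials=10000, seed=0):
    """Monte Carlo estimate of the expected matching size under the greedy order."""
    rng = random.Random(seed)
    total = sum(len(greedy_query_commit(edges, rng)) for _ in range(trials))
    return total / trials


# Hypothetical four-node path graph with per-edge existence probabilities.
path = [("a", "b", 0.9), ("b", "c", 0.5), ("c", "d", 0.8)]
print(estimate_expected_size(path))
```

Because a successful query commits both endpoints, the order of queries matters: querying the middle edge first could block both outer edges, which is why this sketch defers low-probability edges.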
Distributed Detection over Random Networks: Large Deviations Performance Analysis
We study the large deviations performance, i.e., the exponential decay rate
of the error probability, of distributed detection algorithms over random
networks. At each time step each sensor: 1) averages its decision variable
with the neighbors' decision variables; and 2) accounts on-the-fly for its new
observation. We show that distributed detection exhibits a "phase change"
behavior. When the rate of network information flow (the speed of averaging) is
above a threshold, then distributed detection is asymptotically equivalent to
the optimal centralized detection, i.e., the exponential decay rate of the
error probability for distributed detection equals the Chernoff information.
When the rate of information flow is below a threshold, distributed detection
achieves only a fraction of the Chernoff information rate; we quantify this
achievable rate as a function of the network rate of information flow.
Simulation examples demonstrate our theoretical findings on the behavior of
distributed detection over random networks.
Comment: 30 pages, journal, submitted on December 3rd, 201
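
The two-step update described in the abstract (average with neighbors, then fold in the new observation) can be sketched as follows. The specific choices here are assumptions for illustration only: a ring topology whose links are each active with probability `link_prob`, pairwise averaging on active links, a Gaussian shift-in-mean observation model, and a running-average innovation weight of 1/k. The paper's exact update may differ.

```python
import random


def gossip_step(x, link_prob, rng):
    """Averaging over a random ring: each link, if active, averages its two endpoints."""
    x = list(x)
    n = len(x)
    for i in range(n):
        j = (i + 1) % n
        if rng.random() < link_prob:
            x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x


def simulate(num_sensors=5, steps=200, link_prob=0.5, mean=1.0, seed=0):
    """Run the averaging-plus-innovation update with data drawn under H1.

    Assumed model: H0: y ~ N(0, 1), H1: y ~ N(mean, 1), so the per-sample
    log-likelihood ratio is mean * (y - mean / 2).
    """
    rng = random.Random(seed)
    x = [0.0] * num_sensors
    for k in range(1, steps + 1):
        # Step 1: average decision variables with neighbors over the random network.
        x = gossip_step(x, link_prob, rng)
        # Step 2: incorporate the new observation's log-likelihood ratio.
        llr = [mean * (rng.gauss(mean, 1.0) - mean / 2.0) for _ in range(num_sensors)]
        x = [((k - 1) * xi + li) / k for xi, li in zip(x, llr)]
    return x


states = simulate()
# Each sensor decides H1 when its decision variable is positive.
print([s > 0 for s in states])
```

Varying `link_prob` in this sketch changes how quickly information spreads, which is the quantity the abstract ties to the achievable error exponent.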