13,485 research outputs found
Estimating Maximally Probable Constrained Relations by Mathematical Programming
Estimating a constrained relation is a fundamental problem in machine
learning. Special cases are classification (the problem of estimating a map
from a set of to-be-classified elements to a set of labels), clustering (the
problem of estimating an equivalence relation on a set) and ranking (the
problem of estimating a linear order on a set). We contribute a family of
probability measures on the set of all relations between two finite, non-empty
sets, which offers a joint abstraction of multi-label classification,
correlation clustering and ranking by linear ordering. Estimating (learning) a
maximally probable measure, given (a training set of) related and unrelated
pairs, is a convex optimization problem. Estimating (inferring) a maximally
probable relation, given a measure, is a 01-linear program. It is solved in
linear time for maps. It is NP-hard for equivalence relations and linear
orders. Practical solutions for all three cases are shown in experiments with
real data. Finally, estimating a maximally probable measure and relation
jointly is posed as a mixed-integer nonlinear program. This formulation
suggests a mathematical programming approach to semi-supervised learning.Comment: 16 page
Scalable Audience Reach Estimation in Real-time Online Advertising
Online advertising has been introduced as one of the most efficient methods
of advertising throughout the recent years. Yet, advertisers are concerned
about the efficiency of their online advertising campaigns and consequently,
would like to restrict their ad impressions to certain websites and/or certain
groups of audience. These restrictions, known as targeting criteria, limit the
reachability for better performance. This trade-off between reachability and
performance illustrates a need for a forecasting system that can quickly
predict/estimate (with good accuracy) this trade-off. Designing such a system
is challenging due to (a) the huge amount of data to process, and, (b) the need
for fast and accurate estimates. In this paper, we propose a distributed fault
tolerant system that can generate such estimates fast with good accuracy. The
main idea is to keep a small representative sample in memory across multiple
machines and formulate the forecasting problem as queries against the sample.
The key challenge is to find the best strata across the past data, perform
multivariate stratified sampling while ensuring fuzzy fall-back to cover the
small minorities. Our results show a significant improvement over the uniform
and simple stratified sampling strategies which are currently widely used in
the industry
- …