The Online Median Problem
We introduce a natural variant of the (metric uncapacitated) k-median problem that we call the online median problem. Whereas the k-median problem involves optimizing the simultaneous placement of k facilities, the online median problem imposes the following additional constraints: the facilities are placed one at a time, a facility cannot be moved once it is placed, and the total number of facilities to be placed, k, is not known in advance. The objective of an online median algorithm is to minimize the competitive ratio, that is, the worst-case ratio of the cost of an online placement to that of an optimal offline placement. Our main result is a constant-competitive algorithm for the online median problem running in time that is linear in the input size. In addition, we present a related, though substantially simpler, constant-factor approximation algorithm for the (metric uncapacitated) facility location problem that runs in time linear in the input size. The latter algorithm is similar in spirit to the recent primal-dual-based facility location algorithm of Jain and Vazirani, but our approach is more elementary and yields an improved running time. While our primary focus is on problems that ask us to minimize the weighted average service distance to facilities, we also show that our results can be generalized to hold, to within constant factors, for more general objective functions. For example, we show that all of our approximation results hold, to within constant factors, for the k-means objective function.
Incremental Medians via Online Bidding
In the k-median problem we are given sets of facilities and customers, and
distances between them. For a given set F of facilities, the cost of serving a
customer u is the minimum distance between u and a facility in F. The goal is
to find a set F of k facilities that minimizes the sum, over all customers, of
their service costs.
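
As a minimal sketch of the objective just described (not an algorithm from the paper), the following computes the service cost of a facility set and finds an optimal set of size k by exhaustive search. The function names, the one-dimensional distance, and the toy data are assumptions made for illustration; the exhaustive search is exponential and only suitable for tiny inputs.

```python
from itertools import combinations


def service_cost(customers, facilities, dist):
    """Sum, over all customers, of the distance to the nearest facility."""
    return sum(min(dist(u, f) for f in facilities) for u in customers)


def brute_force_k_median(customers, candidates, k, dist):
    """Return a size-k subset of candidates with minimum total service cost.

    Tries every subset, so this is only a reference for very small inputs.
    """
    return min(
        combinations(candidates, k),
        key=lambda chosen: service_cost(customers, chosen, dist),
    )


def line_dist(a, b):
    """Hypothetical metric: points on a line."""
    return abs(a - b)


customers = [0, 1, 2, 10, 11]
candidates = [0, 1, 2, 10, 11]
best = brute_force_k_median(customers, candidates, 2, line_dist)
print(best, service_cost(customers, best, line_dist))
```

On this toy instance one facility serves the cluster near 1 and the other serves the cluster near 10.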
Following Mettu and Plaxton, we study the incremental medians problem, where
k is not known in advance, and the algorithm produces a nested sequence of
facility sets where the kth set has size k. The algorithm is c-cost-competitive
if the cost of each set is at most c times the cost of the optimum set of size
k. We give improved incremental algorithms for the metric version: an
8-cost-competitive deterministic algorithm, a 2e ~ 5.44-cost-competitive
randomized algorithm, a (24+epsilon)-cost-competitive, poly-time deterministic
algorithm, and a (6e+epsilon ~ 16.31)-cost-competitive, poly-time randomized
algorithm.
The algorithm is s-size-competitive if the kth set has cost at most the
minimum cost of any set of size k and has size at most s k. The optimal
size-competitive ratios for this problem are 4 (deterministic) and e
(randomized). We present the first poly-time O(log m)-size-approximation
algorithm for the offline problem and first poly-time O(log m)-size-competitive
algorithm for the incremental problem.
Our proofs reduce incremental medians to the following online bidding
problem: faced with an unknown threshold T, an algorithm submits "bids" until
it submits a bid that is at least the threshold. It pays the sum of all its
bids. We prove that folklore algorithms for online bidding are optimally
competitive.
Comment: conference version appeared in LATIN 2006 as "Oblivious Medians via
Online Bidding".
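
The online bidding game described above can be illustrated with the well-known doubling strategy. This is a sketch under stated assumptions, not code from the paper: the function name and parameters are hypothetical, and the threshold is assumed to be at least the first bid. Under that assumption the total paid stays below 4 times the threshold, which matches the deterministic ratio of 4 quoted in the abstract.

```python
def doubling_bids(threshold, first_bid=1.0, factor=2.0):
    """Bid first_bid, first_bid * factor, ... until a bid reaches threshold.

    Returns (list of bids, total paid). Assumes threshold >= first_bid.
    """
    if first_bid <= 0 or factor <= 1 or threshold < first_bid:
        raise ValueError("need first_bid > 0, factor > 1, threshold >= first_bid")
    bids = []
    bid = first_bid
    while True:
        bids.append(bid)
        if bid >= threshold:
            break
        bid *= factor
    return bids, sum(bids)


bids, paid = doubling_bids(5)
print(bids, paid, paid / 5)
```

With a threshold of 5 the bids are 1, 2, 4, 8, so the algorithm pays 15, three times the threshold.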
Online k-Median with Consistent Clusters
We consider the online k-median clustering problem in which points
arrive online and must be irrevocably assigned to a cluster on arrival. As
there are lower bound instances that show that an online algorithm cannot
achieve a competitive ratio that is a function of n and k, we consider a
beyond worst-case analysis model in which the algorithm is provided a priori
with a predicted budget B that upper bounds the optimal objective value. We
give an algorithm that achieves a competitive ratio that is exponential in
the number k of clusters, and show that the competitive ratio of every
algorithm must be linear in k. To the best of our knowledge this is the first
investigation in the literature that considers cluster consistency using
competitive analysis.
Comment: 28 pages, 7 figures
An Efficient Bandit Algorithm for Realtime Multivariate Optimization
Optimization is commonly employed to determine the content of web pages, such
as to maximize conversions on landing pages or click-through rates on search
engine result pages. Often the layout of these pages can be decoupled into
several separate decisions. For example, the composition of a landing page may
involve deciding which image to show, which wording to use, what color
background to display, etc. Such optimization is a combinatorial problem over
an exponentially large decision space. Randomized experiments do not scale well
to this setting, and therefore, in practice, one is typically limited to
optimizing a single aspect of a web page at a time. This represents a missed
opportunity in both the speed of experimentation and the exploitation of
possible interactions between layout decisions.
Here we focus on multivariate optimization of interactive web pages. We
formulate an approach where the possible interactions between different
components of the page are modeled explicitly. We apply bandit methodology to
explore the layout space efficiently and use hill-climbing to select optimal
content in realtime. Our algorithm also extends to contextualization and
personalization of layout selection. Simulation results show the suitability of
our approach to large decision spaces with strong interactions between content.
We further apply our algorithm to optimize a message that promotes adoption of
an Amazon service. After only a single week of online optimization, we saw a
21% conversion increase compared to the median layout. Our technique is
currently being deployed to optimize content across several locations at
Amazon.com.
Comment: KDD'17 Audience Appreciation Award
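
The hill-climbing step mentioned in the abstract above can be sketched as a coordinate-wise search over layout slots. This is an illustrative sketch only, not the authors' implementation: the `toy_score` function, its `MAIN` and `PAIR` tables, and all parameter names are hypothetical stand-ins for whatever learned model scores a layout, and the bandit exploration part is omitted.

```python
import random

# Hypothetical per-slot effects: MAIN[slot][choice].
MAIN = [[0.1, 0.3], [0.2, 0.0], [0.0, 0.1, 0.4]]
# Hypothetical pairwise interaction: (slot_a, choice_a, slot_b, choice_b) -> bonus.
PAIR = {(0, 1, 1, 1): 0.5}


def toy_score(layout):
    """Score a layout as main effects plus any matching pairwise bonuses."""
    total = sum(MAIN[slot][choice] for slot, choice in enumerate(layout))
    for (a, ca, b, cb), bonus in PAIR.items():
        if layout[a] == ca and layout[b] == cb:
            total += bonus
    return total


def hill_climb(num_options, score, rng, max_rounds=100):
    """Change one slot at a time, keeping any change that raises the score.

    num_options[i] is the number of choices for slot i. Stops when a full
    pass over all slots finds no improvement (a local optimum).
    """
    layout = [rng.randrange(n) for n in num_options]
    best = score(layout)
    for _ in range(max_rounds):
        improved = False
        for slot, n in enumerate(num_options):
            for choice in range(n):
                if choice == layout[slot]:
                    continue
                candidate = layout[:slot] + [choice] + layout[slot + 1:]
                value = score(candidate)
                if value > best:
                    layout, best, improved = candidate, value, True
        if not improved:
            break
    return layout, best


layout, value = hill_climb([2, 2, 3], toy_score, random.Random(0))
print(layout, round(value, 3))
```

In this toy model the interaction bonus makes choice 1 in slot 1 worthwhile only together with choice 1 in slot 0, which is the kind of effect that optimizing one slot in isolation would miss.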
How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?
In numerous applicative contexts, data are too rich and too complex to be
represented by numerical vectors. A general approach to extend machine learning
and data mining techniques to such data is to rely on a dissimilarity or on a
kernel that measures how different or similar two objects are. This approach
has been used to define several variants of the Self Organizing Map (SOM). This
paper reviews those variants using a common set of notations in order to
outline differences and similarities between them. It discusses the advantages
and drawbacks of the variants, as well as the actual relevance of the
dissimilarity/kernel SOM for practical applications.