Incremental Medians via Online Bidding
In the k-median problem we are given sets of facilities and customers, and
distances between them. For a given set F of facilities, the cost of serving a
customer u is the minimum distance between u and a facility in F. The goal is
to find a set F of k facilities that minimizes the sum, over all customers, of
their service costs.
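As a minimal illustration of the objective just defined, the sketch below computes the service cost of a facility set and finds an optimal set of size k by exhaustive search. The function names and the toy distance table are hypothetical, and brute force is used only because it is easy to check; it is not the method of the paper.

```python
from itertools import combinations


def service_cost(dist, facilities, customers):
    """Sum, over all customers, of the distance to the nearest facility in `facilities`."""
    return sum(min(dist[u][f] for f in facilities) for u in customers)


def brute_force_k_median(dist, all_facilities, customers, k):
    """Try every k-subset of facilities; feasible only for tiny instances."""
    return min(
        combinations(all_facilities, k),
        key=lambda F: service_cost(dist, F, customers),
    )


# Hypothetical toy instance: dist[customer][facility]
dist = {
    "u1": {"a": 1, "b": 4, "c": 6},
    "u2": {"a": 5, "b": 1, "c": 3},
    "u3": {"a": 7, "b": 3, "c": 1},
}
customers = ["u1", "u2", "u3"]
facilities = ["a", "b", "c"]

best_1 = brute_force_k_median(dist, facilities, customers, 1)
best_2 = brute_force_k_median(dist, facilities, customers, 2)
print(best_1, service_cost(dist, best_1, customers))
print(best_2, service_cost(dist, best_2, customers))
```

In this toy instance the optimal single facility is not contained in the first optimal pair found, which hints at why a nested (incremental) sequence of sets may have to give up some cost.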
Following Mettu and Plaxton, we study the incremental medians problem, where
k is not known in advance, and the algorithm produces a nested sequence of
facility sets where the kth set has size k. The algorithm is c-cost-competitive
if the cost of each set is at most c times the cost of the optimum set of size
k. We give improved incremental algorithms for the metric version: an
8-cost-competitive deterministic algorithm, a 2e ~ 5.44-cost-competitive
randomized algorithm, a (24+epsilon)-cost-competitive, poly-time deterministic
algorithm, and a (6e+epsilon ~ 16.31)-cost-competitive, poly-time randomized
algorithm.
The algorithm is s-size-competitive if the kth set has size at most sk and cost
at most the minimum cost of any set of size k. The optimal
size-competitive ratios for this problem are 4 (deterministic) and e
(randomized). We present the first poly-time O(log m)-size-approximation
algorithm for the offline problem and first poly-time O(log m)-size-competitive
algorithm for the incremental problem.
Our proofs reduce incremental medians to the following online bidding
problem: faced with an unknown threshold T, an algorithm submits "bids" until
it submits a bid that is at least the threshold. It pays the sum of all its
bids. We prove that folklore algorithms for online bidding are optimally
competitive.
Comment: conference version appeared in LATIN 2006 as "Oblivious Medians via
Online Bidding".
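The online bidding problem described above can be sketched in a few lines. The abstract does not spell out the folklore algorithms; the sketch assumes the deterministic one is the familiar doubling strategy (bid 1, 2, 4, ... until a bid reaches the threshold). The function name and the `first_bid` parameter are hypothetical.

```python
def doubling_bids(threshold, first_bid=1.0):
    """Bid first_bid, 2*first_bid, 4*first_bid, ... until a bid is >= threshold.

    Returns (bids, total_paid). The algorithm pays the sum of all its bids.
    Assumes threshold >= first_bid > 0; the threshold is only ever compared
    against, mirroring the fact that the bidder does not know it in advance.
    """
    if first_bid <= 0 or threshold < first_bid:
        raise ValueError("expected threshold >= first_bid > 0")
    bids = []
    bid = first_bid
    while True:
        bids.append(bid)
        if bid >= threshold:
            break
        bid *= 2
    return bids, sum(bids)


bids, paid = doubling_bids(5)
print(bids, paid, paid / 5)
```

Under these assumptions the final bid is less than twice the threshold and the earlier bids sum to less than the final bid, so the total paid stays below 4 times the threshold, matching the deterministic ratio of 4 quoted in the abstract.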
Some Statistical Models for Prediction
This dissertation examines the use of statistical models for prediction. Examples are drawn from public policy and chosen because they represent pressing problems facing U.S. governments at the local, state, and federal level. The first five chapters provide examples where the perfunctory use of linear models, the prediction tool of choice in government, failed to produce reasonable predictions. Methodological flaws are identified, and more accurate models are proposed that draw on advances in statistics, data science, and machine learning. Chapter 1 examines skyscraper construction, where the normality assumption is violated and extreme value analysis is more appropriate. Chapters 2 and 3 examine presidential approval and voting (a leading measure of civic participation), where the non-collinearity assumption is violated and an index model is more appropriate. Chapter 4 examines changes in temperature sensitivity due to global warming, where the linearity assumption is violated and a first-hitting-time model is more appropriate. Chapter 5 examines the crime rate, where the independence assumption is violated and a block model is more appropriate. The last chapter provides an example where simple linear regression was overlooked as providing a sensible solution. Chapter 6 examines traffic fatalities, where the linear assumption provides a better predictor than the more popular non-linear probability model, logistic regression. A theoretical connection is established between the linear probability model, the influence score, and the predictivity