89 research outputs found
Budget Feasible Mechanisms for Experimental Design
In the classical experimental design setting, an experimenter E has access to
a population of potential experiment subjects , each
associated with a vector of features . Conducting an experiment
with subject reveals an unknown value to E. E typically assumes
some hypothetical relationship between 's and 's, e.g., , and estimates from experiments, e.g., through linear
regression. As a proxy for various practical constraints, E may select only a
subset of subjects on which to conduct the experiment.
We initiate the study of budgeted mechanisms for experimental design. In this
setting, E has a budget . Each subject declares an associated cost to be part of the experiment, and must be paid at least her cost. In
particular, the Experimental Design Problem (EDP) is to find a set of
subjects for the experiment that maximizes V(S) = \log\det(I_d+\sum_{i\in
S}x_i\T{x_i}) under the constraint ; our objective
function corresponds to the information gain in parameter that is
learned through linear regression methods, and is related to the so-called
-optimality criterion. Further, the subjects are strategic and may lie about
their costs.
We present a deterministic, polynomial time, budget feasible mechanism
scheme, that is approximately truthful and yields a constant factor
approximation to EDP. In particular, for any small and , we can construct a (12.98, )-approximate mechanism that is
-truthful and runs in polynomial time in both and
. We also establish that no truthful,
budget-feasible algorithms is possible within a factor 2 approximation, and
show how to generalize our approach to a wide class of learning problems,
beyond linear regression
Online Convex Optimization with Binary Constraints
We consider online optimization with binary decision variables and convex
loss functions. We design a new algorithm, binary online gradient descent
(bOGD) and bound its expected dynamic regret. We provide a regret bound that
holds for any time horizon and a specialized bound for finite time horizons.
First, we present the regret as the sum of the relaxed, continuous round
optimum tracking error and the rounding error of our update in which the former
asymptomatically decreases with time under certain conditions. Then, we derive
a finite-time bound that is sublinear in time and linear in the cumulative
variation of the relaxed, continuous round optima. We apply bOGD to demand
response with thermostatically controlled loads, in which binary constraints
model discrete on/off settings. We also model uncertainty and varying load
availability, which depend on temperature deadbands, lockout of cooling units
and manual overrides. We test the performance of bOGD in several simulations
based on demand response. The simulations corroborate that the use of
randomization in bOGD does not significantly degrade performance while making
the problem more tractable
Learning in the Real World: Constraints on Cost, Space, and Privacy
The sheer demand for machine learning in fields as varied as: healthcare, web-search ranking, factory automation, collision prediction, spam filtering, and many others, frequently outpaces the intended use-case of machine learning models. In fact, a growing number of companies hire machine learning researchers to rectify this very problem: to tailor and/or design new state-of-the-art models to the setting at hand.
However, we can generalize a large set of the machine learning problems encountered in practical settings into three categories: cost, space, and privacy. The first category (cost) considers problems that need to balance the accuracy of a machine learning model with the cost required to evaluate it. These include problems in web-search, where results need to be delivered to a user in under a second and be as accurate as possible. The second category (space) collects problems that require running machine learning algorithms on low-memory computing devices. For instance, in search-and-rescue operations we may opt to use many small unmanned aerial vehicles (UAVs) equipped with machine learning algorithms for object detection to find a desired search target. These algorithms should be small to fit within the physical memory limits of the UAV (and be energy efficient) while reliably detecting objects. The third category (privacy) considers problems where one wishes to run machine learning algorithms on sensitive data. It has been shown that seemingly innocuous analyses on such data can be exploited to reveal data individuals would prefer to keep private. Thus, nearly any algorithm that runs on patient or economic data falls under this set of problems.
We devise solutions for each of these problem categories including (i) a fast tree-based model for explicitly trading off accuracy and model evaluation time, (ii) a compression method for the k-nearest neighbor classifier, and (iii) a private causal inference algorithm that protects sensitive data
Privacy Tradeoffs in Predictive Analytics
Online services routinely mine user data to predict user preferences, make
recommendations, and place targeted ads. Recent research has demonstrated that
several private user attributes (such as political affiliation, sexual
orientation, and gender) can be inferred from such data. Can a
privacy-conscious user benefit from personalization while simultaneously
protecting her private attributes? We study this question in the context of a
rating prediction service based on matrix factorization. We construct a
protocol of interactions between the service and users that has remarkable
optimality properties: it is privacy-preserving, in that no inference algorithm
can succeed in inferring a user's private attribute with a probability better
than random guessing; it has maximal accuracy, in that no other
privacy-preserving protocol improves rating prediction; and, finally, it
involves a minimal disclosure, as the prediction accuracy strictly decreases
when the service reveals less information. We extensively evaluate our protocol
using several rating datasets, demonstrating that it successfully blocks the
inference of gender, age and political affiliation, while incurring less than
5% decrease in the accuracy of rating prediction.Comment: Extended version of the paper appearing in SIGMETRICS 201
- …