89 research outputs found

    Budget Feasible Mechanisms for Experimental Design

    Full text link
    In the classical experimental design setting, an experimenter E has access to a population of nn potential experiment subjects i{1,...,n}i\in \{1,...,n\}, each associated with a vector of features xiRdx_i\in R^d. Conducting an experiment with subject ii reveals an unknown value yiRy_i\in R to E. E typically assumes some hypothetical relationship between xix_i's and yiy_i's, e.g., yiβxiy_i \approx \beta x_i, and estimates β\beta from experiments, e.g., through linear regression. As a proxy for various practical constraints, E may select only a subset of subjects on which to conduct the experiment. We initiate the study of budgeted mechanisms for experimental design. In this setting, E has a budget BB. Each subject ii declares an associated cost ci>0c_i >0 to be part of the experiment, and must be paid at least her cost. In particular, the Experimental Design Problem (EDP) is to find a set SS of subjects for the experiment that maximizes V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i}) under the constraint iSciB\sum_{i\in S}c_i\leq B; our objective function corresponds to the information gain in parameter β\beta that is learned through linear regression methods, and is related to the so-called DD-optimality criterion. Further, the subjects are strategic and may lie about their costs. We present a deterministic, polynomial time, budget feasible mechanism scheme, that is approximately truthful and yields a constant factor approximation to EDP. In particular, for any small δ>0\delta > 0 and ϵ>0\epsilon > 0, we can construct a (12.98, ϵ\epsilon)-approximate mechanism that is δ\delta-truthful and runs in polynomial time in both nn and loglogBϵδ\log\log\frac{B}{\epsilon\delta}. We also establish that no truthful, budget-feasible algorithms is possible within a factor 2 approximation, and show how to generalize our approach to a wide class of learning problems, beyond linear regression

    Online Convex Optimization with Binary Constraints

    Full text link
    We consider online optimization with binary decision variables and convex loss functions. We design a new algorithm, binary online gradient descent (bOGD) and bound its expected dynamic regret. We provide a regret bound that holds for any time horizon and a specialized bound for finite time horizons. First, we present the regret as the sum of the relaxed, continuous round optimum tracking error and the rounding error of our update in which the former asymptomatically decreases with time under certain conditions. Then, we derive a finite-time bound that is sublinear in time and linear in the cumulative variation of the relaxed, continuous round optima. We apply bOGD to demand response with thermostatically controlled loads, in which binary constraints model discrete on/off settings. We also model uncertainty and varying load availability, which depend on temperature deadbands, lockout of cooling units and manual overrides. We test the performance of bOGD in several simulations based on demand response. The simulations corroborate that the use of randomization in bOGD does not significantly degrade performance while making the problem more tractable

    Learning in the Real World: Constraints on Cost, Space, and Privacy

    Get PDF
    The sheer demand for machine learning in fields as varied as: healthcare, web-search ranking, factory automation, collision prediction, spam filtering, and many others, frequently outpaces the intended use-case of machine learning models. In fact, a growing number of companies hire machine learning researchers to rectify this very problem: to tailor and/or design new state-of-the-art models to the setting at hand. However, we can generalize a large set of the machine learning problems encountered in practical settings into three categories: cost, space, and privacy. The first category (cost) considers problems that need to balance the accuracy of a machine learning model with the cost required to evaluate it. These include problems in web-search, where results need to be delivered to a user in under a second and be as accurate as possible. The second category (space) collects problems that require running machine learning algorithms on low-memory computing devices. For instance, in search-and-rescue operations we may opt to use many small unmanned aerial vehicles (UAVs) equipped with machine learning algorithms for object detection to find a desired search target. These algorithms should be small to fit within the physical memory limits of the UAV (and be energy efficient) while reliably detecting objects. The third category (privacy) considers problems where one wishes to run machine learning algorithms on sensitive data. It has been shown that seemingly innocuous analyses on such data can be exploited to reveal data individuals would prefer to keep private. Thus, nearly any algorithm that runs on patient or economic data falls under this set of problems. We devise solutions for each of these problem categories including (i) a fast tree-based model for explicitly trading off accuracy and model evaluation time, (ii) a compression method for the k-nearest neighbor classifier, and (iii) a private causal inference algorithm that protects sensitive data

    Privacy Tradeoffs in Predictive Analytics

    Full text link
    Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can succeed in inferring a user's private attribute with a probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves a minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol using several rating datasets, demonstrating that it successfully blocks the inference of gender, age and political affiliation, while incurring less than 5% decrease in the accuracy of rating prediction.Comment: Extended version of the paper appearing in SIGMETRICS 201
    corecore