143 research outputs found

    On statistics, computation and scalability

    How should statistical procedures be designed so as to be computationally scalable to the massive datasets that are increasingly the norm? When coupled with the requirement that an answer to an inferential question be delivered within a certain time budget, this question has significant repercussions for the field of statistics. With the goal of identifying "time-data tradeoffs," we investigate some of the statistical consequences of computational perspectives on scalability, in particular divide-and-conquer methodology and hierarchies of convex relaxations. Comment: Published at http://dx.doi.org/10.3150/12-BEJSP17 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

    Budget Feasible Mechanisms for Experimental Design

    In the classical experimental design setting, an experimenter E has access to a population of $n$ potential experiment subjects $i\in \{1,...,n\}$, each associated with a vector of features $x_i\in R^d$. Conducting an experiment with subject $i$ reveals an unknown value $y_i\in R$ to E. E typically assumes some hypothetical relationship between the $x_i$'s and $y_i$'s, e.g., $y_i \approx \beta x_i$, and estimates $\beta$ from experiments, e.g., through linear regression. As a proxy for various practical constraints, E may select only a subset of subjects on which to conduct the experiment. We initiate the study of budgeted mechanisms for experimental design. In this setting, E has a budget $B$. Each subject $i$ declares an associated cost $c_i > 0$ to be part of the experiment, and must be paid at least her cost. In particular, the Experimental Design Problem (EDP) is to find a set $S$ of subjects for the experiment that maximizes $V(S) = \log\det(I_d+\sum_{i\in S}x_i x_i^{T})$ under the constraint $\sum_{i\in S}c_i\leq B$; our objective function corresponds to the information gain in the parameter $\beta$ that is learned through linear regression methods, and is related to the so-called $D$-optimality criterion. Further, the subjects are strategic and may lie about their costs. We present a deterministic, polynomial-time, budget-feasible mechanism scheme that is approximately truthful and yields a constant-factor approximation to EDP. In particular, for any small $\delta > 0$ and $\epsilon > 0$, we can construct a $(12.98, \epsilon)$-approximate mechanism that is $\delta$-truthful and runs in time polynomial in both $n$ and $\log\log\frac{B}{\epsilon\delta}$. We also establish that no truthful, budget-feasible algorithm is possible within a factor 2 approximation, and show how to generalize our approach to a wide class of learning problems, beyond linear regression.
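The objective and budget constraint in the abstract above can be made concrete with a small sketch. It evaluates $V(S) = \log\det(I_d+\sum_{i\in S}x_i x_i^{T})$ for a candidate set and checks $\sum_{i\in S}c_i\leq B$. The function names (`log_det_spd`, `value`, `is_budget_feasible`, `greedy_select`) are hypothetical, and the greedy selection rule is only an illustrative heuristic: it is not the mechanism from the paper, it treats the declared costs as given, and it makes no claim about truthfulness, payments, or approximation guarantees.

```python
import math


def log_det_spd(a):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    d = len(a)
    chol = [[0.0] * d for _ in range(d)]
    for r in range(d):
        for c in range(r + 1):
            s = a[r][c] - sum(chol[r][k] * chol[c][k] for k in range(c))
            chol[r][c] = math.sqrt(s) if r == c else s / chol[c][c]
    # det(A) = prod(diag(L))^2, so log det(A) = 2 * sum(log(diag(L))).
    return 2.0 * sum(math.log(chol[r][r]) for r in range(d))


def value(features, subset):
    """V(S) = log det(I_d + sum_{i in S} x_i x_i^T)."""
    d = len(features[0])
    m = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    for i in subset:
        x = features[i]
        for r in range(d):
            for c in range(d):
                m[r][c] += x[r] * x[c]
    return log_det_spd(m)


def is_budget_feasible(costs, subset, budget):
    """Check the constraint sum_{i in S} c_i <= B."""
    return sum(costs[i] for i in subset) <= budget


def greedy_select(features, costs, budget):
    """Illustrative heuristic (an assumption, not the paper's mechanism):
    repeatedly add the affordable subject with the largest marginal gain
    in V per unit of declared cost."""
    chosen, spent = [], 0.0
    remaining = set(range(len(features)))
    while True:
        base = value(features, chosen)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue
            ratio = (value(features, chosen + [i]) - base) / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return chosen
        chosen.append(best)
        remaining.discard(best)
        spent += costs[best]


# Toy data: three subjects in R^2, unit costs, budget B = 2.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
costs = [1.0, 1.0, 1.0]
selected = greedy_select(features, costs, budget=2.0)
```

On this toy input the heuristic prefers one subject per coordinate direction over two subjects with the same feature vector, since the second copy of a direction adds less to the log-determinant.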