    A General Framework for Fair Regression

    Fairness, through its many forms and definitions, has become an important issue facing the machine learning community. In this work, we consider how to incorporate group fairness constraints in kernel regression methods, applicable to Gaussian processes, support vector machines, neural network regression and decision tree regression. Further, we focus on examining the effect of incorporating these constraints in decision tree regression, with direct applications to random forests and boosted trees, amongst other widely used inference techniques. We show that the order of complexity of memory and computation is preserved for such models and tightly bound the expected perturbations to the model in terms of the number of leaves of the trees. Importantly, the approach works on trained models and hence can be easily applied to models in current use, and group labels are required only on training data. Comment: 8 pages, 4 figures, 2 pages of references

    Interpolation of nonstationary high frequency spatial-temporal temperature data

    The Atmospheric Radiation Measurement program is a U.S. Department of Energy project that collects meteorological observations at several locations around the world in order to study how weather processes affect global climate change. As one of its initiatives, it operates a set of fixed but irregularly spaced monitoring facilities in the Southern Great Plains region of the U.S. We describe methods for interpolating temperature records from these fixed facilities to locations at which no observations were made, which can be useful when values are required on a spatial grid. We interpolate by conditionally simulating from a fitted nonstationary Gaussian process model that accounts for the time-varying statistical characteristics of the temperatures, as well as the dependence on solar radiation. The model is fit by maximizing an approximate likelihood, and the conditional simulations result in well-calibrated confidence intervals for the predicted temperatures. We also describe methods for handling spatial-temporal jumps in the data to interpolate a slow-moving cold front. Comment: Published at http://dx.doi.org/10.1214/13-AOAS633 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
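
As a rough illustration of conditional simulation from a Gaussian process, the sketch below draws values at unobserved locations given a few observations. It is not the paper's model: it assumes a stationary squared-exponential kernel in one dimension, and the names `sq_exp_kernel` and `conditional_simulate`, the kernel parameters, and the data values are all hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Stationary squared-exponential covariance (an assumption; the paper's model is nonstationary)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def conditional_simulate(x_obs, y_obs, x_new, n_draws, rng, noise=1e-6):
    """Draw n_draws samples at x_new from a zero-mean GP conditioned on (x_obs, y_obs)."""
    k_oo = sq_exp_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_no = sq_exp_kernel(x_new, x_obs)
    k_nn = sq_exp_kernel(x_new, x_new)
    # Conditional mean and covariance of a multivariate normal.
    mean = k_no @ np.linalg.solve(k_oo, y_obs)
    cov = k_nn - k_no @ np.linalg.solve(k_oo, k_no.T)
    cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
    return rng.multivariate_normal(mean, cov, size=n_draws)

rng = np.random.default_rng(0)
x_obs = np.array([0.0, 1.0, 2.5, 4.0])    # hypothetical station locations
y_obs = np.array([0.3, -0.1, 0.8, 0.2])   # hypothetical centered temperatures
x_new = np.array([0.5, 1.75, 3.2])        # locations with no observations
draws = conditional_simulate(x_obs, y_obs, x_new, n_draws=200, rng=rng)

# Empirical 95% interval at each new location, taken across the simulated draws.
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
```

Each row of `draws` is one simulated field at the new locations, so pointwise intervals can be read off the empirical quantiles across rows.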

    Reduced-dimension linear transform coding of distributed correlated signals with incomplete observations

    We study the problem of optimal reduced-dimension linear transform coding and reconstruction of a signal based on distributed correlated observations of the signal. In the mean square estimation context, this involves finding the optimal signal representation based on multiple incomplete or only partial observations that are correlated. In particular, this leads to the study of finding the optimal Karhunen-Loeve basis based on the censored observations. The problem has been considered previously by Gastpar, Dragotti and Vetterli in the context of jointly Gaussian random variables, using conditional covariances. In this paper, we derive the estimation results in the more general setting of second-order random variables with arbitrary distributions, using entirely different techniques based on the idea of innovations. We explicitly solve the single transform coder case, give a characterization of optimality in the multiple distributed transform coders scenario and provide additional insights into the structure of the problem.
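
For orientation, the classical Karhunen-Loeve transform with a single coder and complete observations can be sketched as follows. This is the textbook baseline, not the paper's innovations-based method for incomplete observations; the function name `klt_reduce` and the randomly generated covariance are hypothetical.

```python
import numpy as np

def klt_reduce(cov, k):
    """Return the k leading eigenvectors (as columns) and eigenvalues of a covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
cov = a @ a.T                                # hypothetical covariance of a zero-mean signal
basis, kept = klt_reduce(cov, k=2)

# Encode samples into 2 coefficients each, then reconstruct linearly.
x = rng.multivariate_normal(np.zeros(4), cov, size=1000)
coeffs = x @ basis
x_hat = coeffs @ basis.T
empirical_mse = np.mean(np.sum((x - x_hat) ** 2, axis=1))

# Theoretical mean square error: the sum of the discarded eigenvalues.
expected_mse = np.trace(cov) - kept.sum()
```

With enough samples, `empirical_mse` should approach `expected_mse`, which is the smallest error any rank-2 linear transform can achieve for this covariance.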

    Generative models versus underlying symmetries to explain biological pattern

    Mathematical models play an increasingly important role in the interpretation of biological experiments. Studies often present a model that generates the observations, connecting a hypothesized process to an observed pattern. Such generative models confirm the plausibility of an explanation and make testable hypotheses for further experiments. However, studies rarely consider the broad family of alternative models that match the same observed pattern. The symmetries that define the broad class of matching models are in fact the only aspects of information truly revealed by the observed pattern. Commonly observed patterns derive from simple underlying symmetries. This article illustrates the problem by showing the symmetry associated with the observed rate of increase in fitness in a constant environment. That underlying symmetry reveals how each particular generative model defines a single example within the broad class of matching models. Further progress on the relation between pattern and process requires deeper consideration of the underlying symmetries.