A General Framework for Fair Regression
Fairness, through its many forms and definitions, has become an important
issue facing the machine learning community. In this work, we consider how to
incorporate group fairness constraints in kernel regression methods, applicable
to Gaussian processes, support vector machines, neural network regression and
decision tree regression. Further, we focus on examining the effect of
incorporating these constraints in decision tree regression, with direct
applications to random forests and boosted trees, among other widely used
inference techniques. We show that the order of complexity of memory
and computation is preserved for such models and tightly bound the expected
perturbations to the model in terms of the number of leaves of the trees.
Importantly, the approach works on trained models and hence can be easily
applied to models in current use; group labels are required only on training
data.
Comment: 8 pages, 4 figures, 2 pages of references
Interpolation of nonstationary high frequency spatial-temporal temperature data
The Atmospheric Radiation Measurement program is a U.S. Department of Energy
project that collects meteorological observations at several locations around
the world in order to study how weather processes affect global climate change.
As one of its initiatives, it operates a set of fixed but irregularly-spaced
monitoring facilities in the Southern Great Plains region of the U.S. We
describe methods for interpolating temperature records from these fixed
facilities to locations at which no observations were made, which can be useful
when values are required on a spatial grid. We interpolate by conditionally
simulating from a fitted nonstationary Gaussian process model that accounts for
the time-varying statistical characteristics of the temperatures, as well as
the dependence on solar radiation. The model is fit by maximizing an
approximate likelihood, and the conditional simulations result in
well-calibrated confidence intervals for the predicted temperatures. We also
describe methods for handling spatial-temporal jumps in the data to interpolate
a slow-moving cold front.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS633 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Reduced-dimension linear transform coding of distributed correlated signals with incomplete observations
We study the problem of optimal reduced-dimension linear transform coding and reconstruction of a signal based on distributed correlated observations of the signal. In the mean square estimation context, this involves finding the optimal signal representation based on multiple incomplete or only partial observations that are correlated. In particular, this leads to the study of finding the optimal Karhunen-Loeve basis based on the censored observations. The problem has been considered previously by Gastpar, Dragotti and Vetterli in the context of jointly Gaussian random variables, using conditional covariances. In this paper, we derive the estimation results in the more general setting of second-order random variables with arbitrary distributions, using entirely different techniques based on the idea of innovations. We explicitly solve the single transform coder case, give a characterization of optimality in the multiple distributed transform coders scenario, and provide additional insights into the structure of the problem.
Generative models versus underlying symmetries to explain biological pattern
Mathematical models play an increasingly important role in the interpretation
of biological experiments. Studies often present a model that generates the
observations, connecting a hypothesized process to an observed pattern. Such
generative models confirm the plausibility of an explanation and make testable
hypotheses for further experiments. However, studies rarely consider the broad
family of alternative models that match the same observed pattern. The
symmetries that define the broad class of matching models are in fact the only
aspects of information truly revealed by observed pattern. Commonly observed
patterns derive from simple underlying symmetries. This article illustrates the
problem by showing the symmetry associated with the observed rate of increase
in fitness in a constant environment. That underlying symmetry reveals how each
particular generative model defines a single example within the broad class of
matching models. Further progress on the relation between pattern and process
requires deeper consideration of the underlying symmetries.