HappyMap: A Generalized Multicalibration Method
Multicalibration is a powerful and evolving concept originating in the field of algorithmic fairness. For a predictor f that estimates the outcome y given covariates x, and for a function class C, multicalibration requires that the predictor f(x) and outcome y are indistinguishable under the class of auditors in C. Fairness is captured by incorporating demographic subgroups into the class of functions C. Recent work has shown that, by enriching the class C to incorporate appropriate propensity re-weighting functions, multicalibration also yields target-independent learning, wherein a model trained on a source domain performs well on unseen future target domains (approximately) captured by the re-weightings.
Formally, multicalibration with respect to C bounds $|\mathbb{E}_{(x,y)\sim\mathcal{D}}[c(f(x),x)\cdot(f(x)-y)]|$ for all $c \in C$. In this work, we view the term $(f(x)-y)$ as just one specific mapping, and explore the power of an enriched class of mappings. We propose s-Happy Multicalibration, a generalization of multicalibration, which yields a wide range of new applications, including a new fairness notion for uncertainty quantification, a novel technique for conformal prediction under covariate shift, and a different approach to analyzing missing data, while also yielding a unified understanding of several existing, seemingly disparate algorithmic fairness notions and target-independent learning approaches.
We give a single HappyMap meta-algorithm that captures all these results, together with a sufficiency condition for its success.
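To make the audit condition concrete, the following is a minimal sketch of an empirical multicalibration check over a finite auditor class C; the function name `audit_multicalibration`, the tolerance parameter, and the subgroup-indicator auditors are illustrative assumptions, not code from the paper.

```python
import numpy as np

def audit_multicalibration(f_x, x, y, auditors, tol=0.05):
    """Empirically estimate |E_{(x,y)~D}[c(f(x), x) * (f(x) - y)]| for each
    auditor c in C and report the auditors whose estimate exceeds tol."""
    residual = f_x - y
    violations = {}
    for name, c in auditors.items():
        stat = abs(np.mean(c(f_x, x) * residual))  # empirical expectation
        if stat > tol:
            violations[name] = stat
    return violations

# Hypothetical demographic-subgroup auditors: here each c depends on x only,
# through a group-membership indicator encoded in the first covariate.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
y = rng.binomial(1, 0.5, size=1000).astype(float)
f_x = np.clip(0.5 + 0.1 * x[:, 1], 0.0, 1.0)  # toy predictor's scores
auditors = {
    "group_A": lambda f, x: (x[:, 0] > 0).astype(float),
    "group_B": lambda f, x: (x[:, 0] <= 0).astype(float),
}
print(audit_multicalibration(f_x, x, y, auditors))
```

Enriching `auditors` with propensity re-weighting functions, as in the abstract, gives the target-independent-learning reading; swapping the residual term $(f(x)-y)$ for other mappings is the generalization the paper proposes.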
When and How Mixup Improves Calibration
In many machine learning applications, it is important for the model to
provide confidence scores that accurately capture its prediction uncertainty.
Although modern learning methods have achieved great success in predictive
accuracy, generating calibrated confidence scores remains a major challenge.
Mixup, a popular yet simple data augmentation technique based on taking convex
combinations of pairs of training examples, has been empirically found to
significantly improve confidence calibration across diverse applications.
However, when and how Mixup helps calibration is still a mystery. In this
paper, we theoretically prove that Mixup improves calibration in
\textit{high-dimensional} settings by investigating natural statistical models.
Interestingly, the calibration benefit of Mixup increases as the model capacity
increases. We support our theories with experiments on common architectures and
datasets. In addition, we study how Mixup improves calibration in
semi-supervised learning. While incorporating unlabeled data can sometimes make
the model less calibrated, adding Mixup training mitigates this issue and
provably improves calibration. Our analysis provides new insights and a
framework to understand Mixup and calibration.
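As a concrete reference point, here is a minimal sketch of the Mixup augmentation described above, together with the expected calibration error (ECE), a standard metric for the miscalibration the paper studies; the helper names and bin count are illustrative choices, not the paper's code.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mixup: replace a batch by convex combinations of random example pairs.
    The mixing weight lambda is drawn from Beta(alpha, alpha); y should be
    numeric (e.g., one-hot labels) so it can be mixed the same way."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

def expected_calibration_error(conf, correct, n_bins=15):
    """ECE: bin predictions by confidence and average |accuracy - confidence|
    over bins, weighted by the fraction of samples falling in each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

A model whose confidence scores match its empirical accuracy has ECE near zero; the paper's claim is that training on `mixup_batch` outputs lowers this gap, increasingly so at higher model capacity.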
How Does Information Bottleneck Help Deep Learning?
Numerous deep learning algorithms have been inspired by and understood via
the notion of information bottleneck, where unnecessary information is (often
implicitly) minimized while task-relevant information is maximized. However, a
rigorous argument for justifying why it is desirable to control information
bottlenecks has been elusive. In this paper, we provide the first rigorous
learning theory for justifying the benefit of information bottleneck in deep
learning by mathematically relating information bottleneck to generalization
errors. Our theory proves that controlling information bottleneck is one way to
control generalization errors in deep learning, although it is not the only or
necessary way. We investigate the merit of our new mathematical findings with
experiments across a range of architectures and learning settings. In many
cases, generalization errors are shown to correlate with the degree of
information bottleneck: i.e., the amount of the unnecessary information at
hidden layers. This paper provides a theoretical foundation for current and
future methods through the lens of information bottleneck. Our new
generalization bounds scale with the degree of information bottleneck, unlike
the previous bounds that scale with the number of parameters, VC dimension,
Rademacher complexity, stability or robustness. Our code is publicly available
at: https://github.com/xu-ji/information-bottleneck
Comment: Accepted at ICML 2023.
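For intuition about what "controlling the information bottleneck" can look like in practice, here is a minimal sketch of one common surrogate, a variational IB-style objective with a Gaussian encoder, in which a per-example KL term upper-bounds I(X; Z); this is an illustrative stand-in under that assumption, not the paper's bound or its released code.

```python
import numpy as np

def vib_objective(task_nll, mu, log_var, beta=1e-3):
    """Variational information-bottleneck surrogate: task loss plus a penalty
    on I(X; Z), upper-bounded by KL(N(mu, sigma^2) || N(0, I)) per example.
    mu, log_var: (n, d) encoder outputs for the hidden representation Z."""
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    bottleneck = kl.mean()  # proxy for "unnecessary information" at the layer
    return task_nll + beta * bottleneck, bottleneck
```

Sweeping `beta` trades task fit against the degree of bottleneck; the experiments summarized above correlate that degree with generalization error.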
Decision-Aware Conditional GANs for Time Series Data
We introduce the decision-aware time-series conditional generative
adversarial network (DAT-CGAN) as a method for time-series generation. The
framework adopts a multi-Wasserstein loss on structured decision-related
quantities, capturing the heterogeneity of decision-related data and better
supporting the decision processes of end users. We improve
sample efficiency through an overlapped block-sampling method, and provide a
theoretical characterization of the generalization properties of DAT-CGAN. The
framework is demonstrated on financial time series for a multi-time-step
portfolio choice problem. We demonstrate better generative quality, with
respect to both the underlying data and the decision-related quantities, than
strong GAN-based baselines.
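The abstract does not spell out the overlapped block-sampling step, but a generic version for a univariate series might look like the sketch below; `overlapped_blocks` and its parameters are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def overlapped_blocks(series, block_len, stride):
    """Slice a time series into overlapping windows. With stride < block_len,
    consecutive blocks share observations, so a fixed-length series yields
    many more training blocks than disjoint sampling (better sample
    efficiency, at the cost of correlated blocks)."""
    starts = range(0, len(series) - block_len + 1, stride)
    return np.stack([series[s:s + block_len] for s in starts])

# Toy example: a 500-step random-walk "price" path.
prices = np.cumsum(np.random.default_rng(1).normal(size=500))
blocks = overlapped_blocks(prices, block_len=60, stride=10)
print(blocks.shape)  # (45, 60): 45 overlapping blocks vs. 8 disjoint ones
```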
- …