Machine learning systems are increasingly being used to make impactful
decisions, such as approving loan applications and performing criminal
justice risk assessments; as such, ensuring the fairness of these systems is
critical. This is often challenging because the labels in the data are
biased. This paper studies learning
fair probability distributions from biased data by explicitly modeling a latent
variable that represents a hidden, unbiased label. In particular, we aim to
achieve demographic parity by enforcing certain independencies in the learned
model. We also show that group fairness guarantees are meaningful only if the
distribution used to provide those guarantees indeed captures the real-world
data. In order to closely model the data distribution, we employ probabilistic
circuits, a class of expressive and tractable probabilistic models, and propose an
algorithm to learn them from incomplete data. We evaluate our approach on a
synthetic dataset in which the observed labels are indeed generated from fair
labels with added bias, and demonstrate that the fair labels are successfully
retrieved. Moreover, we show on real-world datasets that our approach is not
only a better model of how the data was generated than existing methods, but
also achieves competitive accuracy.
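To make the independence requirement concrete, the following is the standard formalization of demographic parity, stated for a latent fair decision; the symbols $D_f$ (hidden, unbiased label) and $S$ (sensitive attribute) are our own illustrative notation rather than necessarily the paper's:

$$P(D_f = 1 \mid S = s) = P(D_f = 1 \mid S = s') \quad \text{for all groups } s, s',$$

which is equivalent to requiring the independence $D_f \perp S$ in the learned distribution.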
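As a minimal, runnable sketch of the latent-variable idea, the toy model below treats the fair label D as latent and independent of the sensitive attribute S (the demographic parity constraint), lets the observed label Y depend on both, and uses EM to learn the parameters from the incomplete data. This is an assumption-laden illustration in a fully factorized model, not the paper's probabilistic-circuit learning algorithm; all names (em_latent_fair, p_d, theta) and the bias mechanism are our own.

```python
import numpy as np

def em_latent_fair(S, Y, n_iter=200, seed=0):
    """EM for a toy latent-fair-label model (illustrative only):
        D ~ Bernoulli(p_d)                     # latent fair label, independent of S
        Y | D=d, S=s ~ Bernoulli(theta[d, s])  # observed, biased label
    Enforcing D independent of S encodes the demographic parity constraint."""
    rng = np.random.default_rng(seed)
    p_d = 0.5
    theta = rng.uniform(0.3, 0.7, size=(2, 2))
    for _ in range(n_iter):
        # E-step: per-example posterior P(D=1 | S, Y)
        lik1 = p_d * theta[1, S] ** Y * (1 - theta[1, S]) ** (1 - Y)
        lik0 = (1 - p_d) * theta[0, S] ** Y * (1 - theta[0, S]) ** (1 - Y)
        gamma = lik1 / (lik1 + lik0)
        # M-step: maximize expected complete-data log-likelihood
        p_d = gamma.mean()
        for d, w in ((1, gamma), (0, 1.0 - gamma)):
            for s in (0, 1):
                m = S == s
                if w[m].sum() > 0:
                    theta[d, s] = (w[m] * Y[m]).sum() / w[m].sum()
    return p_d, theta, gamma

# Synthetic check: fair labels D are independent of S, but observed labels Y
# flip D=1 to 0 for the group S=1 thirty percent of the time (added bias).
rng = np.random.default_rng(1)
n = 5000
S = rng.integers(0, 2, n)
D = rng.integers(0, 2, n)
Y = np.where((S == 1) & (D == 1) & (rng.random(n) < 0.3), 0, D)
p_d, theta, gamma = em_latent_fair(S, Y)
print(f"estimated P(D=1) = {p_d:.2f}  (true value 0.5)")
```

The posterior gamma plays the role of the recovered fair label; in the full approach, such posteriors over the latent fair decision are exactly the kind of quantity a probabilistic circuit can compute tractably.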