We study the problem of data-driven background estimation, which arises in
searches for physics signals predicted by the Standard Model at the Large Hadron
Collider. Our work is motivated by the search for the production of pairs of
Higgs bosons decaying into four bottom quarks. A number of other physical
processes, collectively known as background, produce the same final state. The data
arising in this problem is therefore a mixture of unlabeled background and
signal events, and the primary aim of the analysis is to determine whether the
proportion of unlabeled signal events is nonzero. A challenging but necessary
first step is to estimate the distribution of background events. Past work in
this area has determined regions of the space of collider events where signal
is unlikely to appear, and where the background distribution is therefore
identifiable. The background distribution can be estimated in these regions,
and extrapolated into the region of primary interest by transfer learning with
a multivariate classifier. We build upon this existing approach in two ways.
First, we revisit this method by developing a powerful new classifier
architecture tailored to collider data. Second, we develop a new
method for background estimation, based on the optimal transport problem, which
relies on distinct modeling assumptions. These two methods can serve as
powerful cross-checks for each other in particle physics analyses, because
their underlying assumptions are complementary. We compare their performance
on simulated collider data.
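
As a rough illustration of the optimal transport idea mentioned above, the sketch below estimates a transport map between two one-dimensional samples. It is a toy example under stated assumptions, not the method developed in this work: the data are synthetic Gaussian draws, the cost is quadratic, and the function name `ot_map_1d` is hypothetical. In one dimension the optimal map for quadratic cost is the monotone rearrangement, which sends each source point to the target quantile at the same probability level.

```python
import numpy as np


def ot_map_1d(source, target):
    """Approximate the 1D optimal transport map (quadratic cost) from the
    empirical distribution of `source` to that of `target`.

    Hypothetical helper for illustration only.
    """
    src_sorted = np.sort(source)
    n = len(src_sorted)
    # Probability level assigned to each sorted source point.
    levels = (np.arange(n) + 0.5) / n

    def transport(x):
        # Empirical CDF of the source at x (by linear interpolation) ...
        u = np.interp(x, src_sorted, levels)
        # ... followed by the empirical quantile function of the target.
        return np.quantile(target, u)

    return transport


# Synthetic stand-ins for two event samples (assumed, not real collider data).
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=5000)
target = rng.normal(2.0, 0.5, size=5000)

T = ot_map_1d(source, target)
mapped = T(source)  # source points pushed forward towards the target
```

After the map is applied, the pushed-forward sample should have roughly the same mean and spread as the target sample, and the map is nondecreasing by construction. Collider events are multivariate, so a realistic analysis would need a multivariate transport solver and a physically motivated cost, both of which are outside this sketch.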