We consider the problem of learning a model from multiple heterogeneous
sources with the goal of performing well on a new target distribution. The
learner's goal is to mix these data sources in a target-distribution-aware way
while simultaneously minimizing the empirical risk on the resulting mixture.
The literature has made tangible advances in establishing a theory of learning
on mixed domains, but two problems remain unsolved. First, given a target
domain, how can the optimal mixture of sources be estimated? Second, when
there are numerous target domains, how can empirical risk minimization (ERM)
be solved for each target, with a possibly unique mixture of data sources, in
a computationally efficient manner? In this paper we address both problems
efficiently and with guarantees. We cast the first problem, mixture weight
estimation, as a convex-nonconcave compositional minimax problem, and propose
an efficient stochastic algorithm with provable stationarity guarantees. Next,
for the second problem, we observe that in certain regimes, solving ERM for
each target domain individually can be avoided: the parameters of a
target-optimal model can instead be viewed as a non-linear function on the
space of mixture coefficients. Building upon this, we show that in the offline
setting, a GD-trained overparameterized neural network can provably learn such
a function and predict the model for a target domain in place of solving a
dedicated ERM problem. Finally, we consider an online setting and propose a
label-efficient online algorithm that predicts parameters for new targets
given an arbitrary sequence of mixing coefficients, while enjoying regret
guarantees.
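The parameter-prediction idea can be illustrated with a toy sketch (our own construction for intuition, not the paper's actual setup or guarantees): for ridge-regression ERM over a two-source mixture, the target-optimal parameters vary smoothly and non-linearly with the mixture weight, so a simple regression fit on a grid of weights (standing in here for the GD-trained network) can predict the model for a new mixture without solving its ERM.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 5, 200, 1e-2

# Two heterogeneous sources with different ground-truth linear models.
theta1, theta2 = rng.normal(size=d), rng.normal(size=d)
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y1 = X1 @ theta1 + 0.01 * rng.normal(size=n)
y2 = X2 @ theta2 + 0.01 * rng.normal(size=n)

def erm_solution(a):
    """Closed-form ridge ERM on the a-weighted mixture of the two sources."""
    A = a * X1.T @ X1 + (1 - a) * X2.T @ X2 + lam * np.eye(d)
    b = a * X1.T @ y1 + (1 - a) * X2.T @ y2
    return np.linalg.solve(A, b)

# "Offline" phase: solve ERM on a coarse grid of mixture weights, then fit a
# per-coordinate polynomial map a -> theta*(a) (stand-in for the network).
grid = np.linspace(0.0, 1.0, 21)
thetas = np.stack([erm_solution(a) for a in grid])          # shape (21, d)
coefs = [np.polyfit(grid, thetas[:, j], deg=6) for j in range(d)]

def predict_theta(a):
    """Predict target-optimal parameters without solving a fresh ERM."""
    return np.array([np.polyval(c, a) for c in coefs])

# New target mixture: compare prediction against a direct ERM solve.
a_new = 0.37
err = np.linalg.norm(predict_theta(a_new) - erm_solution(a_new))
print(f"prediction error at a={a_new}: {err:.2e}")
```

Because the ridge solution is an analytic (rational) function of the mixture weight and the design matrices are well-conditioned here, a low-degree polynomial fit already tracks it closely; the paper's setting replaces this hand-picked fit with a provably trained overparameterized network over multi-dimensional mixture coefficients.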