We investigate penalized maximum log-likelihood estimation for exponential
family distributions whose natural parameter resides in a reproducing kernel
Hilbert space. Key to our approach is a novel technique, doubly dual embedding,
that avoids computation of the partition function. This technique also allows
the development of a flexible sampling strategy that amortizes the cost of
Monte Carlo sampling in the inference stage. The resulting estimator can be
easily generalized to kernel conditional exponential families. We establish a
connection between kernel exponential family estimation and MMD-GANs, revealing
a new perspective for understanding GANs. Compared with score-matching-based
estimators, the proposed method improves both memory and time efficiency and
enjoys stronger statistical properties: its statistical convergence rate fully
captures smoothness, whereas the rate of the score matching estimator appears
to saturate. Finally, we show that the proposed estimator empirically
outperforms the state of the art.

Comment: 22 pages, 20 figures; AISTATS 201
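The abstract does not spell out how the partition function is avoided. One standard identity that makes this possible is the Fenchel dual of the log-partition function. We assume, without confirmation from this abstract, that it underlies the first of the two dual steps. For p_f(x) ∝ q0(x) exp(f(x)), log Z(f) = max_q { E_q[f] - KL(q || q0) }, with the maximum attained at q = p_f. Substituting this into a penalized objective of the form -(1/n) Σ_i f(x_i) + log Z(f) + (λ/2)||f||² turns estimation into a saddle-point problem over (f, q), where q could, for instance, be represented by a sampler. The minimal sketch below checks the identity numerically on a hypothetical five-point domain with a uniform base measure q0. The functions log_partition, dual_objective and optimal_q, and the toy f, are illustrative only and are not the authors' implementation.

```python
import math
import random

def log_partition(f_vals, q0):
    """Direct log Z(f) = log sum_x q0(x) exp(f(x)), shifted by max f for stability."""
    m = max(f_vals)
    return m + math.log(sum(p * math.exp(v - m) for v, p in zip(f_vals, q0)))

def dual_objective(q, f_vals, q0):
    """E_q[f] - KL(q || q0) for a candidate distribution q on the same finite domain."""
    expected_f = sum(qi * v for qi, v in zip(q, f_vals))
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, q0) if qi > 0)
    return expected_f - kl

def optimal_q(f_vals, q0):
    """Maximizer of the dual: q*(x) proportional to q0(x) exp(f(x)), i.e. the model p_f."""
    m = max(f_vals)
    w = [p * math.exp(v - m) for v, p in zip(f_vals, q0)]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical toy setting: 5-point domain, uniform base measure q0, arbitrary f.
points = [-2.0, -1.0, 0.0, 1.0, 2.0]
q0 = [1.0 / len(points)] * len(points)
f_vals = [math.sin(x) + 0.5 * x for x in points]

# Log-partition computed directly vs. the dual value at its maximizer.
direct = log_partition(f_vals, q0)
dual_at_opt = dual_objective(optimal_q(f_vals, q0), f_vals, q0)

# Any other strictly positive q gives a smaller dual value.
random.seed(0)
raw = [random.random() + 1e-3 for _ in points]
raw_total = sum(raw)
q_rand = [r / raw_total for r in raw]
dual_at_rand = dual_objective(q_rand, f_vals, q0)

print(f"log Z(f) directly:        {direct:.6f}")
print(f"dual value at q*:         {dual_at_opt:.6f}")
print(f"dual value at a random q: {dual_at_rand:.6f}")
```

The first two printed values agree up to floating-point error. Any other q falls short of log Z(f) by exactly KL(q || q*), so the random q yields a strictly smaller dual value.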