Sparse Continuous Distributions and Fenchel-Young Losses
Exponential families are widely used in machine learning, including many
distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet,
Poisson, and categorical distributions via the softmax transformation).
Distributions in each of these families have fixed support. In contrast, for
finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax,
α-entmax, and fusedmax) has led to distributions with varying support.
This paper develops sparse alternatives to continuous distributions, based on
several technical contributions: First, we define Ω-regularized
prediction maps and Fenchel-Young losses for arbitrary domains (possibly
countably infinite or continuous). For linearly parametrized families, we show
that minimization of Fenchel-Young losses is equivalent to moment matching of
the statistics, generalizing a fundamental property of exponential families.
When Ω is a Tsallis negentropy with parameter α, we obtain
``deformed exponential families,'' which include α-entmax and sparsemax
(α = 2) as particular cases. For quadratic energy functions, the resulting
densities are β-Gaussians, an instance of elliptical distributions that
contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov
densities, and for which we derive closed-form expressions for the variance,
Tsallis entropy, and Fenchel-Young loss. When Ω is a total variation or
Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally,
we introduce continuous-domain attention mechanisms, deriving efficient
gradient backpropagation algorithms for α ∈ {1, 2}. Using
these algorithms, we demonstrate our sparse continuous distributions for
attention-based audio classification and visual question answering, showing
that they allow attending to time intervals and compact regions.
Comment: JMLR 2022 camera-ready version. arXiv admin note: text overlap with
arXiv:2006.0721
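To make the abstract's central objects concrete: in the finite-dimensional setting (Blondel et al., 2020), the Fenchel-Young loss generated by a regularizer Ω is L_Ω(θ; μ) = Ω*(θ) + Ω(μ) − ⟨θ, μ⟩, and its gradient in θ is ŷ_Ω(θ) − μ with ŷ_Ω(θ) = ∇Ω*(θ), so minimizing the loss matches the predicted statistics to the observed ones; the paper's contribution is extending this machinery to arbitrary, possibly continuous, domains. Below is a minimal NumPy sketch of two pieces mentioned in the abstract: the sparsemax projection on a finite domain, and a one-dimensional truncated-parabola density as the α = 2 analogue of a Gaussian under a quadratic energy. The helper name two_gaussian_pdf and its normalization constant are my own illustrative choices rather than code from the paper, and sigma parametrizes the quadratic energy, not the variance of the resulting density.

import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of a score
    vector onto the probability simplex; entries can be exactly zero."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = k[1.0 + k * z_sorted > cumsum]        # coordinates kept in the support
    k_star = support[-1]
    tau = (cumsum[k_star - 1] - 1.0) / k_star       # threshold making the output sum to 1
    return np.maximum(z - tau, 0.0)

def two_gaussian_pdf(t, mu=0.0, sigma=1.0):
    """Illustrative 1-D density for alpha = 2 with energy (t - mu)^2 / (2 sigma^2):
    a truncated parabola (Epanechnikov-like), exactly zero outside a compact interval."""
    a = (1.5 * sigma ** 2) ** (1.0 / 3.0)           # support radius [mu - a, mu + a] from normalization
    return np.maximum((a ** 2 - (t - mu) ** 2) / (2.0 * sigma ** 2), 0.0)

print(sparsemax([1.2, 0.8, -0.5]))                  # [0.7, 0.3, 0.0]: sparse and sums to 1
print(two_gaussian_pdf(np.linspace(-2.0, 2.0, 5)))  # zero outside [-a, a], a ≈ 1.14

Both outputs illustrate the property the abstract emphasizes: unlike softmax or the Gaussian, these maps put exactly zero mass outside a data-dependent support, which is what allows the continuous attention mechanisms to attend to time intervals and compact regions.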
Efficient Methods for Natural Language Processing: A Survey
Recent work in natural language processing (NLP) has yielded appealing
results from scaling model parameters and training data; however, using only
scale to improve performance means that resource consumption also grows. Such
resources include data, time, storage, or energy, all of which are naturally
limited and unevenly distributed. This motivates research into efficient
methods that require fewer resources to achieve similar results. This survey
synthesizes and relates current methods and findings in efficient NLP. We aim
both to provide guidance for conducting NLP under limited resources and to
point toward promising research directions for developing more efficient methods.
Comment: Accepted at TACL, pre-publication version