Sparse Continuous Distributions and Fenchel-Young Losses
Exponential families are widely used in machine learning, including many
distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet,
Poisson, and categorical distributions via the softmax transformation).
Distributions in each of these families have fixed support. In contrast, for
finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax,
α-entmax, and fusedmax) has led to distributions with varying support.
This paper develops sparse alternatives to continuous distributions, based on
several technical contributions: First, we define Ω-regularized
prediction maps and Fenchel-Young losses for arbitrary domains (possibly
countably infinite or continuous). For linearly parametrized families, we show
that minimization of Fenchel-Young losses is equivalent to moment matching of
the statistics, generalizing a fundamental property of exponential families.
When Ω is a Tsallis negentropy with parameter α, we obtain
``deformed exponential families,'' which include α-entmax and sparsemax
(α = 2) as particular cases. For quadratic energy functions, the resulting
densities are β-Gaussians, an instance of elliptical distributions that
contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov
densities, and for which we derive closed-form expressions for the variance,
Tsallis entropy, and Fenchel-Young loss. When Ω is a total variation or
Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally,
we introduce continuous-domain attention mechanisms, deriving efficient
gradient backpropagation algorithms for α ∈ {1, 2}. Using
these algorithms, we demonstrate our sparse continuous distributions for
attention-based audio classification and visual question answering, showing
that they allow attending to time intervals and compact regions.
Comment: JMLR 2022 camera-ready version. arXiv admin note: text overlap with
arXiv:2006.0721
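To make the abstract's central objects concrete: in the finite-dimensional setting (Blondel et al., 2020), the Fenchel-Young loss generated by a regularizer Ω is L_Ω(θ; μ) = Ω*(θ) + Ω(μ) − ⟨θ, μ⟩, and its gradient in θ is ŷ_Ω(θ) − μ with ŷ_Ω(θ) = ∇Ω*(θ), so minimizing the loss matches the predicted statistics to the observed ones; the paper's contribution is extending this machinery to arbitrary, possibly continuous, domains. Below is a minimal NumPy sketch of two pieces mentioned in the abstract: the sparsemax projection on a finite domain, and a one-dimensional truncated-parabola density as the α = 2 analogue of a Gaussian under a quadratic energy. The helper name two_gaussian_pdf and its normalization constant are my own illustrative choices rather than code from the paper, and sigma parametrizes the quadratic energy, not the variance of the resulting density.

import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of a score
    vector onto the probability simplex; entries can be exactly zero."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = k[1.0 + k * z_sorted > cumsum]        # coordinates kept in the support
    k_star = support[-1]
    tau = (cumsum[k_star - 1] - 1.0) / k_star       # threshold making the output sum to 1
    return np.maximum(z - tau, 0.0)

def two_gaussian_pdf(t, mu=0.0, sigma=1.0):
    """Illustrative 1-D density for alpha = 2 with energy (t - mu)^2 / (2 sigma^2):
    a truncated parabola (Epanechnikov-like), exactly zero outside a compact interval."""
    a = (1.5 * sigma ** 2) ** (1.0 / 3.0)           # support radius [mu - a, mu + a] from normalization
    return np.maximum((a ** 2 - (t - mu) ** 2) / (2.0 * sigma ** 2), 0.0)

print(sparsemax([1.2, 0.8, -0.5]))                  # [0.7, 0.3, 0.0]: sparse and sums to 1
print(two_gaussian_pdf(np.linspace(-2.0, 2.0, 5)))  # zero outside [-a, a], a ≈ 1.14

Both outputs illustrate the property the abstract emphasizes: unlike softmax or the Gaussian, these maps put exactly zero mass outside a data-dependent support, which is what allows the continuous attention mechanisms to attend to time intervals and compact regions.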
Efficient Methods for Natural Language Processing: A Survey
Recent work in natural language processing (NLP) has yielded appealing
results from scaling model parameters and training data; however, using only
scale to improve performance means that resource consumption also grows. Such
resources include data, time, storage, or energy, all of which are naturally
limited and unevenly distributed. This motivates research into efficient
methods that require fewer resources to achieve similar results. This survey
synthesizes and relates current methods and findings in efficient NLP. We aim
both to provide guidance for conducting NLP under limited resources and to
point toward promising research directions for developing more efficient methods.
Comment: Accepted at TACL, pre-publication version