Search CORE

5 research outputs found

Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts

Author: Akbarian Pedram
Ho Nhat
Nguyen Huy
Yan Fanqi
Publication date: 24/09/2023

Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive deep-learning architectures without increasing the computational cost. Despite its popularity in real-world applications, the theoretical understanding of that gating function has remained an open problem. The main challenge comes from the structure of the top-K sparse softmax gating function, which partitions the input space into multiple regions with distinct behaviors. By focusing on a Gaussian mixture of experts, we establish theoretical results on the effects of the top-K sparse softmax gating function on both density and parameter estimations. Our results hinge upon defining novel loss functions among parameters to capture different behaviors of the input regions. When the true number of experts $k_{\ast}$ is known, we demonstrate that the convergence rates of density and parameter estimations are both parametric on the sample size. However, when $k_{\ast}$ becomes unknown and the true model is over-specified by a Gaussian mixture of $k$ experts where $k > k_{\ast}$, our findings suggest that the number of experts selected from the top-K sparse softmax gating function must exceed the total cardinality of a certain number of Voronoi cells associated with the true parameters to guarantee the convergence of the density estimation. Moreover, while the density estimation rate remains parametric under this setting, the parameter estimation rates become substantially slow due to an intrinsic interaction between the softmax gating and expert functions.

Comment: 35 pages
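As a rough illustration of the gating function the abstract above refers to, the following minimal Python sketch keeps the K largest gating scores, applies a softmax over those, and assigns zero weight to every other expert. The function names (`affine_scores`, `top_k_softmax_gate`), the affine form of the scores, and the tie-breaking rule are assumptions made for this example; they are not taken from the paper.

```python
import math


def affine_scores(x, weights, biases):
    """Hypothetical gating scores: one affine function of the input per expert."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def top_k_softmax_gate(scores, k):
    """Softmax over the k largest scores; all other experts get weight 0."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Indices of the k largest scores (ties broken by lower index, an assumption).
    top = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    # Subtract the maximum selected score for numerical stability.
    m = max(scores[i] for i in top)
    exp = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exp.values())
    return [exp[i] / z if i in exp else 0.0 for i in range(len(scores))]


# Example: four experts, two of which are selected for this input.
x = [1.0, -0.5]
weights = [[2.0, 0.0], [1.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
biases = [0.0, 0.0, 0.0, 0.0]
gate = top_k_softmax_gate(affine_scores(x, weights, biases), k=2)
```

Which experts land in the top K depends on the input, so different inputs activate different expert subsets. This is the sense in which the gating function partitions the input space into regions with distinct behaviors.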

arXiv.org e-Print Archive

A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

Author: Akbarian Pedram
Ho Nhat
Nguyen Huy
Nguyen TrungTin
Publication date: 22/10/2023

The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications. From a theoretical perspective, while there have been previous attempts to comprehend the behavior of that model under the regression settings through the convergence analysis of maximum likelihood estimation in the Gaussian MoE model, such analysis under the setting of a classification problem has remained missing in the literature. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when part of the expert parameters vanish, these rates are shown to be slower than polynomial rates owing to an inherent interaction between the softmax gating and expert functions via partial differential equations. To address this issue, we propose using a novel class of modified softmax gating functions which transform the input values before delivering them to the gating functions. As a result, the previous interaction disappears and the parameter estimation rates are significantly improved.

Comment: 36 pages

arXiv.org e-Print Archive

A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

Author: Akbarian Pedram
Ho Nhat
Nguyen Huy
Nguyen TrungTin
Publication venue: HAL CCSD
Publication date: 01/01/2023

36 pages

The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications. From a theoretical perspective, while there have been previous attempts to comprehend the behavior of that model under the regression settings through the convergence analysis of maximum likelihood estimation in the Gaussian MoE model, such analysis under the setting of a classification problem has remained missing in the literature. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when part of the expert parameters vanish, these rates are shown to be slower than polynomial rates owing to an inherent interaction between the softmax gating and expert functions via partial differential equations. To address this issue, we propose using a novel class of modified softmax gating functions which transform the input values before delivering them to the gating functions. As a result, the previous interaction disappears and the parameter estimation rates are significantly improved.

INRIA a CCSD electronic archive server

A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

Author: Akbarian Pedram
Ho Nhat
Nguyen Huy
Nguyen TrungTin
Publication venue: HAL CCSD
Publication date: 01/01/2023

Hal - Université Grenoble Alpes

A multi-objective fuzzy robust stochastic model for designing a sustainable-resilient-responsive supply chain network

Author: Akbarian-Saravi
Azaron
Benítez-Fernández
Biuki
Cardoso
Carvalho
Chalmardi
Chang
Charnes
Chopra
Dadmand
Dubey
Fahimnia
Farrokh
Fathollahi-Fard
Fathollahi-Fard
Fattahi
Fattahi
Fattahi
Fazli-Khalaf
Ghelichi
Gholizadeh
Goldbeck
Govindan
Govindan
Hasani
Hosseini-Motlagh
Jabbarzadeh
Jadidi
Jouzdani
Kaur
Kazemian
Kim
Kogler
Mamashli
Mamashli
Martí
Mohammaddust
Mohammed
Mohseni
Namdar
Nasiri
Nayeri
Nayeri
Paksoy
Paydar
Pedram
Pishvaee
Pishvaee
Pourmehdi
Rabbani
Ramezankhani
Raut
Rezaei
Rezapour
Roh
Sazvar
Sazvar
Soleimani
Tirkolaee
Tofighi
Torabi
Torabi
Torabi
Tsao
Uria
Vafaei
Valderrama
Yavari
Yu
Zahiri
Zhalechian
Zhang
Zhen
Publication venue: Elsevier BV

Crossref