118 research outputs found

    A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

    The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications. From a theoretical perspective, while there have been previous attempts to comprehend the behavior of that model under the regression settings through the convergence analysis of maximum likelihood estimation in the Gaussian MoE model, such analysis under the setting of a classification problem has remained missing in the literature. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when part of the expert parameters vanish, these rates are shown to be slower than polynomial rates owing to an inherent interaction between the softmax gating and expert functions via partial differential equations. To address this issue, we propose using a novel class of modified softmax gating functions which transform the input values before delivering them to the gating functions. As a result, the previous interaction disappears and the parameter estimation rates are significantly improved. Comment: 36 pages

    Minimax Optimal Rate for Parameter Estimation in Multivariate Deviated Models

    We study the maximum likelihood estimation (MLE) in the multivariate deviated model where the data are generated from the density function $(1-\lambda^{\ast})h_{0}(x)+\lambda^{\ast}f(x|\mu^{\ast}, \Sigma^{\ast})$, in which $h_{0}$ is a known function, and $\lambda^{\ast} \in [0,1]$ and $(\mu^{\ast}, \Sigma^{\ast})$ are unknown parameters to estimate. The challenges in deriving the convergence rate of the MLE mainly come from two issues: (1) the interaction between the function $h_{0}$ and the density function $f$; (2) the deviated proportion $\lambda^{\ast}$ can go to the extreme points of $[0,1]$ as the sample size tends to infinity. To address these challenges, we develop the distinguishability condition to capture the linear independence relation between the function $h_{0}$ and the density function $f$. We then provide comprehensive convergence rates of the MLE via the vanishing rate of $\lambda^{\ast}$ to zero as well as the distinguishability of the two functions $h_{0}$ and $f$. Comment: Dat Do and Huy Nguyen contributed equally to this work; 38 pages, 20 figures

    Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts

    Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive deep-learning architectures without increasing the computational cost. Despite its popularity in real-world applications, the theoretical understanding of that gating function has remained an open problem. The main challenge comes from the structure of the top-K sparse softmax gating function, which partitions the input space into multiple regions with distinct behaviors. By focusing on a Gaussian mixture of experts, we establish theoretical results on the effects of the top-K sparse softmax gating function on both density and parameter estimations. Our results hinge upon defining novel loss functions among parameters to capture different behaviors of the input regions. When the true number of experts $k_{\ast}$ is known, we demonstrate that the convergence rates of density and parameter estimations are both parametric on the sample size. However, when $k_{\ast}$ becomes unknown and the true model is over-specified by a Gaussian mixture of $k$ experts where $k > k_{\ast}$, our findings suggest that the number of experts selected from the top-K sparse softmax gating function must exceed the total cardinality of a certain number of Voronoi cells associated with the true parameters to guarantee the convergence of the density estimation. Moreover, while the density estimation rate remains parametric under this setting, the parameter estimation rates become substantially slow due to an intrinsic interaction between the softmax gating and expert functions. Comment: 35 pages
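For readers unfamiliar with the gating function this abstract refers to, the following is a minimal sketch of a generic top-K sparse softmax gate. It is not taken from the paper: the function name `top_k_softmax_gate` is hypothetical, and the gate scores are assumed to be precomputed logits, one per expert. Only the K largest logits receive softmax weight; every other expert gets weight zero.

```python
import math


def top_k_softmax_gate(logits, k):
    """Return gating weights that are a softmax over the k largest logits
    and exactly zero for all other experts (illustrative sketch only)."""
    n = len(logits)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits; ties are broken by the lower index.
    top = sorted(range(n), key=lambda i: (-logits[i], i))[:k]
    # Subtract the maximum selected logit for numerical stability.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(n)]


if __name__ == "__main__":
    # Four experts, keep the two with the largest scores.
    print(top_k_softmax_gate([2.0, 1.0, 0.1, -1.0], k=2))
```

Because which experts are selected changes with the input, a gate of this kind splits the input space into regions that behave differently, which is the structural difficulty the abstract describes.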

    Hierarchical Sliced Wasserstein Distance

    Sliced Wasserstein (SW) distance has been widely used in different application scenarios since it can be scaled to a large number of supports without suffering from the curse of dimensionality. The value of sliced Wasserstein distance is the average of transportation cost between one-dimensional representations (projections) of original measures that are obtained by Radon Transform (RT). Despite its efficiency in the number of supports, estimating the sliced Wasserstein distance requires a relatively large number of projections in high-dimensional settings. Therefore, for applications where the number of supports is relatively small compared with the dimension, e.g., several deep learning applications where the mini-batch approaches are utilized, the complexities from matrix multiplication of Radon Transform become the main computational bottleneck. To address this issue, we propose to derive projections by linearly and randomly combining a smaller number of projections which are named bottleneck projections. We explain the usage of these projections by introducing Hierarchical Radon Transform (HRT) which is constructed by applying Radon Transform variants recursively. We then formulate the approach into a new metric between measures, named Hierarchical Sliced Wasserstein (HSW) distance. By proving the injectivity of HRT, we derive the metricity of HSW. Moreover, we investigate the theoretical properties of HSW including its connection to SW variants and its computational and sample complexities. Finally, we compare the computational cost and generative quality of HSW with the conventional SW on the task of deep generative modeling using various benchmark datasets including CIFAR10, CelebA, and Tiny ImageNet. Comment: 28 pages, 7 figures, 3 tables
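As an illustration of the averaging over one-dimensional projections described above, here is a minimal Monte Carlo sketch of the conventional sliced Wasserstein distance between two equal-sized point sets. It is not the hierarchical variant proposed in the paper. The function name `sliced_wasserstein` and its parameters (`n_projections`, `p`, `seed`) are assumptions made for this example.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-p distance between two
    empirical measures with the same number of equally weighted points."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a direction uniformly on the unit sphere by normalizing a
        # standard Gaussian vector.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        theta = [v / norm for v in g]
        # Project both point sets onto the direction and sort them; in one
        # dimension the optimal transport plan matches sorted samples.
        px = sorted(sum(t * v for t, v in zip(theta, x)) for x in xs)
        py = sorted(sum(t * v for t, v in zip(theta, y)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)


if __name__ == "__main__":
    a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    b = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
    print(sliced_wasserstein(a, b))
```

Each projection costs one inner product per point, so the cost grows with both the dimension and the number of projections; this is the matrix-multiplication cost the abstract identifies as the bottleneck when there are few points relative to the dimension.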

    Myosin-II proteins are involved in the growth, morphogenesis, and virulence of the human pathogenic fungus Mucor circinelloides

    Mucormycosis is an emerging lethal invasive fungal infection. The infection caused by fungi belonging to the order Mucorales has been reported recently as one of the most common fungal infections among COVID-19 patients. The lack of understanding of pathogens, particularly at the molecular level, is one of the reasons for the difficulties in the management of the infection. Myosin is a diverse superfamily of actin-based motor proteins that have various cellular roles. Four families of myosin motors have been found in filamentous fungi, including myosin I, II, V, and fungus-specific chitin synthase with myosin motor domains. Our previous study on Mucor circinelloides, a common pathogen of mucormycosis, showed that the Myo5 protein (ID 51513) belonging to the myosin type V family had a critical impact on the growth and virulence of this fungus. In this study, to investigate the roles of myosin II proteins in M. circinelloides, silencing phenotypes and null mutants corresponding to myosin II encoding genes, designated mcmyo2A (ID 149958) and mcmyo2B (ID 136314), respectively, were generated. Those mutant strains featured a significantly reduced growth rate and impaired sporulation in comparison with the wild-type strain. Notably, the disruption of mcmyo2A led to an almost complete lack of sporulation. Both mutant strains displayed abnormally short, septate, and inflated hyphae with the presence of yeast-like cells and an unusual accumulation of pigment-filled vesicles. In vivo virulence assays of myosin-II mutant strains performed in the invertebrate model Galleria mellonella indicated that the mcmyo2A-knockout strain was avirulent, while the pathogenesis of the mcmyo2B null mutant was unaltered despite the low growth rate and impaired sporulation. The findings suggest that the myosin II proteins make critical contributions to the polarity growth, septation, morphology, pigment transportation, and pathogenesis of M. circinelloides. The findings also implicate the myosin family as a potential target for future therapy to treat mucormycosis.

    Pre-treatment potential of electro-coagulation process using aluminum and titanium electrodes for instant coffee processing wastewater

    This study investigated the potential of the electrocoagulation (EC) process using Al-Al and Al-Ti electrodes for the pre-treatment of instant coffee processing wastewater. The effects of various operating conditions, including cell voltage, treatment time, inter-electrode distance, solution pH, solution conductivity and agitation speed, on the removal of chemical oxygen demand (COD) and color were considered. The maximum removals of COD and color were 87% and 99%, respectively, corresponding to COD and color in the effluents of 359-384 mg/L and 58-101 Pt-Co. The biodegradability of the treated wastewater was significantly improved, since BOD5/COD increased from an initial value of 0.42 to 0.65 after treatment. Neither mixing nor the addition of electrolyte was recommended. Moreover, the COD removal kinetics during the EC process appeared to follow the first-order kinetic model. The operating costs were also determined as a reference for cost assessment of the treatment.
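For reference, the first-order kinetic model mentioned above is usually written as below. This is the textbook form, not an equation quoted from the study; the symbols $C(t)$ (COD concentration at time $t$), $C_0$ (initial COD) and $k$ (rate constant) are notation assumed for this note.

```latex
\frac{dC}{dt} = -k\,C(t)
\quad\Longrightarrow\quad
C(t) = C_0\, e^{-k t},
\qquad
\ln\frac{C_0}{C(t)} = k\,t .
```

Under this form, a plot of $\ln(C_0/C(t))$ against treatment time is a straight line through the origin with slope $k$, which is the usual way such a fit is checked.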

    Ultrasonic-Assisted Cathodic Plasma Electrolysis Approach for Producing of Graphene Nanosheets

    In this chapter, we review the production of graphene by the ultrasonic-assisted cathodic plasma electrolysis approach, which combines conventional electrolysis and plasma at ambient pressure and moderate temperature. First, we review the techniques for electrochemical preparation of graphene. Then, we briefly describe the plasma electrolysis approach for producing graphene. The mechanism, advantages, and disadvantages of this technique are discussed in detail.

    CHARACTERIZATION AND ADSORPTION CAPACITY OF AMINE-SIO2 MATERIAL FOR NITRATE AND PHOSPHATE REMOVAL

    Amine-SiO2 material was synthesized and applied as a novel adsorbent for nitrate and phosphate removal from aqueous solution. Amine-SiO2 was characterized using TGA, FTIR, BET, and SEM analyses. Results showed that Amine-SiO2 had nitrate and phosphate adsorption capacities 1.14 and 4.16 times higher, respectively, than commercial anion exchange resin (Akualite A420). In addition, Amine-SiO2 had good durability, with stable performance after at least 10 regeneration cycles, indicating that this material is very promising for future commercialization as an adsorbent for water treatment.

    Application of the cut-off projection to solve a backward heat conduction problem in a two-slab composite system

    The main goal of this paper is to apply the cut-off projection to solve a one-dimensional backward heat conduction problem in a two-slab system with perfect contact. In a constructive manner, we commence by demonstrating that the Fourier-based solution contains drastic growth due to the high-frequency nature of the Fourier series. Such instability leads to the need to study the projection method, where the cut-off approach is derived consistently. In the theoretical framework, the first two objectives are to construct the regularized problem and prove its stability for each noise level. Our second interest is estimating the error in -norm. Another supplementary objective is computing the eigen-elements. All in all, this paper can be considered as a preliminary attempt to solve the heating/cooling of a two-slab composite system backward in time. Several numerical tests are provided to corroborate the qualitative analysis. Peer reviewed