11 research outputs found
Approximations of conditional probability density functions in Lebesgue spaces via mixture of experts models
Mixture of experts (MoE) models are widely applied to conditional
probability density estimation problems. We demonstrate the richness of the
class of MoE models by proving denseness results in Lebesgue spaces when the
input and output variables are both compactly supported. We further prove an
almost uniform convergence result when the input is univariate. Auxiliary
lemmas are proved regarding the richness of the soft-max gating function class
and its relationship to the class of Gaussian gating functions.
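As a concrete illustration of the model class these denseness results concern, here is a minimal sketch of a Gaussian-expert MoE conditional density with soft-max gating; the parameterisation and all values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def softmax(z):
    """Numerically stable soft-max."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def moe_density(y, x, a, b, mu, sigma):
    """p(y | x) for a K-expert MoE: soft-max gates g_k(x) proportional to
    exp(a_k * x + b_k), mixing Gaussian experts N(y; mu_k, sigma_k^2).
    Illustrative sketch only, not the paper's notation."""
    gates = softmax(a * x + b)
    experts = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(gates @ experts)

# toy two-expert model: the gate shifts mass between experts as x varies
a, b = np.array([2.0, -2.0]), np.array([0.0, 0.0])
mu, sigma = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
density_at = moe_density(y=1.0, x=1.5, a=a, b=b, mu=mu, sigma=sigma)
```

For any fixed x the gates sum to one, so p(y | x) integrates to one in y, which is the property that makes the class a family of conditional densities.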
Approximation of conditional densities by smooth mixtures of regressions
This paper shows that large nonparametric classes of conditional multivariate
densities can be approximated in the Kullback--Leibler distance by different
specifications of finite mixtures of normal regressions in which normal means
and variances and mixing probabilities can depend on variables in the
conditioning set (covariates). These models are a special case of models known
as "mixtures of experts" in statistics and computer science literature.
Flexible specifications include models in which only mixing probabilities,
modeled by multinomial logit, depend on the covariates and, in the univariate
case, models in which only means of the mixed normals depend flexibly on the
covariates. Modeling the variance of the mixed normals by flexible functions of
the covariates can weaken restrictions on the class of the approximable
densities. The obtained results can be generalized to mixtures of general
location-scale densities. Rates of convergence and easy-to-interpret bounds are
also obtained for different model specifications. These approximation results
can be useful for proving consistency of Bayesian and maximum likelihood
density estimators based on these models. The results also have interesting
implications for applied researchers. Comment: Published in the Annals of
Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/09-AOS765
by the Institute of Mathematical Statistics (http://www.imstat.org).
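A minimal sketch of the kind of specification the paper studies: a finite mixture of normal regressions in which the means, the variances, and the multinomial-logit mixing probabilities all depend on the covariates. The linear/log-linear parameterisation and the toy values below are assumptions for illustration, not the paper's:

```python
import numpy as np

def smr_density(y, x, W, beta, gamma):
    """p(y | x) = sum_k pi_k(x) * N(y; mu_k(x), sigma_k(x)^2), with
    multinomial-logit mixing weights pi(x), means mu_k(x) linear in x,
    and log standard deviations linear in x (x includes an intercept).
    Illustrative parameterisation only."""
    logits = W @ x
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                 # multinomial-logit mixing probabilities
    mu = beta @ x                  # covariate-dependent component means
    sigma = np.exp(gamma @ x)      # covariate-dependent component std devs
    dens = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(pi @ dens)

# toy K = 2 mixture with covariate vector x = (1, covariate)
W = np.array([[0.0, 1.0], [0.0, -1.0]])
beta = np.array([[0.0, 2.0], [1.0, -1.0]])
gamma = np.array([[-0.5, 0.1], [-0.5, -0.1]])
x = np.array([1.0, 0.3])
p_y = smr_density(0.5, x, W, beta, gamma)
```

Restricting `gamma` to the intercept column recovers the simpler specification in which only the mixing probabilities and means move with the covariates.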
Extending Mixture of Experts Model to Investigate Heterogeneity of Trajectories: When, Where and How to Add Which Covariates
Researchers are usually interested in examining the impact of covariates when
separating heterogeneous samples into latent classes that are more homogeneous.
The majority of theoretical and empirical studies with such aims have focused
on identifying covariates as predictors of class membership in the structural
equation modeling framework. In other words, the covariates only indirectly
affect the sample heterogeneity. However, the covariates' influence on
between-individual differences can also be direct. This article presents a
mixture model that investigates covariates to explain within-cluster and
between-cluster heterogeneity simultaneously, known as a mixture-of-experts
(MoE) model. This study extends the MoE framework to investigate
heterogeneity in nonlinear trajectories: it identifies latent classes,
covariates that predict cluster membership, and covariates that explain
within-cluster differences in change patterns over time. Our simulation
studies demonstrate that the proposed model generally estimates the parameters
with little bias and good precision, and exhibits appropriate empirical
coverage for a nominal 95% confidence interval. This study also proposes
implementing structural equation
model forests to shrink the covariate space of the proposed mixture model. We
illustrate how to select covariates and construct the proposed model with
longitudinal mathematics achievement data. Additionally, we demonstrate that
the proposed mixture model can be further extended in the structural equation
modeling framework by allowing the covariates that have direct effects to be
time-varying. Comment: Draft version 1.7, 06/01/2021. This paper has not been
peer reviewed. Please do not copy or cite without the author's permission.
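The two roles a covariate can play in such a model can be sketched with a toy simulation: a covariate z both predicts latent-class membership (the indirect route, via a logit gate) and directly moderates the within-class slope. All coefficients below are hypothetical and chosen only to illustrate the structure:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_growth_mixture(n=2000, times=np.arange(5.0)):
    """Two latent classes of linear trajectories. The covariate z enters
    (i) the class-membership logit (between-cluster heterogeneity) and
    (ii) the within-class slope (within-cluster heterogeneity).
    Hypothetical coefficients, for illustration only."""
    z = rng.normal(size=n)
    p_class1 = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * z)))   # z predicts membership
    cls = (rng.random(n) < p_class1).astype(int)
    intercept = np.where(cls == 1, 3.0, 0.0)
    slope = np.where(cls == 1, 1.0, 0.2) + 0.3 * z      # z moderates the slope
    noise = rng.normal(scale=0.5, size=(n, times.size))
    y = intercept[:, None] + slope[:, None] * times[None, :] + noise
    return z, cls, y

z, cls, y = simulate_growth_mixture()
```

Fitting only the membership logit to such data would miss the direct within-class effect of z, which is the gap the proposed model is designed to close.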
Error bounds for functional approximation and estimation using mixtures of experts
We examine some mathematical aspects of learning unknown mappings with the Mixture of Experts Model (MEM). Specifically, we observe that the MEM is at least as powerful as a class of neural networks, in a sense that will be made precise. Upper bounds on the approximation error are established for a wide class of target functions. The general theorem states that $\inf \|f - f_n\|_p \le c\, n^{-r/d}$ holds uniformly for $f \in W^r_p(L)$ (a Sobolev class over $[-1, 1]^d$), where $f_n$ belongs to an $n$-dimensional manifold of normalized ridge functions. The same bound holds for the MEM as a special case of the above. The stochastic error, in the context of learning from i.i.d. examples, is also examined. An asymptotic analysis establishes the limiting behavior of this error, in terms of certain pseudo-information matrices. These results substantiate the intuition behind the MEM, and motivate applications.
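The flavour of the approximation bound can be illustrated numerically in one dimension: approximating a smooth target by an n-term combination of local expert values under normalized gating, the sup-norm error shrinks as n grows. This is a hand-rolled toy with an assumed Gaussian-shaped gate, not the paper's construction:

```python
import numpy as np

def gated_approx_error(f, n, grid=np.linspace(-1.0, 1.0, 1001)):
    """Sup-norm error of approximating f on [-1, 1] by n constant experts
    f(c_k) combined with normalized Gaussian-shaped gates centred at c_k.
    Toy sketch; bandwidth tied to the centre spacing."""
    centers = np.linspace(-1.0, 1.0, n)
    h = 2.0 / n                                   # bandwidth ~ centre spacing
    # unnormalized gates, shape (len(grid), n)
    g = np.exp(-0.5 * ((grid[:, None] - centers[None, :]) / h) ** 2)
    gates = g / g.sum(axis=1, keepdims=True)      # normalized gating weights
    approx = gates @ f(centers)                   # blend the expert outputs
    return float(np.max(np.abs(approx - f(grid))))

def f(x):
    return np.sin(np.pi * x)                      # smooth target on [-1, 1]

err_coarse = gated_approx_error(f, n=5)
err_fine = gated_approx_error(f, n=40)
```

Increasing n from 5 to 40 visibly tightens the fit, consistent with an error that decays polynomially in n for smooth targets.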