Methods of Training Task Decompositions in Gated Modular Neural Networks
Mixture of experts (MoE), introduced over 20 years ago, is the simplest gated modular neural network architecture. The gate in the MoE architecture learns task decompositions, and individual experts (modules) learn simpler functions appropriate to the gate’s task decomposition. This could make MoE inherently interpretable, since errors can be attributed either to gating or to individual experts, providing a gate-level or expert-level diagnosis. Because the experts specialize, they could also be transferred modularly to other tasks. However, our initial experiments showed that the original MoE architecture and its end-to-end expert and gate training method do not guarantee intuitive task decompositions and expert utilization; indeed, they can fail spectacularly even for simple data such as MNIST. This thesis therefore explores task decompositions among experts by the gate in existing MoE architectures and training methods, and demonstrates how they can fail for even simple datasets without additional regularization. We then propose five novel MoE training algorithms and MoE architectures: (1) dual temperature gate and expert training, which uses a softer gate distribution to train the experts and a harder gate distribution to train the gate; (2) two no-gate expert training algorithms in which the experts are trained without a gate: (a) the loudest expert method, which selects the expert with the lowest estimate of its own loss for the sample during both training and inference; and (b) the peeking expert algorithm, which selects and trains the expert with the best prediction probability for the target class of a sample during training. A gate is then reverse distilled from the pre-trained experts for conditional computation during inference; (3) an attentive gating MoE architecture that computes the gate probabilities by attending to the expert outputs with additional attention weights during training. We then distill the trained attentive gate model to a simpler original MoE model for conditional computation during inference; and (4) an expert loss gating MoE architecture in which the gate output is not the expert distribution but the expert log loss. We also propose a novel, flexible, data-driven soft constraint, Ls, that uses similarity between samples to regulate the gate’s expert distribution. We empirically validate our methods on the MNIST, FashionMNIST, and CIFAR-10 datasets. The empirical results show that our novel training and regularization algorithms outperform benchmark MoE training methods.
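The dual temperature idea in item (1) can be illustrated with a temperature-scaled softmax over the gate's logits. The following is a minimal sketch, not the thesis's implementation: the logits, expert outputs, and temperature values are hypothetical, and the abstract does not say how the two distributions enter the training losses.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_output(gate_probs, expert_outputs):
    """Gate-weighted average of the per-expert class-probability vectors."""
    n_classes = len(expert_outputs[0])
    return [
        sum(p * out[c] for p, out in zip(gate_probs, expert_outputs))
        for c in range(n_classes)
    ]

# Hypothetical gate logits for three experts on one sample.
gate_logits = [2.0, 1.0, 0.0]
# Hypothetical class-probability outputs of the three experts (two classes).
expert_outputs = [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]

# Assumed temperatures: softer distribution for the experts, harder for the gate.
soft_gate = softmax(gate_logits, temperature=4.0)
hard_gate = softmax(gate_logits, temperature=0.5)

soft_mixture = mixture_output(soft_gate, expert_outputs)
hard_mixture = mixture_output(hard_gate, expert_outputs)
```

With the softer distribution every expert keeps a noticeable weight, while the harder distribution concentrates most of the weight on the highest-scoring expert.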
Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar
Automatic machine learning is an important problem at the forefront of
machine learning. The strongest AutoML systems are based on neural networks,
evolutionary algorithms, and Bayesian optimization. Recently, AlphaD3M reached
state-of-the-art results with an order of magnitude speedup using reinforcement
learning with self-play. In this work we extend AlphaD3M by using a pipeline
grammar and a pre-trained model which generalizes from many different datasets
and similar tasks. Our results demonstrate improved performance compared with
our earlier work and existing methods on AutoML benchmark datasets for
classification and regression tasks. In the spirit of reproducible research we
make our data, models, and code publicly available.
Comment: ICML Workshop on Automated Machine Learning
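To make the role of a pipeline grammar concrete, here is a toy sketch of how a grammar can restrict the search to well-formed pipelines. The nonterminals, productions, and primitive names below are all hypothetical and are not taken from the paper, and the sketch only enumerates pipelines; it does not include the reinforcement learning search.

```python
from itertools import product

# Hypothetical toy grammar: nonterminals map to lists of productions.
GRAMMAR = {
    "PIPELINE": [["CLEAN", "TRANSFORM", "ESTIMATOR"], ["CLEAN", "ESTIMATOR"]],
    "CLEAN": [["impute_mean"], ["impute_median"]],
    "TRANSFORM": [["standard_scale"], ["pca"]],
    "ESTIMATOR": [["logistic_regression"], ["random_forest"]],
}

def expand(symbol):
    """Return every sequence of terminals derivable from `symbol`."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminals expand to themselves
    results = []
    for production in GRAMMAR[symbol]:
        parts = [expand(s) for s in production]
        # Combine one expansion of each symbol in the production, in order.
        for combo in product(*parts):
            results.append([token for seq in combo for token in seq])
    return results

pipelines = expand("PIPELINE")
for pipeline in pipelines:
    print(" -> ".join(pipeline))
```

Every generated sequence ends in an estimator and begins with a cleaning step, so a search procedure constrained by such a grammar never has to evaluate malformed pipelines.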
Bayesian Optimal Active Search and Surveying
We consider two active binary-classification problems with atypical
objectives. In the first, active search, our goal is to actively uncover as
many members of a given class as possible. In the second, active surveying, our
goal is to actively query points to ultimately predict the proportion of a
given class. Numerous real-world problems can be framed in these terms, and in
either case typical model-based concerns such as generalization error are only
of secondary importance.
We approach these problems via Bayesian decision theory; after choosing
natural utility functions, we derive the optimal policies. We provide three
contributions. In addition to introducing the active surveying problem, we
extend previous work on active search in two ways. First, we prove a novel
theoretical result, that less-myopic approximations to the optimal policy can
outperform more-myopic approximations by any arbitrary degree. We then derive
bounds that for certain models allow us to reduce (in practice dramatically)
the exponential search space required by a naive implementation of the optimal
policy, enabling further lookahead while still ensuring that optimal decisions
are always made.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
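As a point of reference for the active search setting, the most myopic (one-step lookahead) policy under a "number of positives found" utility queries the unlabeled point with the highest posterior probability of being positive. The sketch below shows only that greedy baseline, not the less-myopic policies or the bounds from the paper. The probability model `posterior_positive`, its parameters, and the toy data are hypothetical stand-ins.

```python
import math

def posterior_positive(x, observations, prior=0.1, pseudo=1.0, bandwidth=1.0):
    """Hypothetical stand-in model: kernel-weighted label vote shrunk toward a prior."""
    weight_pos = prior * pseudo
    weight_all = pseudo
    for xi, yi in observations:
        w = math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2))
        weight_pos += w * yi
        weight_all += w
    return weight_pos / weight_all

def greedy_active_search(pool, oracle, budget):
    """One-step lookahead: always query the point most likely to be positive."""
    observations = []
    remaining = list(pool)
    for _ in range(budget):
        best = max(remaining, key=lambda x: posterior_positive(x, observations))
        remaining.remove(best)
        observations.append((best, oracle(best)))
    return observations

# Toy 1-D pool in which the positives are the points at 5.0 and above.
pool = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0]

def oracle(x):
    return 1 if x >= 5.0 else 0

found = greedy_active_search(pool, oracle, budget=4)
n_found = sum(label for _, label in found)
print(found, n_found)
```

In this toy run the first query is uninformed and lands on a negative, after which the greedy policy moves away from it and collects the cluster of positives.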
AlphaD3M: Machine Learning Pipeline Synthesis
Peer reviewed. We introduce AlphaD3M, an automatic machine learning (AutoML) system based on
meta reinforcement learning using sequence models with self-play. AlphaD3M is
based on edit operations performed over machine learning pipeline primitives
providing explainability. We compare AlphaD3M with state-of-the-art AutoML
systems: Autosklearn, Autostacker, and TPOT, on OpenML datasets. AlphaD3M
achieves competitive performance while being an order of magnitude faster,
reducing computation time from hours to minutes, and is explainable by design.
The Design and Performance of a CORBA Audio/Video Streaming Service
Factory patterns [Gamma et al., 1995], as described in Section 2.3.1. Flexibility in data transfer protocol: A CORBA A/V Streaming Service implementation may need to select from a variety of transfer protocols. For instance, an Internet-based streaming application, such as Realvideo [RealNetworks, 1998], may use the UDP protocol, whereas a local intranet video-conferencing tool [et al., 1996] might prefer the QoS features offered by native high-speed ATM protocols. Likewise, RTP [Schulzrinne et al., 1994] is gaining acceptance as a transfer protocol for streaming audio and video data over the Internet. Thus, it is essential that an A/V Streaming Service support a range of data transfer protocols dynamically. The CORBA A/V Streaming Service defines a simple specialized protocol, the Simple Flow Protocol (SFP), which makes no assumptions about the communication protocols used for data streaming and provides architecture-independent transfer of flow content. Consequently, the stream establis..