
    Methods of Training Task Decompositions in Gated Modular Neural Networks

    Mixture of experts (MoE), introduced over 20 years ago, is the simplest gated modular neural network architecture. The gate in the MoE architecture learns task decompositions, and the individual experts (modules) learn simpler functions appropriate to the gate’s task decomposition. This could inherently make MoE interpretable, as errors can be attributed either to the gating or to individual experts, thereby providing either a gate-level or expert-level diagnosis. Due to the specialization of experts, they could be modularly transferred to other tasks. However, our initial experiments showed that the original MoE architecture and its end-to-end expert and gate training method do not guarantee intuitive task decompositions and expert utilization; indeed, they can fail spectacularly even for simple data such as MNIST. This thesis therefore explores task decompositions among experts by the gate in existing MoE architectures and training methods and demonstrates how they can fail even for simple datasets without additional regularization. We then propose five novel MoE training algorithms and MoE architectures: (1) Dual-temperature gate and expert training, which uses a softer gate distribution to train the experts and a harder gate distribution to train the gate; (2) Two no-gate expert training algorithms, where the experts are trained without a gate: (a) the loudest expert method, which selects the expert with the lowest estimate of its own loss for the sample both during training and inference, and (b) the peeking expert algorithm, which selects and trains the expert with the best prediction probability for the target class of a sample during training; a gate is then reverse-distilled from the pre-trained experts for conditional computation during inference; (3) The attentive gating MoE architecture, which computes the gate probabilities by attending to the expert outputs with additional attention weights during training; we then distill the trained attentive gate model into a simpler original MoE model for conditional computation during inference; and (4) The expert loss gating MoE architecture, where the gate output is not the expert distribution but the expert log loss. We also propose a novel, flexible, data-driven soft constraint, Ls, that uses similarity between samples to regulate the gate’s expert distribution. We empirically validate our methods on the MNIST, FashionMNIST and CIFAR-10 datasets. The empirical results show that our novel training and regularization algorithms outperform benchmark MoE training methods.
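    A minimal sketch of the gated MoE setup described above, assuming a PyTorch implementation; the two temperatures (tau_expert, tau_gate) and the detach-based split between the expert update and the gate update are our illustration of the dual-temperature idea, not the thesis code.

    # Illustrative mixture-of-experts sketch (not the thesis implementation).
    # A softer gate distribution weights the experts' losses; a harder one
    # trains the gate, mirroring the dual-temperature training described above.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoE(nn.Module):
        def __init__(self, in_dim, n_classes, n_experts=4, hidden=64):
            super().__init__()
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                              nn.Linear(hidden, n_classes))
                for _ in range(n_experts)])
            self.gate = nn.Linear(in_dim, n_experts)

        def forward(self, x, tau_expert=2.0, tau_gate=0.5):
            expert_logits = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C)
            g = self.gate(x)                                                  # (B, E)
            soft_gate = F.softmax(g / tau_expert, dim=-1)  # softer distribution
            hard_gate = F.softmax(g / tau_gate, dim=-1)    # harder distribution
            return expert_logits, soft_gate, hard_gate

    def moe_loss(expert_logits, soft_gate, hard_gate, y):
        # Per-expert cross-entropy weighted by the two gate distributions:
        # the soft one carries gradients to the experts, the hard one to the gate.
        B, E, C = expert_logits.shape
        ce = F.cross_entropy(expert_logits.reshape(B * E, C),
                             y.repeat_interleave(E), reduction='none').view(B, E)
        expert_term = (soft_gate.detach() * ce).sum(dim=1).mean()
        gate_term = (hard_gate * ce.detach()).sum(dim=1).mean()
        return expert_term + gate_term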

    Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar

    Automatic machine learning is an important problem at the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently, AlphaD3M reached state-of-the-art results with an order-of-magnitude speedup using reinforcement learning with self-play. In this work we extend AlphaD3M by using a pipeline grammar and a pre-trained model which generalizes from many different datasets and similar tasks. Our results demonstrate improved performance compared with our earlier work and existing methods on AutoML benchmark datasets for classification and regression tasks. In the spirit of reproducible research we make our data, models, and code publicly available. Comment: ICML Workshop on Automated Machine Learning
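    The pipeline grammar can be pictured as a small context-free grammar over pipeline primitives that is expanded into candidate pipelines. The sketch below is only illustrative: the primitive names and productions are placeholders, not AlphaD3M's actual grammar or search procedure.

    # Illustrative grammar-constrained pipeline synthesis (placeholder grammar).
    import random

    GRAMMAR = {
        "PIPELINE":   [["PREPROCESS", "ESTIMATOR"], ["ESTIMATOR"]],
        "PREPROCESS": [["impute"], ["scale"], ["impute", "scale"], ["pca"]],
        "ESTIMATOR":  [["random_forest"], ["logistic_regression"], ["gradient_boosting"]],
    }

    def sample_pipeline(symbol="PIPELINE"):
        """Expand the grammar top-down into a flat sequence of primitives."""
        if symbol not in GRAMMAR:          # terminal primitive
            return [symbol]
        expansion = random.choice(GRAMMAR[symbol])
        steps = []
        for sym in expansion:
            steps.extend(sample_pipeline(sym))
        return steps

    # A full system would score each sampled pipeline on the task's validation
    # metric and use those rewards to bias future expansions (e.g. via a
    # learned model plus tree search), rather than sampling uniformly.
    for _ in range(5):
        print(" -> ".join(sample_pipeline()))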

    Bayesian Optimal Active Search and Surveying

    We consider two active binary-classification problems with atypical objectives. In the first, active search, our goal is to actively uncover as many members of a given class as possible. In the second, active surveying, our goal is to actively query points to ultimately predict the proportion of a given class. Numerous real-world problems can be framed in these terms, and in either case typical model-based concerns such as generalization error are only of secondary importance. We approach these problems via Bayesian decision theory; after choosing natural utility functions, we derive the optimal policies. We provide three contributions. In addition to introducing the active surveying problem, we extend previous work on active search in two ways. First, we prove a novel theoretical result: less-myopic approximations to the optimal policy can outperform more-myopic approximations by an arbitrarily large degree. We then derive bounds that, for certain models, allow us to reduce (in practice dramatically) the exponential search space required by a naive implementation of the optimal policy, enabling further lookahead while still ensuring that optimal decisions are always made. Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
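    For active search, the one-step (myopic) policy reduces to querying the unlabeled point with the highest posterior probability of belonging to the target class. The sketch below illustrates that greedy loop with a stand-in nearest-neighbour probability model; the model and the names (knn_positive_prob, oracle) are ours for illustration, not the paper's.

    # Myopic active-search sketch: repeatedly query argmax P(y=1 | x, data).
    import numpy as np

    def knn_positive_prob(X, labeled_idx, y_labeled, k=5, prior=0.5):
        """Posterior P(y=1 | x) from the k nearest labeled neighbours,
        with a pseudocount so sparsely observed regions fall back to the prior."""
        probs = np.full(len(X), prior)
        if len(labeled_idx) == 0:
            return probs
        XL = X[labeled_idx]
        for i in range(len(X)):
            d = np.linalg.norm(XL - X[i], axis=1)
            nn = np.argsort(d)[:k]
            probs[i] = (y_labeled[nn].sum() + prior) / (len(nn) + 1.0)
        return probs

    def active_search(X, oracle, budget, k=5):
        """Greedy loop: query the most probable positive, observe, repeat."""
        labeled, labels, found = [], [], 0
        for _ in range(budget):
            probs = knn_positive_prob(X, np.array(labeled, dtype=int),
                                      np.array(labels), k=k)
            probs[labeled] = -1.0          # never re-query a labeled point
            i = int(np.argmax(probs))
            y = oracle(i)                  # ask for the true 0/1 label
            labeled.append(i); labels.append(y); found += y
        return found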

    AlphaD3M: Machine Learning Pipeline Synthesis

    We introduce AlphaD3M, an automatic machine learning (AutoML) system based on meta reinforcement learning using sequence models with self-play. AlphaD3M is based on edit operations performed over machine learning pipeline primitives, which provides explainability. We compare AlphaD3M with state-of-the-art AutoML systems Autosklearn, Autostacker, and TPOT on OpenML datasets. AlphaD3M achieves competitive performance while being an order of magnitude faster, reducing computation time from hours to minutes, and is explainable by design.
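    The edit-operation view can be pictured as treating a pipeline as a sequence of primitives that the agent grows and modifies one operation at a time. The sketch below only illustrates that action space; the primitives and the edit set are placeholders, not AlphaD3M's actual primitives or policy.

    # Pipelines as primitive sequences, modified by edit operations (placeholder names).
    PRIMITIVES = ["impute", "scale", "pca", "random_forest", "logistic_regression"]

    def insert(pipeline, pos, prim):
        return pipeline[:pos] + [prim] + pipeline[pos:]

    def delete(pipeline, pos):
        return pipeline[:pos] + pipeline[pos + 1:]

    def replace(pipeline, pos, prim):
        return pipeline[:pos] + [prim] + pipeline[pos + 1:]

    # A search with a learned policy/value model would apply such edits step by
    # step and score the resulting pipeline on the task; the edit trace itself
    # documents how the pipeline was built, which is the source of explainability.
    state = ["logistic_regression"]
    state = insert(state, 0, "scale")           # scale -> logistic_regression
    state = replace(state, 1, "random_forest")  # scale -> random_forest
    print(" -> ".join(state))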

    The Design and Performance of a CORBA Audio/Video Streaming Service

    Factory patterns [Gamma et al., 1995], as described in Section 2.3.1. Flexibility in data transfer protocol: a CORBA A/V Streaming Service implementation may need to select from a variety of transfer protocols. For instance, an Internet-based streaming application, such as RealVideo [RealNetworks, 1998], may use the UDP protocol, whereas a local intranet video-conferencing tool [et al., 1996] might prefer the QoS features offered by native high-speed ATM protocols. Likewise, RTP [Schulzrinne et al., 1994] is gaining acceptance as a transfer protocol for streaming audio and video data over the Internet. Thus, it is essential that an A/V Streaming Service support a range of data transfer protocols dynamically. The CORBA A/V Streaming Service defines a simple specialized protocol, the Simple Flow Protocol (SFP), which makes no assumptions about the communication protocols used for data streaming and provides architecture-independent transfer of flow content. Consequently, the stream establis..
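    The factory-based flexibility described above can be sketched generically as follows; this is not the OMG A/V Streaming Service interfaces or TAO's C++ implementation, just a Python illustration of hiding the concrete transfer protocol behind a factory so it can be chosen at run time.

    # Generic factory sketch for pluggable transfer protocols (illustrative only).
    from abc import ABC, abstractmethod

    class FlowTransport(ABC):
        @abstractmethod
        def send(self, payload: bytes) -> None: ...

    class UdpTransport(FlowTransport):
        def send(self, payload: bytes) -> None:
            print(f"UDP: best-effort send of {len(payload)} bytes")

    class RtpTransport(FlowTransport):
        def send(self, payload: bytes) -> None:
            print(f"RTP: timestamped media packet of {len(payload)} bytes")

    class TransportFactory:
        """The streaming code never names a concrete protocol; new transports
        (e.g. an ATM-backed one) are added by registering another class."""
        _registry = {"udp": UdpTransport, "rtp": RtpTransport}

        @classmethod
        def create(cls, name: str) -> FlowTransport:
            return cls._registry[name.lower()]()

    transport = TransportFactory.create("rtp")
    transport.send(b"frame-0001")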