33 research outputs found

Methods of Training Task Decompositions in Gated Modular Neural Networks

Author: Krishnamurthy Yamuna
Publication date: 01/01/2024

Mixture of experts (MoE), introduced over 20 years ago, is the simplest gated modular neural network architecture. The gate in the MoE architecture learns task decompositions, and individual experts (modules) learn simpler functions appropriate to the gate’s task decomposition. This could make MoE inherently interpretable, as errors can be attributed either to gating or to individual experts, thereby providing a gate-level or expert-level diagnosis. Because the experts are specialized, they could be transferred modularly to other tasks. However, our initial experiments showed that the original MoE architecture and its end-to-end expert and gate training method do not guarantee intuitive task decompositions and expert utilization; indeed, they can fail spectacularly even for simple data such as MNIST. This thesis therefore explores task decompositions among experts by the gate in existing MoE architectures and training methods and demonstrates how they can fail for even simple datasets without additional regularizations. We then propose five novel MoE training algorithms and MoE architectures: (1) Dual temperature gate and expert training that uses a softer gate distribution for training experts and a harder gate distribution to train the gate; (2) Two no-gate expert training algorithms where the experts are trained without a gate: (a) loudest expert method, which selects the expert with the lowest estimate of its own loss for the sample both during training and inference; and (b) peeking expert algorithm, which selects and trains the expert with the best prediction probability for the target class of a sample during training. A gate is then reverse distilled from the pre-trained experts for conditional computation during inference; (3) Attentive gating MoE architecture that computes the gate probabilities by attending to the expert outputs with additional attention weights during training. We then distill the trained attentive gate model to a simpler original MoE model for conditional computation during inference; and (4) Expert loss gating MoE architecture where the gate output is not the expert distribution but the expert log loss. We also propose a novel flexible data-driven soft constraint, Ls, that uses similarity between samples to regulate the gate’s expert distribution. We empirically validate our methods on MNIST, FashionMNIST and CIFAR-10 datasets. The empirical results show that our novel training and regularization algorithms outperform benchmark MoE training methods.
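The gating idea in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the example values, and the two temperatures are assumptions chosen for illustration, and the abstract does not specify how the two gate distributions enter the training losses. The sketch only shows how a temperature-scaled softmax yields a softer or harder gate distribution over experts, and how a gate distribution mixes expert predictions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def moe_output(gate_probs, expert_probs):
    """Mix per-expert class probabilities, weighted by the gate distribution."""
    n_classes = len(expert_probs[0])
    return [
        sum(g * probs[c] for g, probs in zip(gate_probs, expert_probs))
        for c in range(n_classes)
    ]


# Hypothetical values for one sample: two experts, three classes.
gate_logits = [2.0, 0.0]
expert_probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]

# Assumed temperatures: a softer gate for the expert update, a harder one for the gate update.
soft_gate = softmax(gate_logits, temperature=4.0)
hard_gate = softmax(gate_logits, temperature=0.5)

mixed = moe_output(soft_gate, expert_probs)
```

With these values the harder gate puts almost all of its mass on the first expert, while the softer gate still gives the second expert a substantial share of the weight.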

Royal Holloway - Pure

Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar

Authors: Cho Kyunghyun; Drori Iddo; Freire Juliana; Krishnamurthy Yamuna; Lourenco Raoni; Rampin Remi; Silva Claudio
Publication date: 01/01/2019

Automatic machine learning is an important problem at the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached state-of-the-art results with an order of magnitude speedup using reinforcement learning with self-play. In this work we extend AlphaD3M by using a pipeline grammar and a pre-trained model which generalizes from many different datasets and similar tasks. Our results demonstrate improved performance compared with our earlier work and existing methods on AutoML benchmark datasets for classification and regression tasks. In the spirit of reproducible research we make our data, models, and code publicly available.

Comment: ICML Workshop on Automated Machine Learning

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

Bayesian Optimal Active Search and Surveying

Authors: Garnett Roman; Krishnamurthy Yamuna; Mann Richard; Schneider Jeff; Xiong Xuehan
Publication date: 01/01/2012

We consider two active binary-classification problems with atypical objectives. In the first, active search, our goal is to actively uncover as many members of a given class as possible. In the second, active surveying, our goal is to actively query points to ultimately predict the proportion of a given class. Numerous real-world problems can be framed in these terms, and in either case typical model-based concerns such as generalization error are only of secondary importance. We approach these problems via Bayesian decision theory; after choosing natural utility functions, we derive the optimal policies. We provide three contributions. In addition to introducing the active surveying problem, we extend previous work on active search in two ways. First, we prove a novel theoretical result: less-myopic approximations to the optimal policy can outperform more-myopic approximations by any arbitrary degree. We then derive bounds that, for certain models, allow us to reduce (in practice dramatically) the exponential search space required by a naive implementation of the optimal policy, enabling further lookahead while still ensuring that optimal decisions are always made.

Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
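As a rough illustration of the active search setting, the sketch below runs the most myopic (one-step) strategy: at each step it queries the unlabeled point with the highest current probability of belonging to the target class. It is not the paper's method, which concerns less-myopic lookahead and pruning bounds. The probability model `neighbor_posterior`, its `radius` and `prior` parameters, and the toy data are hypothetical stand-ins chosen only to make the example runnable.

```python
def neighbor_posterior(x, observed, radius=1.5, prior=0.5):
    """Hypothetical model: smoothed positive rate among observed points near x."""
    near = [label for point, label in observed.items() if abs(point - x) <= radius]
    # One pseudo-observation at the prior keeps unexplored regions at `prior`.
    return (sum(near) + prior) / (len(near) + 1)


def greedy_active_search(candidates, oracle, budget, posterior=neighbor_posterior):
    """One-step lookahead: always query the currently most probable positive."""
    observed = {}
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=lambda x: posterior(x, observed))
        remaining.remove(best)
        observed[best] = int(oracle(best))  # query the true label
    return observed


# Toy 1-D data (assumed): the positives form a cluster at 10..12.
points = [0, 1, 2, 3, 10, 11, 12]
positives = {10, 11, 12}

observed = greedy_active_search(points, lambda x: x in positives, budget=4)
found = [x for x, label in observed.items() if label == 1]
```

On this toy data the greedy strategy spends its first two queries on negatives before it reaches the positive cluster, which hints at why looking further ahead can pay off.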

arXiv.org e-Print Archive

CiteSeerX

White Rose Research Online

Supporting Complaints Investigation for Nursing and Midwifery Regulatory Agencies

Authors: Gao Yang; Jago Robert; Krishnamurthy Yamuna; Lertvittayakumjorn Piyawat; Petej Ivan; Stathis Kostas; van der Gaag Anna
Publication date: 06/08/2021

Royal Holloway - Pure

Use of Artificial Intelligence in Regulatory Decision-Making

Authors: Austin Zubin; Caceres Silva Juan; Gallagher Ann; Gao Yang; Jago Robert; Krishnamurthy Yamuna; Lertvittayakumjorn Piyawat; Petej Ivan; Stathis Kostas; van der Gaag Anna; Webster Michelle
Publication venue: Elsevier BV
Publication date: 01/10/2021

Royal Holloway - Pure

AlphaD3M: Machine Learning Pipeline Synthesis

Authors: Cho Kyunghyun; DE PAULA LOURENCO Raoni; Drori Iddo; Freire Juliana; Krishnamurthy Yamuna; Piazentin Ono Jorge; Rampin Remi; Silva Claudio
Publication date: 01/01/2021

Peer reviewed. We introduce AlphaD3M, an automatic machine learning (AutoML) system based on meta reinforcement learning using sequence models with self-play. AlphaD3M is based on edit operations performed over machine learning pipeline primitives, which provides explainability. We compare AlphaD3M with state-of-the-art AutoML systems (Autosklearn, Autostacker, and TPOT) on OpenML datasets. AlphaD3M achieves competitive performance while being an order of magnitude faster, reducing computation time from hours to minutes, and is explainable by design.

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

The Design and Performance of a CORBA Audio/Video Streaming Service

Authors: Douglas C. Schmidt; Nagarajan Surendran; Sumedh Mungee; Yamuna Krishnamurthy

Factory patterns [Gamma et al., 1995], as described in Section 2.3.1. Flexibility in data transfer protocol: A CORBA A/V Streaming Service implementation may need to select from a variety of transfer protocols. For instance, an Internet-based streaming application, such as Realvideo [RealNetworks, 1998], may use the UDP protocol, whereas a local intranet video-conferencing tool [et al., 1996] might prefer the QoS features offered by native high-speed ATM protocols. Likewise, RTP [Schulzrinne et al., 1994] is gaining acceptance as a transfer protocol for streaming audio and video data over the Internet. Thus, it is essential that an A/V Streaming Service support a range of data transfer protocols dynamically. The CORBA A/V Streaming Service defines a simple specialized protocol, the Simple Flow Protocol (SFP), which makes no assumptions about the communication protocols used for data streaming and provides architecture-independent flow content transfer. Consequently, the stream establis..

CiteSeerX