47 research outputs found

    Reinforcement Learning using Augmented Neural Networks

    Full text link
    Neural networks allow Q-learning reinforcement learning agents such as deep Q-networks (DQN) to approximate complex mappings from state spaces to value functions. However, this also brings drawbacks when compared to other function approximators such as tile coding or their generalisations, radial basis functions (RBF) because they introduce instability due to the side effect of globalised updates present in neural networks. This instability does not even vanish in neural networks that do not have any hidden layers. In this paper, we show that simple modifications to the structure of the neural network can improve stability of DQN learning when a multi-layer perceptron is used for function approximation.Comment: 7 pages; two columns; 4 figure

    On the Convergence of Techniques that Improve Value Iteration

    Get PDF
    Prioritisation of Bellman backups or updating only a small subset of actions represent important techniques for speeding up planning in MDPs. The recent literature showed new efficient approaches which exploit these directions. Backward value iteration and backing up only the best actions were shown to lead to a significant reduction of the planning time. This paper conducts a theoretical and empirical analysis of these techniques and shows new important proofs. In particular, (1) it identifies weaker requirements for the convergence of backups based on best actions only, (2) a new method for evaluation of the Bellman error is shown for the update that updates one best action once, (3) it presents the theoretical proof of backward value iteration and establishes required initialisation, (4) and shows that the default state ordering of backups in standard value iteration can significantly influence its performance. Additionally, (5) the existing literature did not compare these methods, either empirically or analytically, against policy iteration. The rigorous empirical and novel theoretical parts of the paper reveal important associations and allow drawing guidelines on which type of value or policy iteration is suitable for a given domain. Finally, our chief message is that standard value iteration can be made far more efficient by simple modifications shown in the paper

    Improving Language Modelling with Noise-contrastive estimation

    Full text link
    Neural language models do not scale well when the vocabulary is large. Noise-contrastive estimation (NCE) is a sampling-based method that allows for fast learning with large vocabularies. Although NCE has shown promising performance in neural machine translation, it was considered to be an unsuccessful approach for language modelling. A sufficient investigation of the hyperparameters in the NCE-based neural language models was also missing. In this paper, we showed that NCE can be a successful approach in neural language modelling when the hyperparameters of a neural network are tuned appropriately. We introduced the 'search-then-converge' learning rate schedule for NCE and designed a heuristic that specifies how to use this schedule. The impact of the other important hyperparameters, such as the dropout rate and the weight initialisation range, was also demonstrated. We showed that appropriate tuning of NCE-based neural language models outperforms the state-of-the-art single-model methods on a popular benchmark

    Reward Shaping in Episodic Reinforcement Learning

    Get PDF
    Recent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large scale problems leading to high quality autonomous decision making. It is a matter of time until we will see large scale applications of reinforcement learning in various sectors, such as healthcare and cyber-security, among others. However, reinforcement learning can be time-consuming because the learning algorithms have to determine the long term consequences of their actions using delayed feedback or rewards. Reward shaping is a method of incorporating domain knowledge into reinforcement learning so that the algorithms are guided faster towards more promising solutions. Under an overarching theme of episodic reinforcement learning, this paper shows a unifying analysis of potential-based reward shaping which leads to new theoretical insights into reward shaping in both model-free and model-based algorithms, as well as in multi-agent reinforcement learning

    Isomorph-Free Branch and Bound Search for Finite State Controllers

    Get PDF
    The recent proliferation of smart-phones and other wearable devices has lead to a surge of new mobile applications. Partially observable Markov decision processes provide a natural framework to design applications that continuously make decisions based on noisy sensor measurements. However, given the limited battery life, there is a need to minimize the amount of online computation. This can be achieved by compiling a policy into a finite state controller since there is no need for belief monitoring or online search. In this paper, we propose a new branch and bound technique to search for a good controller. In contrast to many existing algorithms for controllers, our search technique is not subject to local optima. We also show how to reduce the amount of search by avoiding the enumeration of isomorphic controllers and by taking advantage of suitable upper and lower bounds. The approach is demonstrated on several benchmark problems as well as a smart-phone application to assist persons with Alzheimer's to wayfind

    SESAM at SemEval-2020 Task 8: Investigating the relationship between image and text in sentiment analysis of memes

    Get PDF
    This paper presents our submission to task 8 (memotion analysis) of the SemEval 2020 competition. We explain the algorithms that were used to learn our models along with the process of tuning the algorithms and selecting the best model. Since meme analysis is a challenging task withtwo distinct modalities, we studied the impact of different multimodal representation strategies. The results of several approaches to dealing with multimodal data are therefore discussed in the paper. We found that alignment-based strategies did not perform well on memes. Our quantitative results also showed that images and text were uncorrelated. Fusion-based strategies did not show significant improvements and using one modality only (text or image) tends to lead to better results when applied with the predictive models that we used in our research

    Be More Active! Understanding the Differences between Mean and Sampled Representations of Variational Autoencoders

    Get PDF
    The ability of Variational Autoencoders to learn disentangled representations has made them appealing for practical applications. However, their mean representations, which are generally used for downstream tasks, have recently been shown to be more correlated than their sampled counterpart, on which disentanglement is usually measured. In this paper, we refine this observation through the lens of selective posterior collapse, which states that only a subset of the learned representations, the active variables, is encoding useful information while the rest (the passive variables) is discarded. We first extend the existing definition to multiple data examples and show that active variables are equally disentangled in mean and sampled representations. Based on this extension and the pre-trained models from disentanglement lib, we then isolate the passive variables and show that they are responsible for the discrepancies between mean and sampled representations. Specifically, passive variables exhibit high correlation scores with other variables in mean representations while being fully uncorrelated in sampled ones. We thus conclude that despite what their higher correlation might suggest, mean representations are still good candidates for downstream tasks applications. However, it may be beneficial to remove their passive variables, especially when used with models sensitive to correlated features.Comment: the main paper of 20 pages plus an appendix; 29 pages in tota

    How good are variational autoencoders at transfer learning?

    Full text link
    Variational autoencoders (VAEs) are used for transfer learning across various research domains such as music generation or medical image analysis. However, there is no principled way to assess before transfer which components to retrain or whether transfer learning is likely to help on a target task. We propose to explore this question through the lens of representational similarity. Specifically, using Centred Kernel Alignment (CKA) to evaluate the similarity of VAEs trained on different datasets, we show that encoders' representations are generic but decoders' specific. Based on these insights, we discuss the implications for selecting which components of a VAE to retrain and propose a method to visually assess whether transfer learning is likely to help on classification tasks.Comment: Extended version of arXiv:2205.08399 with further experiments on transfer learnin

    A Spectral Method that Worked Well in the SPiCe'16 Competition

    Get PDF
    We present methods used in our submission to the Sequence Prediction ChallengE (SPiCe’16) 1 . The two methods used to solve the competition tasks were spectral learning and a count based method. Spectral learning led to better results on most of the problems

    An Improved Crowdsourcing Based Evaluation Technique for Word Embedding Methods

    Get PDF
    In this proposal track paper, we have presented a crowdsourcing-based word embedding evaluation technique that will be more reliable and linguistically justified. The method is designed for intrinsic evaluation and extends the approach proposed in (Schnabel et al., 2015). Our improved evaluation technique captures word relatedness based on the word context
    corecore