Reinforcement Learning using Augmented Neural Networks
Neural networks allow Q-learning reinforcement learning agents such as deep
Q-networks (DQN) to approximate complex mappings from state spaces to value
functions. However, this also brings drawbacks compared with other function
approximators, such as tile coding or its generalisation, radial basis
functions (RBFs), because the globalised updates inherent to neural networks
introduce instability. This instability does not vanish even in networks
without hidden layers. In this paper, we
show that simple modifications to the structure of the neural network can
improve stability of DQN learning when a multi-layer perceptron is used for
function approximation.
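To make the contrast concrete, here is a minimal numpy sketch (not the paper's actual modification) of a linear Q-function over Gaussian RBF features: a TD update mostly touches features whose centres lie near the visited state, unlike the global weight updates of an MLP. All shapes and hyperparameters are illustrative.

    import numpy as np

    def rbf_features(state, centres, sigma=0.5):
        # Gaussian RBF activations: only centres near the state respond
        # strongly, so gradient updates stay approximately local.
        d = np.linalg.norm(centres - state, axis=1)
        return np.exp(-(d / sigma) ** 2)

    # Linear Q-function over RBF features: Q(s, a) = w[a] . phi(s)
    n_actions, n_centres, state_dim = 4, 32, 2
    centres = np.random.uniform(-1, 1, size=(n_centres, state_dim))
    w = np.zeros((n_actions, n_centres))

    def q_values(state):
        return w @ rbf_features(state, centres)

    def td_update(s, a, r, s_next, gamma=0.99, lr=0.1):
        phi = rbf_features(s, centres)
        target = r + gamma * np.max(q_values(s_next))
        td_error = target - w[a] @ phi
        w[a] += lr * td_error * phi  # update touches mostly nearby features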
On the Convergence of Techniques that Improve Value Iteration
Prioritising Bellman backups and updating only a small subset of actions are important techniques for speeding up planning in MDPs, and the recent literature has introduced efficient approaches that exploit them: backward value iteration and backing up only the best actions were both shown to reduce planning time significantly. This paper conducts a theoretical and empirical analysis of these techniques and contributes several new proofs. In particular, (1) it identifies weaker requirements for the convergence of backups based on best actions only, (2) it shows a new method for evaluating the Bellman error when a single best action is updated at a time, (3) it proves the convergence of backward value iteration and establishes the required initialisation, and (4) it shows that the default state ordering of backups in standard value iteration can significantly influence its performance. Additionally, (5) the existing literature had not compared these methods, either empirically or analytically, against policy iteration. The rigorous empirical and novel theoretical parts of the paper reveal important associations and allow us to draw guidelines on which type of value or policy iteration is suitable for a given domain. Finally, our chief message is that standard value iteration can be made far more efficient by the simple modifications shown in the paper.
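As a point of reference, a minimal sketch of standard in-place (Gauss-Seidel) value iteration follows; the optional order argument illustrates point (4), that the sweep ordering over states can matter. P and R are hypothetical tabular MDP arrays, and this is not the paper's own algorithm.

    import numpy as np

    # P[a][s][s']: transition probabilities; R[a][s]: expected rewards.
    def value_iteration(P, R, gamma=0.95, eps=1e-6, order=None):
        n_actions, n_states = R.shape
        V = np.zeros(n_states)
        states = order if order is not None else range(n_states)
        while True:
            delta = 0.0
            for s in states:  # in-place sweeps: ordering affects speed
                q = [R[a, s] + gamma * P[a, s] @ V for a in range(n_actions)]
                v_new = max(q)
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < eps:   # stop when the Bellman error is small
                return V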
Improving Language Modelling with Noise-contrastive estimation
Neural language models do not scale well when the vocabulary is large.
Noise-contrastive estimation (NCE) is a sampling-based method that allows for
fast learning with large vocabularies. Although NCE has shown promising
performance in neural machine translation, it was considered an unsuccessful
approach for language modelling, and a thorough investigation of the
hyperparameters of NCE-based neural language models was missing. In
this paper, we showed that NCE can be a successful approach in neural language
modelling when the hyperparameters of a neural network are tuned appropriately.
We introduced the 'search-then-converge' learning rate schedule for NCE and
designed a heuristic that specifies how to use this schedule. The impact of the
other important hyperparameters, such as the dropout rate and the weight
initialisation range, was also demonstrated. We showed that appropriate tuning
of NCE-based neural language models outperforms the state-of-the-art
single-model methods on a popular benchmark
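The 'search-then-converge' schedule named above follows, in its classic Darken and Moody form, a simple hyperbolic decay; the constants below are illustrative, and the paper's heuristic for choosing them is not reproduced here.

    def search_then_converge_lr(step, lr0=1.0, tau=5000.0):
        # Roughly constant ("search") while step << tau, then decays as
        # approximately lr0 * tau / step ("converge").
        return lr0 / (1.0 + step / tau)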
Reward Shaping in Episodic Reinforcement Learning
Recent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large-scale problems, leading to high-quality autonomous decision making. It is only a matter of time until we see large-scale applications of reinforcement learning in sectors such as healthcare and cyber-security, among others. However, reinforcement learning can be time-consuming because the learning algorithms have to determine the long-term consequences of their actions using delayed feedback or rewards. Reward shaping is a method of incorporating domain knowledge into reinforcement learning so that the algorithms are guided faster towards more promising solutions. Under the overarching theme of episodic reinforcement learning, this paper presents a unifying analysis of potential-based reward shaping which leads to new theoretical insights into reward shaping in both model-free and model-based algorithms, as well as in multi-agent reinforcement learning.
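For reference, potential-based shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward, a form known to preserve the optimal policy (Ng et al., 1999). A minimal sketch with a hypothetical grid-world potential:

    def shaped_reward(r, s, s_next, potential, gamma=0.99):
        # Potential-based shaping term: F(s, s') = gamma*Phi(s') - Phi(s).
        return r + gamma * potential(s_next) - potential(s)

    # Example: an illustrative negative-Manhattan-distance potential.
    goal = (9, 9)
    potential = lambda s: -(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))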
Isomorph-Free Branch and Bound Search for Finite State Controllers
The recent proliferation of smart-phones and other wearable devices has led
to a surge of new mobile applications. Partially observable Markov decision
processes provide a natural framework to design applications that
continuously make decisions based on noisy sensor measurements. However,
given the limited battery life, there is a need to minimize the amount of
online computation. This can be achieved by compiling a policy into a
finite state controller since there is no need for belief monitoring or
online search. In this paper, we propose a new branch and bound technique
to search for a good controller. In contrast to many existing algorithms
for controllers, our search technique is not subject to local optima. We
also show how to reduce the amount of search by avoiding the enumeration of
isomorphic controllers and by taking advantage of suitable upper and lower
bounds. The approach is demonstrated on several benchmark problems as well
as a smart-phone application that assists persons with Alzheimer's disease with wayfinding.
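A finite state controller for a partially observable domain can be represented with two small tables, so executing the policy requires neither belief monitoring nor online search. The sketch below uses illustrative names and is not the paper's branch and bound procedure.

    # node -> action, and (node, observation) -> next node.
    class FiniteStateController:
        def __init__(self, action_of, next_node):
            self.action_of = action_of
            self.next_node = next_node

        def act(self, node):
            # Each controller node commits to a fixed action.
            return self.action_of[node]

        def update(self, node, observation):
            # The observation alone drives the node transition.
            return self.next_node[(node, observation)]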
SESAM at SemEval-2020 Task 8: Investigating the relationship between image and text in sentiment analysis of memes
This paper presents our submission to task 8 (memotion analysis) of the SemEval 2020 competition. We explain the algorithms that were used to learn our models, along with the process of tuning the algorithms and selecting the best model. Since meme analysis is a challenging task with two distinct modalities, we studied the impact of different multimodal representation strategies, and the paper therefore discusses the results of several approaches to dealing with multimodal data. We found that alignment-based strategies did not perform well on memes, and our quantitative results showed that images and text were uncorrelated. Fusion-based strategies did not show significant improvements, and using one modality only (text or image) tended to lead to better results with the predictive models used in our research.
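For illustration, two common multimodal strategies over precomputed unimodal features are sketched below; the abstract does not specify the submission's exact models, so the function names and weights are assumptions.

    import numpy as np

    def early_fusion(text_feats, image_feats):
        # Concatenate unimodal feature vectors before classification.
        return np.concatenate([text_feats, image_feats], axis=-1)

    def late_fusion(p_text, p_image, w=0.5):
        # Average per-class probabilities from two unimodal classifiers.
        return w * p_text + (1 - w) * p_image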
Be More Active! Understanding the Differences between Mean and Sampled Representations of Variational Autoencoders
The ability of Variational Autoencoders to learn disentangled representations
has made them appealing for practical applications. However, their mean
representations, which are generally used for downstream tasks, have recently
been shown to be more correlated than their sampled counterparts, on which
disentanglement is usually measured. In this paper, we refine this observation
through the lens of selective posterior collapse, which states that only a
subset of the learned representations, the active variables, is encoding useful
information while the rest (the passive variables) is discarded. We first
extend the existing definition to multiple data examples and show that active
variables are equally disentangled in mean and sampled representations. Based
on this extension and the pre-trained models from disentanglement_lib, we then
isolate the passive variables and show that they are responsible for the
discrepancies between mean and sampled representations. Specifically, passive
variables exhibit high correlation scores with other variables in mean
representations while being fully uncorrelated in sampled ones. We thus
conclude that, despite what their higher correlation might suggest, mean
representations are still good candidates for downstream task applications.
However, it may be beneficial to remove their passive variables, especially
when used with models sensitive to correlated features.
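A minimal sketch of how passive variables can be flagged in practice, assuming encoder outputs mu and logvar for a diagonal Gaussian posterior: dimensions whose average KL to the N(0, I) prior is near zero have collapsed and carry no information. The threshold is illustrative, not the paper's criterion.

    import numpy as np

    def active_variable_mask(mu, logvar, threshold=0.01):
        # mu, logvar: shape (n_examples, n_latents) from the encoder.
        # Per-dimension KL of N(mu, sigma^2) to the standard normal prior.
        kl_per_dim = 0.5 * (mu**2 + np.exp(logvar) - logvar - 1.0)
        return kl_per_dim.mean(axis=0) > threshold

    # Downstream use: keep only active dimensions of the mean representation,
    # e.g. keep = active_variable_mask(mu, logvar); features = mu[:, keep]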
How good are variational autoencoders at transfer learning?
Variational autoencoders (VAEs) are used for transfer learning across various
research domains such as music generation or medical image analysis. However,
there is no principled way to assess before transfer which components to
retrain or whether transfer learning is likely to help on a target task. We
propose to explore this question through the lens of representational
similarity. Specifically, using Centred Kernel Alignment (CKA) to evaluate the
similarity of VAEs trained on different datasets, we show that encoders'
representations are generic while decoders' are specific. Based on these insights, we
discuss the implications for selecting which components of a VAE to retrain and
propose a method to visually assess whether transfer learning is likely to help
on classification tasks.
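For reference, linear CKA between two representation matrices can be computed in a few lines (Kornblith et al., 2019); here X and Y hold features of the same n examples from the two models being compared.

    import numpy as np

    def linear_cka(X, Y):
        # X: (n, d1), Y: (n, d2), rows aligned on the same examples.
        X = X - X.mean(axis=0)  # centre each feature
        Y = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
        norm_x = np.linalg.norm(X.T @ X, 'fro')
        norm_y = np.linalg.norm(Y.T @ Y, 'fro')
        return hsic / (norm_x * norm_y)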
A Spectral Method that Worked Well in the SPiCe'16 Competition
We present methods used in our submission to the Sequence Prediction ChallengE (SPiCe’16).
The two methods used to solve the competition tasks were spectral learning and a count-based method. Spectral learning led to better results on most of the problems.
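As a rough sketch of the spectral ingredient (assumed details, not the submission's exact pipeline): estimate a Hankel matrix of empirical sequence probabilities over chosen prefixes and suffixes, then take a rank-k truncated SVD as the basis of a low-dimensional weighted-automaton model.

    import numpy as np

    def hankel_matrix(seqs, prefixes, suffixes):
        # Empirical probability that a training sequence equals
        # prefix + suffix, for every (prefix, suffix) pair.
        counts = np.zeros((len(prefixes), len(suffixes)))
        for s in seqs:
            for i, p in enumerate(prefixes):
                for j, u in enumerate(suffixes):
                    counts[i, j] += (tuple(s) == tuple(p) + tuple(u))
        return counts / max(len(seqs), 1)

    def spectral_basis(H, k):
        # Rank-k factorisation of the Hankel matrix via truncated SVD.
        U, S, Vt = np.linalg.svd(H, full_matrices=False)
        return U[:, :k], S[:k], Vt[:k]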
An Improved Crowdsourcing Based Evaluation Technique for Word Embedding Methods
In this proposal track paper, we present a crowdsourcing-based word embedding evaluation technique that aims to be more reliable and linguistically justified. The method is designed for intrinsic evaluation and extends the approach proposed by Schnabel et al. (2015). Our improved evaluation technique captures word relatedness based on the word context.