63 research outputs found
Learning Learning Algorithms
Machine learning models rely on data to learn any given task; depending on the diversity of the task's elements and on the design objectives, large amounts of data may be required for good performance, which in turn can exponentially increase learning time and computational cost. Although most machine learning models today are trained on GPUs (Graphics Processing Units) to speed up the process, many still require, depending on the dataset, a huge amount of training time to reach good performance.
This study looks into learning learning algorithms, popularly known as metalearning: a method that tries to improve not only learning speed but also model performance, while requiring less data and spanning multiple tasks. The concept involves training a model that constantly learns to learn novel tasks quickly from previously learned tasks.
In the review of related work, attention is given to optimization-based methods, most precisely MAML (Model Agnostic MetaLearning): first, because it is one of the most popular state-of-the-art metalearning methods; and second, because this thesis focuses on creating a MAML-based method called MAML-DBL, which uses an adaptive learning rate technique with dynamic bounds that enables quick convergence at the beginning of the training process and good generalization towards the end.
The proposed MAML variant aims to prevent vanishing learning rates during training, and the slowdown towards the end of training where dense features are prevalent; further hyperparameter tuning may nevertheless be necessary for improved performance on some models, or where sparse features are prevalent.
MAML-DBL and MAML were tested on the datasets most commonly used for metalearning models. Based on the experimental results, the proposed method showed competitive performance on some of the models and even outperformed the baseline in some of the tests carried out.
The results obtained with both MAML-DBL (on one of the datasets) and MAML show that metalearning methods are highly recommendable whenever good performance, less data, and a multi-task or versatile model are required or desired.
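The abstract does not give MAML-DBL's exact update rule; the following is a minimal sketch of the general idea, assuming an AdaBound-style scheme in which the per-parameter adaptive step sizes of the outer (meta) update are clipped between dynamic bounds that start wide (fast early convergence) and tighten around a final rate (better late generalization). The toy tasks, coefficients, and function names are all hypothetical.

```python
import numpy as np

def dynamic_bounds(t, final_lr=0.1, gamma=1e-3):
    # AdaBound-style bounds: very wide at t = 1, both converging to final_lr.
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    return lower, upper

def task_loss_grad(theta, theta_star):
    # Toy task: quadratic loss ||theta - theta*||^2, gradient 2(theta - theta*).
    return np.sum((theta - theta_star) ** 2), 2.0 * (theta - theta_star)

rng = np.random.default_rng(0)
theta = np.zeros(2)                  # meta-parameters
m, v = np.zeros(2), np.zeros(2)      # Adam-style moments for the outer loop
inner_lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    meta_grad = np.zeros_like(theta)
    for _ in range(4):               # batch of tasks: each task is a random optimum
        theta_star = rng.normal(1.0, 0.5, size=2)
        _, g = task_loss_grad(theta, theta_star)
        adapted = theta - inner_lr * g                  # one inner gradient step
        _, g_adapted = task_loss_grad(adapted, theta_star)
        meta_grad += (1.0 - 2.0 * inner_lr) * g_adapted  # exact chain rule for this quadratic
    meta_grad /= 4
    m = beta1 * m + (1 - beta1) * meta_grad
    v = beta2 * v + (1 - beta2) * meta_grad ** 2
    # Clip the per-parameter adaptive step size into the dynamic bounds.
    lower, upper = dynamic_bounds(t)
    step = np.clip(0.01 / (np.sqrt(v) + eps), lower, upper)
    theta -= step * m

print(theta)                         # converges near the task-mean optimum [1, 1]
```

The bound schedule here mimics AdaBound; whether MAML-DBL uses this exact schedule, or applies the bounds in the inner rather than the outer loop, is not stated in the abstract.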
Twin Delayed Deep Deterministic Policy Gradient-Based Target Tracking for Unmanned Aerial Vehicle with Achievement Rewarding and Multistage Training
Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem: it requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the uncertain nature of the problem, and reinforcement learning (RL)-based approaches are good candidates for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3), a recent composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements on the original TD3 were also made. First, a proportional-differential controller was used to boost the exploration of TD3 during training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward function. This was accomplished by incorporating two exponential functions that limit the effect of velocity and acceleration, preventing deformation of the policy function approximation. In addition, the concept of multistage training based on the dynamic variables was proposed, as opposed to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition, enabling more stable learning behaviour of the policy and moving from a linear reward to an achievement formula. Training was conducted on fixed target tracking followed by moving target tracking. Flight testing covered three types of target trajectories: fixed, square, and blinking. Multistage training achieved the best performance with both exponential and achievement rewarding for the fixed-trained agent on the fixed and square moving targets, and with the combined agent, again with both exponential and achievement rewarding, for a fixed-trained agent in the case of a blinking target.
With respect to the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and multistage training open the door to various applications of RL in target tracking.
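The exact reward terms are not reproduced in this abstract; the snippet below is a hypothetical sketch of the described ingredients, namely a distance term, two exponentially saturated terms that bound the influence of velocity and acceleration, and a piecewise "achievement" bonus, with made-up coefficients.

```python
import numpy as np

def tracking_reward(pos_error, vel, acc, k_v=0.5, k_a=0.25, goal_radius=1.0):
    """Hypothetical UAV tracking reward in the spirit of the abstract:
    a distance term plus exponentially saturated velocity/acceleration
    penalties, and a piecewise achievement bonus replacing a purely
    linear reward. All coefficients are illustrative, not the paper's."""
    dist = np.linalg.norm(pos_error)
    r = -dist                                         # primary objective: close the gap
    r -= 1.0 - np.exp(-k_v * np.linalg.norm(vel))     # bounded in [0, 1): velocity cannot dominate
    r -= 1.0 - np.exp(-k_a * np.linalg.norm(acc))     # same saturation for acceleration
    if dist < goal_radius:                            # piecewise "achievement" bonus
        r += 10.0
    return r

# Far from the target there is no bonus; inside the goal radius the bonus dominates.
far = tracking_reward(np.array([5.0, 5.0, 2.0]), np.array([1.0, 0.0, 0.0]), np.zeros(3))
near = tracking_reward(np.array([0.2, 0.1, 0.0]), np.array([0.5, 0.0, 0.0]), np.zeros(3))
print(far < 0 < near)  # True
```

The exponential saturation is what keeps large velocities or accelerations from overwhelming the distance term, which is the failure mode the abstract attributes to unbounded linear rewards.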
Multi-Task Meta Learning: learn how to adapt to unseen tasks
This work proposes Multi-task Meta Learning (MTML), integrating two learning paradigms, Multi-Task Learning (MTL) and meta learning, to bring together the best of both worlds. In particular, it focuses on the simultaneous learning of multiple tasks, an element of MTL, and on promptly adapting to new tasks, a quality of meta learning. It is important to highlight that we focus on heterogeneous tasks, which are of distinct kinds, in contrast to the typically considered homogeneous tasks (e.g., all tasks being classification, or all being regression). The fundamental idea is to train a multi-task model such that, when an unseen task is introduced, it can learn in fewer steps whilst offering performance at least as good as conventional single-task learning on the new task, or as its inclusion within the MTL model. By conducting various experiments, we demonstrate this paradigm on two datasets and four tasks: the NYU-v2 and taskonomy datasets, for which we perform semantic segmentation, depth estimation, surface normal estimation, and edge detection. MTML achieves state-of-the-art results for three out of four tasks on the NYU-v2 dataset and two out of four on the taskonomy dataset. In the taskonomy dataset, it was discovered that many pseudo-labeled segmentation masks lacked classes that were expected to be present in the ground truth; our MTML approach was nevertheless effective in detecting these missing classes, delivering good qualitative results, although its quantitative performance was affected by the incorrect ground-truth labels. The source code for reproducibility can be found at https://github.com/ricupa/MTML-learn-how-to-adapt-to-unseen-tasks
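MTML's architecture is not detailed in this abstract; the sketch below only illustrates the adaptation idea it describes, reusing a shared multi-task representation and fitting just a small task-specific head when an unseen task arrives, using a toy linear setup with invented names.

```python
import numpy as np

rng = np.random.default_rng(1)

# A fixed random projection stands in for a trained multi-task backbone;
# in MTML this trunk would come from jointly training on several tasks.
trunk = rng.normal(size=(8, 4)) / np.sqrt(8)

def features(x):
    return np.maximum(x @ trunk, 0.0)    # shared ReLU representation

def fit_head(x, y, steps=100, lr=0.2):
    """Adapt only a task-specific linear head on the frozen shared trunk."""
    phi = features(x)
    w = np.zeros(phi.shape[1])
    for _ in range(steps):
        w -= lr * phi.T @ (phi @ w - y) / len(y)   # gradient step on MSE
    return w

# An "unseen" task whose target happens to be expressible in the shared features.
x = rng.normal(size=(64, 8))
w_true = rng.normal(size=4)
y = features(x) @ w_true

w_few = fit_head(x, y)
err = np.mean((features(x) @ w_few - y) ** 2)
print(err)   # below the initial mean-squared target np.mean(y**2)
```

The point of the sketch is only the shape of the claim: when the shared representation transfers, a handful of cheap head-only updates suffice, rather than retraining the whole model for the new task.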
Adaptive reinforcement learning with active state-specific exploration for engagement maximization during simulated child-robot interaction
Using assistive robots for educational applications requires robots to be able to adapt their behavior specifically to each child with whom they interact. Among relevant signals, non-verbal cues such as the child's gaze can provide the robot with important information about the child's current engagement in the task, and about whether the robot should continue its current behavior or not. Here we propose a reinforcement learning algorithm extended with active state-specific exploration and show its applicability to child engagement maximization as well as to more classical tasks such as maze navigation. We first demonstrate its adaptive nature on a continuous maze problem, an enhancement of the classic grid world. There, parameterized actions enable the agent to learn single moves until the end of a corridor, similarly to "options" but without explicit hierarchical representations. We then apply the algorithm to a series of simulated scenarios, such as an extended Tower of Hanoi where the robot should find the appropriate speed of movement for the interacting child, and a pointing task where the robot should find the child-specific appropriate level of expressivity of action. We show that the algorithm enables the agent to cope with both global and local non-stationarities in the state space while preserving stable behavior in other, stationary portions of the state space. Altogether, these results suggest a promising way to enable robot learning based on non-verbal cues, despite the high degree of non-stationarity that can occur during interaction with children.
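The abstract does not specify the exploration mechanism; one common way to realize "active state-specific exploration" is to keep a per-state exploration rate that rises with surprise (the absolute TD error) and decays where predictions are accurate, so exploration resumes only where the world changed. The sketch below applies that assumed mechanism to a tiny non-stationary chain MDP; the environment, rates, and update rule are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny 5-state chain MDP: action 0 moves left, action 1 moves right,
# and a reward of 1 is given for arriving at the current reward state.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
eps = np.full(n_states, 0.5)        # one exploration rate per state
alpha, gamma = 0.3, 0.9

def step(s, a, reward_state):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == reward_state else 0.0)

reward_state, s = n_states - 1, 0
for t in range(10000):
    if t == 5000:
        reward_state = 0            # local non-stationarity: the reward moves
    a = int(rng.integers(n_actions)) if rng.random() < eps[s] else int(np.argmax(Q[s]))
    s2, r = step(s, a, reward_state)
    td = r + gamma * np.max(Q[s2]) - Q[s, a]
    Q[s, a] += alpha * td
    # Active state-specific exploration: surprise (|TD error|) raises this
    # state's exploration rate, accurate predictions let it decay, so the
    # agent re-explores only the changed region while staying stable elsewhere.
    eps[s] = float(np.clip(0.9 * eps[s] + 0.5 * min(abs(td), 1.0), 0.05, 1.0))
    s = s2

print(np.argmax(Q, axis=1))        # greedy policy now points toward state 0
```

This captures the paper's stated property, namely coping with local non-stationarity while preserving stable behavior in stationary portions of the state space, in the simplest tabular form.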
Learning to Identify Critical States for Reinforcement Learning from Videos
Recent work on deep reinforcement learning (DRL) has pointed out that
algorithmic information about good policies can be extracted from offline data
which lack explicit information about executed actions. For example, videos of
humans or robots may convey a lot of implicit information about rewarding
action sequences, but a DRL machine that wants to profit from watching such
videos must first learn by itself to identify and recognize relevant
states/actions/rewards. Without relying on ground-truth annotations, our new
method called Deep State Identifier learns to predict returns from episodes
encoded as videos. Then it uses a kind of mask-based sensitivity analysis to
extract/identify important critical states. Extensive experiments showcase our
method's potential for understanding and improving agent behavior. The source
code and the generated datasets are available at
https://github.com/AI-Initiative-KAUST/VideoRLCS.
Comment: This paper was accepted to ICCV2
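The abstract's pipeline, learning a return predictor from episode videos and then using mask-based sensitivity analysis to extract critical states, can be illustrated with a deliberately simplified stand-in: the "predictor" below is a fixed linear scorer over per-frame features rather than a trained network, and all shapes and names are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in episode: T frames, each reduced to a d-dim feature vector.
# In the paper this would be video frames fed to a learned return predictor.
T, d = 20, 6
frames = rng.normal(size=(T, d))
w = rng.normal(size=d)
frames[7] += 10.0 * w           # plant one frame carrying most of the return signal

def predict_return(episode):
    # Toy "return predictor": a fixed linear scorer over frame features.
    return float(np.sum(episode @ w))

# Mask-based sensitivity analysis: drop one frame at a time and rank frames
# by how much the predicted return changes when that frame is masked out.
baseline = predict_return(frames)
sensitivity = np.array([
    abs(baseline - predict_return(np.delete(frames, t, axis=0)))
    for t in range(T)
])
critical = int(np.argmax(sensitivity))
print(critical)  # 7: the frame whose removal changes the prediction most
```

The real method learns both the predictor and the masks from data; the fixed scorer here only demonstrates why masking out a truly critical state produces a large prediction shift.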