
    Learning Learning Algorithms

    Get PDF
    Machine learning models rely on data to learn any given task, and depending on the diversity of the task's elements and the design objectives, large amounts of data may be required for good performance, which in turn can exponentially increase learning time and computational cost. Although most machine learning models today are trained on GPUs (Graphics Processing Units) to speed up the process, many still require a large amount of training time to reach good performance, depending on the dataset. This study looks into learning-to-learn algorithms, popularly known as metalearning: a family of methods that tries to improve not only learning speed but also model performance, while requiring less data and spanning multiple tasks. The concept involves training a model that constantly learns to learn novel tasks quickly from previously learned tasks. The review of related work focuses on optimization-based methods, and more precisely on MAML (Model Agnostic MetaLearning), first because it is one of the most popular state-of-the-art metalearning methods, and second because this thesis focuses on creating a MAML-based method, called MAML-DBL, that uses an adaptive learning-rate technique with dynamic bounds, enabling quick convergence at the beginning of training and good generalization towards the end. The proposed MAML variant aims to prevent vanishing learning rates during training and slowdown at the end, where dense features are prevalent, although further hyperparameter tuning may be necessary for some models, or where sparse features are prevalent, to improve performance. MAML-DBL and MAML were tested on the datasets most commonly used for metalearning models; based on the experimental results, the proposed method showed competitive performance on some of the models and even outperformed the baseline in some of the tests. The results obtained with both MAML-DBL (on one of the datasets) and MAML show that metalearning methods are highly recommendable whenever good performance, less data, and a multi-task or versatile model are required or desired.
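
    The abstract does not give MAML-DBL's exact bound schedule, so as a rough illustration of the dynamic-bounds idea only, the following Python sketch clips an Adam-like per-parameter step into lower and upper bounds that converge to a final learning rate as training progresses. The AdaBound-style schedule, the function name, and all constants are assumptions, not the thesis's actual formulation.

    # Illustrative sketch: a dynamic-bound adaptive step in the spirit of MAML-DBL.
    # The bound schedule below is an assumed AdaBound-style one, used only to show
    # the idea: loose bounds early (fast, Adam-like steps), tight bounds late
    # (SGD-like steps near final_lr, for better generalization).
    import numpy as np

    def dynamic_bound_step(params, grads, ema_sq, t,
                           base_lr=0.01, final_lr=0.1, gamma=1e-3, eps=1e-8):
        """One adaptive update whose per-parameter step size is clipped
        into dynamic bounds that tighten around final_lr as t grows."""
        ema_sq = 0.999 * ema_sq + 0.001 * grads ** 2        # running 2nd moment
        step = base_lr / (np.sqrt(ema_sq) + eps)            # Adam-like raw step
        lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))  # rises toward final_lr
        upper = final_lr * (1.0 + 1.0 / (gamma * t))        # falls toward final_lr
        step = np.clip(step, lower, upper)                  # dynamic bounds
        return params - step * grads, ema_sq

    # Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
    w, v = np.ones(3), np.zeros(3)
    for t in range(1, 501):
        w, v = dynamic_bound_step(w, 2.0 * w, v, t)
    print(w)  # close to the origin

    Because the bounds never reach zero, the effective learning rate cannot vanish mid-training, which is the failure mode the abstract says MAML-DBL is designed to avoid.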

    Twin Delayed Deep Deterministic Policy Gradient-Based Target Tracking for Unmanned Aerial Vehicle with Achievement Rewarding and Multistage Training

    Get PDF
    Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem that requires handling a high level of nonlinearity and dynamics. Model-free control effectively handles the uncertain nature of the problem, and reinforcement learning (RL)-based approaches are good candidates for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3), a recent composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements over the original TD3 were also made. First, a proportional-differential controller was used to boost the exploration of TD3 during training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward function. This was accomplished by incorporating two exponential functions to limit the effect of velocity and acceleration and prevent deformation in the policy function approximation. In addition, the concept of multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition to enable more stable learning behaviour of the policy and to move from a linear reward to an achievement formula. Training was conducted on fixed target tracking followed by moving target tracking. Flight testing was conducted on three types of target trajectories: fixed, square, and blinking. Multistage training achieved the best performance, with both exponential and achievement rewarding for the fixed-trained agent in the fixed and square moving-target cases, and for the combined agent with both exponential and achievement rewarding for a fixed-trained agent in the blinking-target case. Relative to the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and multistage training open the door to various applications of RL in target tracking.
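
    As a rough sketch of the reward ideas described above, the following Python snippet combines a distance term, two bounded exponential penalties on velocity and acceleration, and a piecewise achievement bonus. The function name, coefficients, and achievement radius are illustrative assumptions, not the authors' values.

    # Illustrative sketch of the article's reward ideas: exponentials bound the
    # influence of velocity and acceleration, and a piecewise "achievement"
    # bonus replaces a purely linear distance reward. All constants assumed.
    import math

    def tracking_reward(dist_err, vel, acc,
                        achieve_radius=0.5, k_v=0.1, k_a=0.05):
        """Reward for UAV target tracking: distance term, two bounded
        exponential penalties, and a piecewise achievement bonus."""
        r = -dist_err                       # closer to the target is better
        r += math.exp(-k_v * abs(vel)) - 1  # in (-1, 0]: bounded velocity penalty
        r += math.exp(-k_a * abs(acc)) - 1  # in (-1, 0]: bounded accel penalty
        if dist_err < achieve_radius:       # piecewise achievement bonus
            r += 1.0
        return r

    print(tracking_reward(dist_err=0.3, vel=2.0, acc=0.5))

    Because each exponential term lies in (-1, 0], no single dynamic variable can dominate the reward, which matches the stated goal of preventing deformation of the policy function approximation.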

    Multi-Task Meta Learning: learn how to adapt to unseen tasks

    Full text link
    This work proposes Multi-Task Meta Learning (MTML), integrating two learning paradigms, Multi-Task Learning (MTL) and meta learning, to bring together the best of both worlds. In particular, it combines simultaneous learning of multiple tasks, an element of MTL, with prompt adaptation to new tasks, a quality of meta learning. It is important to highlight the focus on heterogeneous tasks, i.e., tasks of distinct kinds, in contrast to the typically considered homogeneous tasks (e.g., all tasks being classification, or all being regression). The fundamental idea is to train a multi-task model such that, when an unseen task is introduced, it can learn in fewer steps while offering performance at least as good as conventional single-task learning on the new task, or as its inclusion within the MTL model. Various experiments demonstrate this paradigm on two datasets and four tasks: NYU-v2 and the taskonomy dataset, for which semantic segmentation, depth estimation, surface normal estimation, and edge detection are performed. MTML achieves state-of-the-art results on three out of four tasks for the NYU-v2 dataset and two out of four for the taskonomy dataset. In the taskonomy dataset, many pseudo-labeled segmentation masks were found to lack classes expected to be present in the ground truth; the MTML approach proved effective at detecting these missing classes, delivering good qualitative results, although its quantitative performance was affected by the incorrect ground-truth labels. The source code for reproducibility can be found at https://github.com/ricupa/MTML-learn-how-to-adapt-to-unseen-tasks
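
    A minimal PyTorch sketch of the MTML idea, assuming a shared trunk with task-specific heads: after multi-task training, an unseen task gets a fresh head that adapts in a few gradient steps on top of the multi-task trunk. The layer sizes, head names, and adaptation loop are illustrative assumptions; the paper's actual architecture is in the linked repository.

    # Illustrative sketch only: a shared trunk serves heterogeneous task heads;
    # an unseen task attaches a new head and adapts quickly ("fewer steps").
    import torch
    import torch.nn as nn

    trunk = nn.Sequential(nn.Linear(16, 64), nn.ReLU())  # shared representation
    heads = nn.ModuleDict({
        "segmentation": nn.Linear(64, 10),  # classification-style head
        "depth":        nn.Linear(64, 1),   # regression-style head
    })
    # ... multi-task training of trunk and heads omitted ...

    # Unseen task: a fresh head adapts in a few steps, keeping the
    # multi-task trunk as the learned initialization.
    new_head = nn.Linear(64, 1)
    opt = torch.optim.SGD(new_head.parameters(), lr=1e-2)
    x, y = torch.randn(32, 16), torch.randn(32, 1)       # toy support set
    for _ in range(5):                                   # "fewer steps"
        loss = nn.functional.mse_loss(new_head(trunk(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    print(loss.item())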

    Adaptive reinforcement learning with active state-specific exploration for engagement maximization during simulated child-robot interaction

    Get PDF
    Using assistive robots for educational applications requires robots to adapt their behavior specifically to each child with whom they interact. Among relevant signals, non-verbal cues such as the child's gaze can provide the robot with important information about the child's current engagement in the task, and about whether the robot should continue its current behavior or not. Here we propose a reinforcement learning algorithm extended with active state-specific exploration and show its applicability to child engagement maximization as well as to more classical tasks such as maze navigation. We first demonstrate its adaptive nature on a continuous maze problem, an enhancement of the classic grid world. There, parameterized actions enable the agent to learn single moves up to the end of a corridor, similarly to "options" but without explicit hierarchical representations. We then apply the algorithm to a series of simulated scenarios, such as an extended Tower of Hanoi, where the robot should find the appropriate speed of movement for the interacting child, and a pointing task, where the robot should find the child-specific appropriate level of expressivity of action. We show that the algorithm copes with both global and local non-stationarities in the state space while preserving stable behavior in the stationary portions of the state space. Altogether, these results suggest a promising way to enable robot learning based on non-verbal cues despite the high degree of non-stationarity that can occur during interaction with children.
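
    A minimal tabular sketch of state-specific exploration, assuming a per-state exploration rate that rises where recent TD errors are large (a local non-stationarity signal) and decays where learning is stable. The update rule and constants are assumptions for illustration, not the paper's algorithm.

    # Illustrative sketch: each state keeps its own exploration rate, so the
    # agent re-explores only where the environment appears non-stationary
    # while staying greedy in well-learned, stationary regions.
    import numpy as np

    n_states, n_actions = 25, 4
    Q = np.zeros((n_states, n_actions))
    eps = np.full(n_states, 0.5)            # per-state exploration rate

    def act(s, rng):
        if rng.random() < eps[s]:           # explore with state-specific prob.
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    def learn(s, a, r, s_next, alpha=0.1, gamma=0.95, beta=0.05):
        td = r + gamma * Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * td
        # surprise raises exploration locally; stability decays it
        eps[s] = np.clip(eps[s] + beta * (abs(td) - 0.1), 0.01, 1.0)

    # Toy usage for one transition.
    rng = np.random.default_rng(0)
    a = act(0, rng)
    learn(0, a, r=1.0, s_next=1)
    print(eps[0])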

    Learning to Identify Critical States for Reinforcement Learning from Videos

    Full text link
    Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data that lack explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first learn by itself to identify and recognize relevant states, actions, and rewards. Without relying on ground-truth annotations, our new method, called Deep State Identifier, learns to predict returns from episodes encoded as videos. It then uses a form of mask-based sensitivity analysis to extract and identify important critical states. Extensive experiments showcase our method's potential for understanding and improving agent behavior. The source code and the generated datasets are available at https://github.com/AI-Initiative-KAUST/VideoRLCS.
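
    A simplified sketch of the mask-based sensitivity idea: given any learned return predictor over an episode of frames, score each frame by how much zero-masking it changes the predicted return. The real Deep State Identifier trains the predictor and a mask jointly; this one-frame-at-a-time probe is an illustrative assumption.

    # Illustrative sketch: frames whose masking most perturbs the predicted
    # return are flagged as critical states. The real method learns the
    # return predictor and the mask jointly; here they are given/probed.
    import numpy as np

    def critical_state_scores(frames, predict_return):
        """frames: (T, D) array; predict_return: callable on a (T, D) array."""
        base = predict_return(frames)
        scores = np.empty(len(frames))
        for t in range(len(frames)):
            masked = frames.copy()
            masked[t] = 0.0                     # mask out frame t
            scores[t] = abs(base - predict_return(masked))
        return scores                           # high score = critical state

    # Toy predictor: the return depends only on the 3rd frame.
    toy = lambda f: f[2].sum()
    ep = np.random.rand(5, 8)
    print(critical_state_scores(ep, toy))       # frame index 2 dominates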