Learning Parameterized Skills
One of the defining characteristics of human intelligence is the ability to acquire and refine skills. Skills are behaviors for solving problems that an agent encounters often—sometimes in different contexts and situations—throughout its lifetime. Identifying important problems that recur and retaining their solutions as skills allows agents to more rapidly solve novel problems by adjusting and combining their existing skills.
In this thesis we introduce a general framework for learning reusable parameterized skills. Reusable skills are parameterized procedures that—given a description of a problem to be solved—produce appropriate behaviors or policies. They can be sequentially and hierarchically combined with other skills to produce progressively more abstract and temporally extended behaviors.
We identify three major challenges involved in the construction of such skills. First, an agent should be capable of solving a small number of problems and generalizing these experiences to construct a single reusable skill. The skill should be capable of producing appropriate behaviors even when applied to yet unseen variations of a problem. We introduce a method for estimating properties of the lower-dimensional manifold on which problem solutions lie. This allows for the construction of unified models for predicting policies from task parameters.
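The idea of a single model that predicts policies from task parameters can be illustrated with a deliberately simple stand-in. The sketch below fits one least-squares line per policy parameter from a few solved tasks and then queries it at an unseen task. It assumes a scalar task parameter `tau` and a linear relation between `tau` and the policy parameters. The thesis instead estimates properties of the policy manifold, so `fit_skill` and `predict_policy` are hypothetical names and not the author's method.

```python
from typing import List, Sequence, Tuple


def fit_skill(taus: Sequence[float],
              thetas: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Fit theta_k ~= a_k * tau + b_k by least squares, one line per policy parameter k.

    Hypothetical sketch: assumes a scalar task parameter and a linear skill.
    """
    n = len(taus)
    mean_tau = sum(taus) / n
    var_tau = sum((t - mean_tau) ** 2 for t in taus)
    if var_tau == 0:
        raise ValueError("need at least two distinct task parameters")
    model = []
    for k in range(len(thetas[0])):
        ys = [theta[k] for theta in thetas]
        mean_y = sum(ys) / n
        cov = sum((t - mean_tau) * (y - mean_y) for t, y in zip(taus, ys))
        slope = cov / var_tau
        model.append((slope, mean_y - slope * mean_tau))
    return model


def predict_policy(model: List[Tuple[float, float]], tau: float) -> List[float]:
    """Predict policy parameters for a (possibly unseen) task parameter tau."""
    return [slope * tau + intercept for slope, intercept in model]
```

For example, training on tasks `tau = 0, 1, 2, 4` whose solutions follow `theta = [2*tau + 1, -tau + 3]` and querying `tau = 3` yields approximately `[7.0, 0.0]`. A real skill would need a nonlinear regressor, because policy manifolds are rarely linear.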
Second, the agent should be able to identify when a skill can be hierarchically decomposed into specialized sub-skills. We observe that the policy manifold may be composed of disjoint, piecewise-smooth charts, each one encoding solutions for a subclass of problems. Identifying and modeling sub-skills allows for the aggregation of related behaviors into single, more abstract skills.
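The notion of disjoint charts can be illustrated with a crude heuristic. The sketch sorts solved tasks by a scalar task parameter and starts a new chart wherever the policy parameters of consecutive tasks jump by more than a threshold. The function name, the scalar task parameter, and the `jump_threshold` rule are all hypothetical; the abstract does not say how the thesis identifies charts.

```python
import math
from typing import List, Sequence


def split_into_charts(taus: Sequence[float],
                      thetas: Sequence[Sequence[float]],
                      jump_threshold: float) -> List[List[float]]:
    """Group solved tasks into contiguous charts along a scalar task parameter.

    Tasks are sorted by tau; a new chart starts whenever the policy parameters of
    two consecutive tasks are farther apart (Euclidean distance) than jump_threshold.
    """
    if not taus:
        return []
    order = sorted(range(len(taus)), key=lambda i: taus[i])
    charts: List[List[int]] = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if math.dist(thetas[prev], thetas[cur]) > jump_threshold:
            charts.append([cur])  # discontinuity: start a new sub-skill
        else:
            charts[-1].append(cur)
    return [[taus[i] for i in chart] for chart in charts]
```

A threshold also splits regions that are smooth but steep, so this only illustrates the idea of separating sub-skills. Each resulting chart could then get its own model of policy parameters.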
Finally, the agent should be able to actively select which problems to practice in order to become competent in a skill more rapidly. Thoughtful and deliberate practice is one of the defining characteristics of human expert performance. By carefully choosing which problems to practice, the agent might more rapidly construct a skill that performs well over a wide range of problems.
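A minimal sketch of active task selection uses a simple space-filling heuristic. The next practice task is the candidate farthest from its nearest already-practiced task, which targets the region of task space the skill has seen least. This heuristic and the function name are assumptions for illustration. The abstract does not describe the thesis's actual selection criterion.

```python
from typing import Sequence


def next_task_to_practice(candidates: Sequence[float], practiced: Sequence[float]) -> float:
    """Pick the candidate task farthest from its nearest already-practiced task.

    Hypothetical stand-in for an active practice criterion; ties go to the first
    candidate in the list.
    """
    if not candidates:
        raise ValueError("no candidate tasks")
    if not practiced:
        return candidates[0]
    return max(candidates, key=lambda c: min(abs(c - p) for p in practiced))
```

With candidates `[0.0, 0.5, 1.0, 1.5, 2.0]` and tasks `0.0` and `2.0` already practiced, the function picks `1.0`. A performance-aware criterion would also weigh how poorly the current skill does on each candidate, not just how far the candidate is from practiced tasks.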
We address these challenges via a general framework for skill acquisition. We evaluate it on simulated decision problems and on a physical humanoid robot, and demonstrate that it allows for the efficient and active construction of reusable skills.
Control system for a UAV
Undergraduate final project, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2016. This work presents the development of a control system for an unmanned aerial vehicle (UAV) that runs on an Android smartphone. The goal is to enable control from mobile devices, letting the user choose to fly the vehicle either with the touchscreen or with the device's gyroscope. The user interface is an Android application developed by the authors in Android Studio. An Arduino Uno board handles all system control: it generates four PWM signals that, after passing through a low-pass filter, replace the joystick potentiometers of the original controller. The application communicates with the board over Bluetooth, which allows the user to pilot the drone. More functions can be added to the application so that the UAV can be used in future projects.
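The PWM-plus-low-pass-filter stage that stands in for each joystick potentiometer can be checked numerically. This sketch makes several assumptions: a 5 V Arduino Uno output, the default of about 490 Hz that `analogWrite()` uses on most Uno PWM pins, and a hypothetical first-order RC filter with R = 10 kΩ and C = 10 µF. The abstract gives neither the actual component values nor the voltage range of the original joysticks. The code computes the mean voltage for an `analogWrite()`-style duty value (0 to 255) and the filter's cutoff frequency. It then simulates the filtered output to show that the output settles near that mean with small ripple.

```python
import math

VCC = 5.0          # assumed Arduino Uno logic-level supply, in volts
PWM_FREQ_HZ = 490  # assumed default analogWrite() frequency on most Uno PWM pins


def pwm_average_voltage(duty_byte: int, vcc: float = VCC) -> float:
    """Mean voltage of a PWM wave whose duty cycle is an analogWrite()-style value (0..255)."""
    if not 0 <= duty_byte <= 255:
        raise ValueError("duty_byte must be in 0..255")
    return vcc * duty_byte / 255


def rc_cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2 * math.pi * r_ohms * c_farads)


def simulate_rc_filter(duty_byte: int, r_ohms: float, c_farads: float,
                       periods: int = 500, steps_per_period: int = 200,
                       vcc: float = VCC) -> tuple:
    """Forward-Euler simulation of the RC filter driven by the PWM wave.

    Returns (mean, ripple) of the output voltage over the final PWM period.
    """
    dt = 1.0 / (PWM_FREQ_HZ * steps_per_period)
    tau = r_ohms * c_farads
    high_steps = round(steps_per_period * duty_byte / 255)
    v_out = 0.0
    last = []
    for p in range(periods):
        for k in range(steps_per_period):
            v_in = vcc if k < high_steps else 0.0
            v_out += (v_in - v_out) * dt / tau  # dv/dt = (v_in - v_out) / (RC)
            if p == periods - 1:
                last.append(v_out)
    return sum(last) / len(last), max(last) - min(last)
```

With these values the cutoff is about 1.6 Hz, far below the PWM frequency, so the filtered output is a nearly constant voltage proportional to the duty value. A larger RC product lowers the ripple but slows the response to stick movements. That trade-off matters for piloting.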
Behavior Alignment via Reward Function Optimization
Designing reward functions that efficiently guide reinforcement learning (RL) agents toward specific behaviors is a complex task. It is challenging because it requires identifying reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is frequently suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn behavior alignment reward functions. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards. Our approach automatically determines the most effective way to blend these types of feedback, thereby enhancing robustness against heuristic reward misspecification. Remarkably, it can also adapt an agent's policy optimization process to mitigate suboptimalities resulting from limitations and biases inherent in the underlying RL algorithms. We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. We investigate heuristic auxiliary rewards of varying quality, some beneficial and others detrimental to the learning process. Our results show that our framework offers a robust and principled way to integrate designer-specified heuristics. It not only addresses key shortcomings of existing approaches but also consistently leads to high-performing solutions, even when given misaligned or poorly specified auxiliary reward functions.
Comment: (Spotlight) Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
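The bi-level idea of blending a primary and an auxiliary reward can be illustrated with a toy example. This is not the authors' algorithm. At the inner level, a softmax policy over three actions is trained by exact policy gradient on the blended reward `r + w * h`. At the outer level, the blending weight `w` is tuned by gradient ascent on the primary return of the trained policy. The reward tables, the single scalar weight, and the central finite-difference outer gradient are assumptions for illustration. The abstract does not specify how the method differentiates through the inner optimization.

```python
import math

# Hypothetical 3-action, one-step task.
PRIMARY = [1.0, 0.0, 0.2]    # environment's primary reward for each action
HEURISTIC = [0.0, 1.0, 0.5]  # designer's auxiliary reward, deliberately misaligned


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inner_train(w, steps=200, lr=0.5):
    """Inner level: exact softmax policy-gradient ascent on the blended reward r + w * h."""
    blended = [r + w * h for r, h in zip(PRIMARY, HEURISTIC)]
    theta = [0.0] * len(blended)
    for _ in range(steps):
        pi = softmax(theta)
        expected = sum(p * b for p, b in zip(pi, blended))
        # dE[R]/dtheta_a = pi_a * (R_a - E[R]) for a softmax policy
        theta = [t + lr * p * (b - expected) for t, p, b in zip(theta, pi, blended)]
    return softmax(theta)


def primary_return(pi):
    """Outer objective: expected primary reward of the trained policy."""
    return sum(p * r for p, r in zip(pi, PRIMARY))


def outer_optimize(w=1.0, outer_steps=30, outer_lr=0.05, eps=1e-3):
    """Outer level: gradient ascent on w, differentiating through training by finite differences."""
    for _ in range(outer_steps):
        grad = (primary_return(inner_train(w + eps))
                - primary_return(inner_train(w - eps))) / (2 * eps)
        w += outer_lr * grad
    return w
```

At `w = 1.0`, the misaligned heuristic makes the useless action 1 look as good as action 0, so the trained policy splits between them. The outer loop lowers `w`, and the retrained policy then concentrates on action 0 and recovers most of the primary reward.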
Governance of the clothing local productive arrangement of Muriaé-MG based on a two-dimensional model of analysis
The aim of this study was to understand governance in the clothing LPA (local productive arrangement) of Muriaé-MG. A descriptive qualitative study based on the case-study method was performed. Additionally, an analytical model for analyzing governance in LPAs was proposed, based on the following categories: representation, cooperation, and coordination. The primary qualitative data were analyzed with the support of the NVIVO® software, using the content analysis technique. Governance in the clothing LPA of Muriaé-MG is characterized by strong representation of the actors, incipient cooperation and coordination among the companies, and significant activity by most of the entities committed to the development of the LPA. This study broadens the understanding of governance in LPAs by detailing criteria for analyzing it, including the procedures for conducting such an investigation, which may be adopted in future research.
A task-and-technique centered survey on visual analytics for deep learning model engineering
Although deep neural networks have achieved state-of-the-art performance in several artificial intelligence applications in the past decade, they are still hard to understand. In particular, the features learned by deep networks when determining whether a given input belongs to a specific class are described only implicitly, through a considerable number of internal model parameters. This makes it harder to construct interpretable hypotheses about what the network is learning and how it is learning, both of which are essential when designing and improving a deep model to tackle a particular learning task. This challenge can be addressed by visualization tools that allow machine learning experts to explore which components of a network are learning useful features for a pattern recognition task, and to identify characteristics of the network that can be changed to improve its performance. We present a review of modern approaches that use visual analytics and information visualization techniques to understand, interpret, and fine-tune deep learning models. To this end, we propose a taxonomy of such approaches based on whether they provide tools for visualizing a network's architecture, for interpreting and analyzing the training process, or for understanding features. Next, we detail how these approaches tackle the tasks above for three common deep architectures: deep feedforward networks, convolutional neural networks, and recurrent neural networks. Additionally, we discuss the challenges faced by each network architecture and outline promising topics for future research in visualization techniques for deep learning models. (C) 2018 Elsevier Ltd. All rights reserved.
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes due to external factors (passive non-stationarity), changes induced by interactions with the system itself (active non-stationarity), or both (hybrid non-stationarity). In this work, we take the first steps towards the fundamental challenge of on-policy and off-policy evaluation amidst structured changes due to active, passive, or hybrid non-stationarity. Towards this goal, we make a higher-order stationarity assumption: non-stationarity results in changes over time, but the way those changes happen is fixed. We propose OPEN, an algorithm that uses a double application of counterfactual reasoning and a novel importance-weighted instrumental-variable regression to obtain both a lower-bias and a lower-variance estimate of the structure in the changes of a policy's past performance. Finally, we show promising results on how OPEN can be used to predict future performance for several domains inspired by real-world applications that exhibit non-stationarity.
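Two ingredients named in the abstract can be illustrated in much-simplified form. This sketch is not OPEN itself. First, importance sampling (one form of counterfactual reasoning) turns each logged episode into an estimate of the evaluation policy's performance at that point in time. Second, a regression on the sequence of those estimates, whose form stays fixed over time in the spirit of the higher-order stationarity assumption, extrapolates the next value. Here the regression is a plain least-squares AR(1) fit. Least squares on noisy regressors is generally biased, which is one standard reason to prefer instrumental-variable methods; OPEN uses an importance-weighted instrumental-variable regression and applies counterfactual reasoning twice, which this sketch does not reproduce. The function names and the state-independent action probabilities are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def is_return(actions: Sequence[str], rewards: Sequence[float],
              pi_e: Dict[str, float], pi_b: Dict[str, float]) -> float:
    """Ordinary importance-sampling estimate of one episode's return under pi_e.

    Assumes state-independent (hypothetical) action probabilities for brevity.
    """
    weight = 1.0
    for a in actions:
        weight *= pi_e[a] / pi_b[a]  # likelihood ratio of each logged action
    return weight * sum(rewards)


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of J[t+1] = a * J[t] + b."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def forecast_next(series: Sequence[float]) -> float:
    """Predict the next episode's performance from the fitted recurrence."""
    a, b = fit_ar1(series)
    return a * series[-1] + b
```

In use, `is_return` would be applied to each past episode in order, and `forecast_next` to the resulting list of estimates.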