2,795 research outputs found
Towards safe reinforcement-learning in industrial grid-warehousing
Representation Learning in Deep RL via Discrete Information Bottleneck
Several self-supervised representation learning methods have been proposed
for reinforcement learning (RL) with rich observations. For real-world
applications of RL, recovering underlying latent states is crucial,
particularly when sensory inputs contain irrelevant and exogenous information.
In this work, we study how information bottlenecks can be used to construct
latent states efficiently in the presence of task-irrelevant information. We
propose architectures that utilize variational and discrete information
bottlenecks, coined as RepDIB, to learn structured factorized representations.
Exploiting the expressiveness brought by factorized representations, we
introduce a simple, yet effective, bottleneck that can be integrated with any
existing self-supervised objective for RL. We demonstrate this across several
online and offline RL benchmarks, along with a real robot arm task, where we
find that compressed representations with RepDIB can lead to strong performance
improvements, as the learned bottlenecks help predict only the relevant state
while ignoring irrelevant information.
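The discrete bottleneck described above can be sketched minimally as a vector-quantization step: each continuous latent is snapped to its nearest codebook entry, discarding task-irrelevant variation. The codebook size and latent dimension below are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

def discrete_bottleneck(z, codebook):
    """Quantize each row of z to its nearest codebook vector."""
    # squared distances between every latent and every code
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)             # discrete code assignment per latent
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))     # 8 discrete codes, 4-dim latents
z = rng.normal(size=(16, 4))           # a batch of continuous latents
zq, idx = discrete_bottleneck(z, codebook)
assert zq.shape == z.shape and idx.max() < 8
```

In a trained system the codebook itself would be learned jointly with the encoder; here it is fixed only to show the quantization mechanics.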
Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration
Intrinsically motivated goal exploration algorithms enable machines to
discover repertoires of policies that produce a diversity of effects in complex
environments. These exploration algorithms have been shown to allow real world
robots to acquire skills such as tool use in high-dimensional continuous state
and action spaces. However, they have so far assumed that self-generated goals
are sampled in a specifically engineered feature space, limiting their
autonomy. In this work, we propose to use deep representation learning
algorithms to learn an adequate goal space. This is a developmental 2-stage
approach: first, in a perceptual learning stage, deep learning algorithms use
passive raw sensor observations of world changes to learn a corresponding
latent space; then goal exploration happens in a second stage by sampling goals
in this latent space. We present experiments where a simulated robot arm
interacts with an object, and we show that exploration algorithms using such
learned representations can match the performance obtained using engineered
representations.
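The two-stage loop described above can be sketched with stand-ins: stage 1 fits a linear latent space to passive observations (a placeholder for the deep representation learner), and stage 2 samples a goal in that latent space and selects the rollout whose latent outcome lands closest. All dimensions and the linear encoder are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1: perceptual learning from passive raw observations
obs = rng.normal(size=(200, 10))
obs_centered = obs - obs.mean(axis=0)
_, _, vt = np.linalg.svd(obs_centered, full_matrices=False)
encode = lambda x: x @ vt[:3].T            # project to a 3-d latent space

# Stage 2: goal exploration by sampling goals in the learned latent space
goal = rng.normal(size=3)                  # self-generated latent goal
outcomes = encode(rng.normal(size=(50, 10)))   # latent effects of 50 rollouts
best = np.linalg.norm(outcomes - goal, axis=1).argmin()
assert 0 <= best < 50
```

The key point the sketch preserves is the separation: goals are never defined in the engineered observation space, only in the latent space produced by stage 1.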
Talk Commonsense to Me! Enriching Language Models with Commonsense Knowledge
Human cognition is fascinating: it is a mesh of several neural phenomena that drive
our ability to constantly reason and infer about the surrounding world. In cognitive
computer science, Commonsense Reasoning is the term given to our ability to
infer uncertain events and reason about cognitive knowledge. The introduction of Commonsense
into intelligent systems has long been desired, but the mechanism for this
introduction remains a scientific jigsaw. Some implicitly believe that language understanding
alone is enough to achieve some level of Commonsense [90]. Others
think that enriching language with Knowledge Graphs might be enough for
human-like reasoning [63], while still others believe human-like reasoning can
only be truly captured with symbolic rules and logical deduction powered by Knowledge
Bases, such as taxonomies and ontologies [50]. We focus on integrating Commonsense Knowledge
into Language Models, because we believe this integration is a step towards
a beneficial embedding of Commonsense Reasoning into interactive Intelligent Systems,
such as conversational assistants.
Conversational assistants, such as Amazon's Alexa, are user-driven systems. Thus,
a more human-like interaction is strongly desired to truly capture the
user's attention and empathy. We believe that such humanistic characteristics can be
achieved through the introduction of stronger Commonsense Knowledge and Reasoning,
allowing these systems to engage fruitfully with users.
To this end, we introduce a new family of models, Relation-Aware
BART (RA-BART), which combines the language generation abilities of BART [51] with explicit
Commonsense Knowledge extracted from Commonsense Knowledge Graphs to further
extend the capabilities of these models.
We evaluate our model on three different tasks: Abstractive Question Answering, Text
Generation conditioned on certain concepts, and a Multi-Choice Question Answering task.
We find that, on generation tasks, RA-BART outperforms non-knowledge-enriched
models; however, it underperforms on the multi-choice question answering task.
Our project can be consulted in our open-source, public GitHub repository (Explicit
Commonsense).
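The knowledge-enrichment idea described above can be sketched hypothetically: commonsense triples retrieved from a knowledge graph are serialized and prepended to the input text before generation. The triple format and the `<kg>` separator tokens below are illustrative assumptions, not RA-BART's actual input scheme.

```python
def enrich_with_triples(text, triples):
    """Serialize (head, relation, tail) triples and prepend them to the input."""
    facts = " ".join(f"<kg> {h} {r} {t} </kg>" for h, r, t in triples)
    return f"{facts} {text}" if triples else text

out = enrich_with_triples(
    "Why do people carry umbrellas?",
    [("umbrella", "UsedFor", "staying dry"), ("rain", "Causes", "getting wet")],
)
assert out.startswith("<kg> umbrella UsedFor staying dry </kg>")
```

In a real system the serialized facts would be tokenized together with the question and fed to the encoder, so the decoder can condition its generation on the retrieved knowledge.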
Neural combinatorial optimization as an enabler technology to design real-time virtual network function placement decision systems
The Fifth Generation of the mobile network (5G) represents a breakthrough technology for the telecommunications industry. 5G provides a unified infrastructure capable of integrating, over the same physical network, heterogeneous services with different requirements. This is achieved thanks to the recent advances in network virtualization, specifically in Network Function Virtualization (NFV) and Software Defined Networking (SDN) technologies. This cloud-based architecture not only brings new possibilities to vertical sectors but also entails new challenges that have to be solved accordingly. In this sense, it makes it possible to automate operations within the infrastructure, allowing network optimization to be performed at operational time (e.g., spectrum optimization, service optimization, traffic optimization). Nevertheless, designing optimization algorithms for this purpose entails some difficulties. Solving the underlying Combinatorial Optimization (CO) problems that arise is usually intractable due to their NP-Hard nature. In addition, solutions to these problems are required in close to real time due to the tight time requirements of this dynamic environment. For this reason, handwritten heuristic algorithms have been widely used in the literature to achieve fast approximate solutions in this context. However, tailoring heuristics to address CO problems can be a daunting task that requires expertise. The ability to automate this resolution process would be of utmost importance for achieving intelligent network orchestration. In this sense, Artificial Intelligence (AI) is envisioned as the key technology for autonomously inferring intelligent solutions to these problems. Combining AI with network virtualization can truly transform this industry. In particular, this Thesis aims at using Neural Combinatorial Optimization (NCO) to infer end solutions to CO problems.
NCO has proven able to learn near-optimal solutions to classical combinatorial problems (e.g., the Traveling Salesman Problem (TSP), the Bin Packing Problem (BPP), and the Vehicle Routing Problem (VRP)). Specifically, NCO relies on Reinforcement Learning (RL) to estimate a Neural Network (NN) model that describes the relation between the space of instances of the problem and the solutions for each of them. In other words, for a new instance this model is able to infer a solution by generalizing from the problem space on which it has been trained. To this end, during the learning process the model takes instances from the learning space and uses the reward obtained from evaluating the solution to improve its accuracy. The work presented here contributes to NCO theory in two main directions. First, this work argues that the performance obtained by the sequence-to-sequence models used for NCO in the literature is improved by presenting combinatorial problems as Constrained Markov Decision Processes (CMDPs). This property can be exploited to build a Markovian model that constructs solutions incrementally based on interactions with the problem. Second, this formulation makes it possible to address general constrained combinatorial problems under this framework. In this context, the model, in addition to the reward signal, relies on penalty signals generated from constraint dissatisfaction that direct it toward a competitive policy even in highly constrained environments. This strategy extends the range of problems that can be addressed using this technology. The presented approach is validated in the scope of intelligent network management, specifically on the Virtual Network Function (VNF) placement problem. This problem consists of efficiently mapping a set of network service requests onto the physical network infrastructure.
In particular, we seek to obtain the optimal placement for a network service chain considering the state of the virtual environment, so that a specific resource objective is accomplished, in this case the minimization of the overall power consumption. The experiments conducted prove the capability of the proposal to learn competitive solutions when compared to classical heuristic, metaheuristic, and Constraint Programming (CP) solvers.
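The penalty-based learning signal described above can be sketched as reward shaping: the objective (power consumption) is combined with a penalty for each violated placement constraint, so the policy is steered toward feasible solutions. The capacity model and penalty weight are illustrative assumptions, not the thesis's actual formulation.

```python
def placement_signal(power, cpu_used, cpu_cap, penalty_weight=10.0):
    """Negated cost used as an RL reward for one VNF placement decision."""
    violation = max(0.0, cpu_used - cpu_cap)   # constraint dissatisfaction
    return -(power + penalty_weight * violation)

feasible = placement_signal(power=5.0, cpu_used=8.0, cpu_cap=10.0)
infeasible = placement_signal(power=4.0, cpu_used=12.0, cpu_cap=10.0)
assert feasible > infeasible   # violations are penalized despite lower power
```

The point the sketch makes is that the agent never needs an explicit feasibility oracle at decision time: the penalty term alone pushes it away from constraint-violating placements during training.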
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments,
discover possible interactions and build repertoires of skills is a general
objective of artificial intelligence. Developmental approaches argue that this
can only be achieved by autotelic agents: intrinsically motivated learning
agents that can learn to represent, generate, select and solve their own
problems. In recent years, the convergence of developmental approaches with
deep reinforcement learning (RL) methods has been leading to the emergence of a
new field: developmental RL. Developmental RL is
concerned with the use of deep RL algorithms to tackle a developmental problem
-- the intrinsically motivated skills acquisition problem.
The self-generation of goals requires the learning
of compact goal encodings as well as their associated goal-achievement
functions. This raises new challenges compared to standard RL algorithms
originally designed to tackle pre-defined sets of goals using external reward
signals. The present paper introduces developmental RL and proposes a
computational framework based on goal-conditioned RL to tackle the
intrinsically motivated skills acquisition problem. It proceeds to present a
typology of the various goal representations used in the literature, before
reviewing existing methods to learn to represent and prioritize goals in
autonomous systems. We finally close the paper by discussing some open
challenges in the quest for intrinsically motivated skills acquisition.
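The goal-conditioned ingredients the survey describes can be sketched minimally: a compact goal encoding plus a learned goal-achievement function that replaces external reward signals. The distance metric and tolerance below are assumptions for illustration only.

```python
import numpy as np

def goal_achievement(state, goal, tol=0.1):
    """Internal reward: 1.0 if the state achieves the goal, else 0.0."""
    dist = np.linalg.norm(np.asarray(state) - np.asarray(goal))
    return float(dist < tol)

# the agent rewards itself for reaching its own self-generated goal
assert goal_achievement([0.0, 0.0], [0.05, 0.0]) == 1.0
assert goal_achievement([1.0, 0.0], [0.0, 0.0]) == 0.0
```

In an autotelic agent both the goal space and this achievement function are learned rather than hand-specified, which is exactly what distinguishes the setting from standard goal-conditioned RL with predefined goals.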
Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data
The growing interest in language-conditioned robot manipulation aims to
develop robots capable of understanding and executing complex tasks, with the
objective of enabling robots to interpret language commands and manipulate
objects accordingly. While language-conditioned approaches demonstrate
impressive capabilities for addressing tasks in familiar environments, they
encounter limitations in adapting to unfamiliar environment settings. In this
study, we propose a general-purpose, language-conditioned approach that
combines base skill priors and imitation learning under unstructured data to
enhance the algorithm's generalization in adapting to unfamiliar environments.
We assess our model's performance in both simulated and real-world environments
using a zero-shot setting. In the simulated environment, the proposed approach
surpasses previously reported scores for the CALVIN benchmark, especially in the
challenging Zero-Shot Multi-Environment setting. The average completed task
length, indicating the average number of tasks the agent can continuously
complete, improves by more than 2.5 times compared to the state-of-the-art method
HULC. In addition, we conduct a zero-shot evaluation of our policy in a
real-world setting, following training exclusively in simulated environments
without additional specific adaptations. In this evaluation, we set up ten
tasks and achieved an average 30% improvement with our approach compared to the
current state-of-the-art approach, demonstrating a high generalization
capability in both simulated environments and the real world. For further
details, including access to our code and videos, please refer to our
supplementary materials.