
273 research outputs found

Hierarchical Linearly-Solvable Markov Decision Problems

Authors: Gómez Vicenç; Jonsson Anders
Publication date: 10/03/2016

We present a hierarchical reinforcement learning framework that formulates each task in the hierarchy as a special type of Markov decision process for which the Bellman equation is linear and has an analytical solution. Problems of this type, called linearly-solvable MDPs (LMDPs), have interesting properties that can be exploited in a hierarchical setting, such as efficient learning of the optimal value function or task compositionality. The proposed hierarchical approach can also be seen as a novel alternative to solving LMDPs with large state spaces. We derive a hierarchical version of the so-called Z-learning algorithm that learns different tasks simultaneously and show empirically that it significantly outperforms the state-of-the-art learning methods in two classical hierarchical reinforcement learning domains: the taxi domain and an autonomous guided vehicle task.

Comment: 11 pages, 6 figures, 26th International Conference on Automated Planning and Scheduling

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Deep Policies for Width-Based Planning in Pixel Domains

Authors: Gómez Vicenç; Jonsson Anders; Junyent Miquel
Publication date: 05/07/2019

Width-based planning has demonstrated great success in recent years due to its ability to scale independently of the size of the state space. For example, Bandres et al. (2018) introduced a rollout version of the Iterated Width algorithm whose performance compares well with humans and learning methods in the pixel setting of the Atari games suite. In this setting, planning is done on-line using the "screen" states and selecting actions by looking ahead into the future. However, this algorithm is purely exploratory and does not leverage past reward information. Furthermore, it requires the state to be factored into features that need to be pre-defined for the particular task, e.g., the B-PROST pixel features. In this work, we extend width-based planning by incorporating an explicit policy in the action selection mechanism. Our method, called π-IW, interleaves width-based planning and policy learning using the state-actions visited by the planner. The policy estimate takes the form of a neural network and is in turn used to guide the planning step, thus reinforcing promising paths. Surprisingly, we observe that the representation learned by the neural network can be used as a feature space for the width-based planner without degrading its performance, thus removing the requirement of pre-defined features for the planner. We compare π-IW with previous width-based methods and with AlphaZero, a method that also interleaves planning and learning, in simple environments, and show that π-IW has superior performance. We also show that the π-IW algorithm outperforms previous width-based methods in the pixel setting of the Atari games suite.

Comment: In Proceedings of the 29th International Conference on Automated Planning and Scheduling (ICAPS 2019). arXiv admin note: text overlap with arXiv:1806.0589

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Modeling the structure and evolution of discussion cascades

Authors: Gómez Vicenç; Kaltenbrunner Andreas; Kappen Hilbert J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011

We analyze the structure and evolution of discussion cascades in four popular websites: Slashdot, Barrapunto, Meneame and Wikipedia. Despite the large heterogeneities between these sites, a preferential attachment (PA) model with bias to the root can capture the temporal evolution of the observed trees and many of their statistical properties, namely, probability distributions of the branching factors (degrees), subtree sizes and certain correlations. The parameters of the model are learned efficiently using a novel maximum likelihood estimation scheme for PA and provide a figurative interpretation of the communication habits and the resulting discussion cascades on the four different websites.

Comment: 10 pages, 11 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Radboud Repository

La cúpula del poder institucional laic al "castrum Terracie" (segles X-XI)

Author: Ruiz Gómez Vicenç
Publication venue: Institut Ramon Muntaner
Publication date: 01/01/2000

Revistes Catalanes amb Accés Obert

El fons documental de l'Ajuntament de Sant Pere de Terrassa (1800-1904)

Author: Ruiz Gómez Vicenç
Publication venue: Institut Ramon Muntaner
Publication date: 01/01/2004

Revistes Catalanes amb Accés Obert

Breu relació de la historiografia medieval sobre Terrassa

Author: Ruiz i Gómez Vicenç
Publication venue: Institut Ramon Muntaner
Publication date: 01/01/2011

Revistes Catalanes amb Accés Obert

Quina relació hi ha entre els queviures, la fotografia i l'àlgebra de matrius?

Author: Gómez Urgellés Joan Vicenç
Publication venue: 'Edicions de la Universitat de Barcelona'
Publication date: 01/01/2018

Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Revistes Catalanes amb Accés Obert

Hemeroteca Cientifica Catalana

Modelling in science education and learning

Author: Gómez Urgellés Joan Vicenç
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 01/01/2018

This article proposes an approach to teaching and learning some elements of matrix calculus through mathematical modelling. Some everyday situations can be described with matrices and their operations as the mathematical model; in particular, we show how to build models that illustrate the concept of a matrix and that introduce the basic operations of matrix difference and matrix product. First, a matrix is presented as a mathematical model of an image, and we then discuss how the difference of matrices becomes a model for comparing images. This task requires software such as Octave (or similar), which makes it possible to obtain a numerical model of a black-and-white image represented by a matrix. We also show how the matrix product is a model that can be deduced naturally from the routine of grocery shopping. The main idea is to highlight the epistemology of matrix calculus in order to reinforce the cognitive side of the student, while providing a contextual view of everyday real life, enriching heuristics, and making visible the connection between mathematical symbolism (introduced in the model) and real situations.

Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Directory of Open Access Journals

RiuNet