2,496 research outputs found

    Max-Plus Matching Pursuit for Deterministic Markov Decision Processes

    Get PDF
    We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation over dictionaries of value functions. The setup naturally leads to novel theoretical results which are simply formulated due to the max-plus algebra structure. For example, when considering a fixed (non-adaptive) finite basis, the computational complexity of approximating the optimal value function is not directly related to the number of states, but to notions of covering numbers of the state space. In order to break the curse of dimensionality in factored state spaces, we consider adaptive bases that can adapt to particular problems, leading to an algorithm similar to matching pursuit from signal processing. These adaptive bases currently come with no theoretical guarantees but work well empirically on simple deterministic MDPs derived from low-dimensional continuous control problems. We focus primarily on deterministic MDPs but note that the framework can be applied to all MDPs by considering measure-based formulations.
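    A minimal sketch, in Python, of the kind of reduced iteration described above, assuming a small tabular deterministic MDP with a fixed dictionary; the names (next_state, reward, W, gamma) and the random problem data are illustrative, not the paper's notation or experiments. The coefficients are updated with the standard max-plus projection lam_i = min_s (V(s) - w_i(s)), which gives the tightest max-plus combination of the dictionary lying below V.

        import numpy as np

        # Sketch: max-plus approximate value iteration on a random deterministic MDP.
        # The coefficient vector lam has dimension n_basis, not n_states.
        rng = np.random.default_rng(0)
        n_states, n_actions, n_basis, gamma = 20, 4, 5, 0.9

        next_state = rng.integers(n_states, size=(n_states, n_actions))  # deterministic transitions
        reward = rng.random((n_states, n_actions))                       # r(s, a)
        W = rng.random((n_basis, n_states))                              # dictionary functions w_i(s)

        def bellman(V):
            # (TV)(s) = max_a [ r(s, a) + gamma * V(next(s, a)) ]
            return np.max(reward + gamma * V[next_state], axis=1)

        def maxplus_project(V):
            # lam_i = min_s ( V(s) - w_i(s) ): largest max-plus combination below V
            return np.min(V[None, :] - W, axis=1)

        lam = maxplus_project(np.zeros(n_states))
        for _ in range(200):
            V_approx = np.max(lam[:, None] + W, axis=0)   # reconstruct V from coefficients
            lam = maxplus_project(bellman(V_approx))      # reduced-dimensional update

        print("approximate value at state 0:", np.max(lam + W[:, 0]))

    A matching-pursuit-style variant would grow the dictionary W greedily rather than fixing it in advance; the sketch above only covers the fixed-dictionary case.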

    Max-Plus Linear Approximations for Deterministic Continuous-State Markov Decision Processes

    Get PDF
    We consider deterministic continuous-state Markov decision processes (MDPs). We apply a max-plus linear method to approximate the value function with a specific dictionary of functions that leads to an adequate state discretization of the MDP. This is more efficient than a direct discretization of the state space, which is typically intractable in high dimension. We propose a simple strategy to adapt the discretization to a problem instance, thus mitigating the curse of dimensionality. We provide numerical examples showing that the method works well on simple MDPs.
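    A minimal sketch of a max-plus linear approximation on a continuous (here one-dimensional) state space, assuming a hand-picked dictionary of concave cone functions b_i(x) = -L|x - x_i|; the centers, the constant L, and the target function are illustrative choices rather than the dictionary proposed in the paper. The centers x_i play the role of the induced discretization.

        import numpy as np

        # Sketch: project a function onto the max-plus span of cone basis functions.
        L = 2.0
        centers = np.linspace(0.0, 1.0, 6)        # x_i: the induced discretization points
        samples = np.linspace(0.0, 1.0, 200)      # states used to fit the coefficients
        target = np.sin(3.0 * samples)            # stand-in for a value function V(x)

        basis = -L * np.abs(samples[None, :] - centers[:, None])   # b_i(x) on the samples

        # Max-plus projection: lam_i = min_x ( V(x) - b_i(x) ), giving the tightest
        # approximation max_i [ lam_i + b_i(x) ] from below.
        lam = np.min(target[None, :] - basis, axis=1)
        approx = np.max(lam[:, None] + basis, axis=0)

        print("max approximation error:", np.max(np.abs(target - approx)))

    One way to adapt such a discretization to a problem instance would be to place new centers where the current approximation error is largest rather than on a uniform grid; the paper's specific adaptation strategy may differ.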

    Regularity Properties for Sparse Regression

    Full text link
    Statistical and machine learning theory has developed several conditions ensuring that popular estimators such as the Lasso or the Dantzig selector perform well in high-dimensional sparse regression, including the restricted eigenvalue, compatibility, and $\ell_q$ sensitivity properties. However, some of the central aspects of these conditions are not well understood. For instance, it is unknown whether these conditions can be checked efficiently on any given data set. This is problematic, because they are at the core of the theory of sparse regression. Here we provide a rigorous proof that these conditions are NP-hard to check. This shows that the conditions are computationally infeasible to verify, and raises some questions about their practical applications. However, by taking an average-case perspective instead of the worst-case view of NP-hardness, we show that a particular condition, $\ell_q$ sensitivity, has certain desirable properties. This condition is weaker and more general than the others. We show that it holds with high probability in models where the parent population is well behaved, and that it is robust to certain data processing steps. These results are desirable, as they provide guidance about when the condition, and more generally the theory of sparse regression, may be relevant in the analysis of high-dimensional correlated observational data. Comment: Manuscript shortened and more motivation added. To appear in Communications in Mathematics and Statistics.
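    A minimal sketch, in Python, of why worst-case verification of such conditions is combinatorial. It brute-forces a simplified restricted-eigenvalue-style quantity (the smallest eigenvalue of X_S^T X_S / n over all supports S of size s), which is only a proxy for the conditions studied in the paper; the point is that the search ranges over all (p choose s) supports.

        import numpy as np
        from itertools import combinations

        rng = np.random.default_rng(0)
        n, p, s = 50, 12, 3                    # keep p small: the loop is exponential in s
        X = rng.standard_normal((n, p))

        def restricted_min_eig(X, s):
            # Smallest eigenvalue of the Gram matrix restricted to any s columns.
            worst = np.inf
            for S in combinations(range(X.shape[1]), s):
                cols = list(S)
                G = X[:, cols].T @ X[:, cols] / X.shape[0]
                worst = min(worst, np.linalg.eigvalsh(G)[0])
            return worst

        print("smallest s-sparse eigenvalue:", restricted_min_eig(X, s))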

    On inverse reinforcement learning and dynamic discrete choice for predicting path choices

    Full text link
    Route choice modeling is a well-studied topic of research with implications, for example, for city planning and traffic equilibrium flow analysis. Due to the scale of effects these problems can have on communities, it is no surprise that diverse fields have attempted solutions to the same problem. The challenges, however, come with the size of the networks themselves, as large cities may have tens of thousands of road segments connected by tens of thousands of intersections. Thus, the approaches discussed in this thesis focus on a performance comparison between models from two different fields, econometrics and inverse reinforcement learning (IRL). First, we provide background on the topic so that researchers from one field can become acquainted with the other. Second, we describe the algorithms used with a common notation to facilitate this understanding between the fields. Lastly, we compare the performance of the models on real-world data, namely a dataset of bike route choices collected in a network of 42,000 links. We report our results for the two econometric models that we discuss, but were unable to generate the same results for the two IRL models. This was primarily due to numerical instabilities we encountered with the code we had modified to work with our data. We provide a discussion of these difficulties alongside the reporting of our results.
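    As an illustration of the dynamic discrete choice side of this comparison, the sketch below computes a recursive-logit-style value function and link choice probabilities on a toy network; the graph, the link utilities, and the choice of the recursive logit model itself are assumptions made here for illustration, not the models or datasets evaluated in the thesis.

        import numpy as np

        # Toy directed network: links (from, to) with deterministic link utilities
        # (negative costs); node 3 is the destination.
        links = {(0, 1): -1.0, (0, 2): -1.5, (1, 2): -0.5, (1, 3): -2.0, (2, 3): -1.0}
        nodes = [0, 1, 2, 3]
        dest = 3

        V = {k: 0.0 for k in nodes}          # expected downstream utility, V[dest] = 0
        for _ in range(100):                 # fixed-point iteration of the logsum recursion
            V_new = {dest: 0.0}
            for k in nodes:
                if k == dest:
                    continue
                vals = [v + V[j] for (i, j), v in links.items() if i == k]
                V_new[k] = np.log(np.sum(np.exp(vals)))   # logsum over outgoing links
            V = V_new

        # Logit link choice probabilities at node 0 over v(a) + V(next(a)).
        outgoing = {(i, j): v for (i, j), v in links.items() if i == 0}
        scores = np.array([v + V[j] for (i, j), v in outgoing.items()])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        print(dict(zip(outgoing.keys(), np.round(probs, 3))))

    An IRL approach would instead estimate the link utilities from observed paths; the sketch only covers the forward prediction step given fixed utilities.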