    On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

    Improved state space models, such as Recurrent State Space Models (RSSMs), are a key factor behind recent advances in model-based reinforcement learning (RL). Yet, despite their empirical success, many of the underlying design choices are not well understood. We show that RSSMs use a suboptimal inference scheme and that models trained using this inference overestimate the aleatoric uncertainty of the ground truth system. We find that this overestimation implicitly regularizes RSSMs and allows them to succeed in model-based RL. We postulate that this implicit regularization fulfills the same functionality as explicitly modeling epistemic uncertainty, which is crucial for many other model-based RL approaches. Yet, overestimating aleatoric uncertainty can also impair performance in cases where accurately estimating it matters, e.g., when we have to deal with occlusions, missing observations, or fusing sensor modalities at different frequencies. Moreover, the implicit regularization is a side effect of the inference scheme and not the result of a rigorous, principled formulation, which renders analyzing or improving RSSMs difficult. Thus, we propose an alternative approach built on well-understood components for modeling aleatoric and epistemic uncertainty, dubbed the Variational Recurrent Kalman Network (VRKN). This approach uses Kalman updates for exact smoothing inference in a latent space and Monte Carlo Dropout to model epistemic uncertainty. Due to the Kalman updates, the VRKN can naturally handle missing observations or sensor fusion problems with varying numbers of observations per time step. Our experiments show that using the VRKN instead of the RSSM improves performance in tasks where appropriately capturing aleatoric uncertainty is crucial, while matching it on the standard deterministic benchmarks.
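
    A minimal sketch of the two ingredients named in the abstract, a Gaussian (Kalman) belief update and Monte Carlo Dropout for epistemic uncertainty, is given below; the function names, the diagonal-covariance assumption, and the toy dropout forward pass are illustrative choices, not the authors' implementation.

        import numpy as np

        def kalman_update(prior_mean, prior_var, obs, obs_var):
            """Gaussian (Kalman) update of a diagonal latent belief with a noisy
            observation; a missing observation simply leaves the belief unchanged."""
            if obs is None:
                return prior_mean, prior_var
            gain = prior_var / (prior_var + obs_var)          # Kalman gain
            post_mean = prior_mean + gain * (obs - prior_mean)
            post_var = (1.0 - gain) * prior_var
            return post_mean, post_var

        def mc_dropout_predict(forward, x, n_samples=20, seed=0):
            """Epistemic uncertainty via Monte Carlo Dropout: run the stochastic
            forward pass several times and report the mean and variance of the outputs."""
            rng = np.random.default_rng(seed)
            samples = np.stack([forward(x, rng) for _ in range(n_samples)])
            return samples.mean(axis=0), samples.var(axis=0)

        # toy usage: dropout with keep probability 0.5 on a fixed linear map, rescaled by 1/0.5
        w = np.array([0.5, -1.0, 2.0])
        forward = lambda x, rng: x @ (w * rng.binomial(1, 0.5, size=w.shape) / 0.5)
        mean, var = mc_dropout_predict(forward, np.array([1.0, 2.0, 3.0]))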

    Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization

    Stochastic gradient-based optimization is crucial for optimizing neural networks. While popular approaches heuristically adapt the step size and direction by rescaling gradients, a more principled approach to improving optimizers requires second-order information. Such methods precondition the gradient using the objective's Hessian. Yet, computing the Hessian is usually expensive, and effectively using second-order information in the stochastic gradient setting is non-trivial. We propose using Information-Theoretic Trust Region Optimization (arTuRO) for improved updates with uncertain second-order information. By modeling the network parameters as a Gaussian distribution and using a Kullback-Leibler divergence-based trust region, our approach takes bounded steps that account for the objective's curvature and the uncertainty in the parameters. Before each update, it solves the trust region problem for an optimal step size, resulting in a more stable and faster optimization process. We approximate the diagonal elements of the Hessian from stochastic gradients using a simple recursive least squares approach, constructing a model of the expected Hessian over time using only first-order information. We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of SGD.
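
    The recursive least squares estimate of the Hessian diagonal mentioned in the abstract can be sketched as a coordinate-wise fit of h_i in dg_i ≈ h_i · dθ_i from successive parameter and gradient differences; the class name, forgetting factor, and initialization below are assumptions for illustration, not the published update rule.

        import numpy as np

        class DiagonalHessianRLS:
            """Coordinate-wise recursive least squares fit of h in dg ≈ h * dtheta,
            i.e. a running estimate of the Hessian diagonal from gradient differences."""

            def __init__(self, dim, forgetting=0.99, init_var=1.0):
                self.h = np.zeros(dim)            # current diagonal estimate
                self.p = np.full(dim, init_var)   # per-coordinate RLS "covariance"
                self.lam = forgetting             # forgetting factor in (0, 1]

            def update(self, d_theta, d_grad):
                # standard scalar RLS update, applied independently per coordinate
                gain = self.p * d_theta / (self.lam + d_theta * self.p * d_theta)
                self.h += gain * (d_grad - d_theta * self.h)
                self.p = (self.p - gain * d_theta * self.p) / self.lam
                return self.h

        # toy check: for a quadratic with Hessian diagonal [1, 4], dg = [1, 4] * dtheta exactly
        est = DiagonalHessianRLS(dim=2)
        rng = np.random.default_rng(0)
        for _ in range(200):
            d_theta = 0.1 * rng.normal(size=2)
            est.update(d_theta, np.array([1.0, 4.0]) * d_theta)
        print(est.h)   # converges towards [1, 4]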

    Hydrogen refueling station networks for heavy-duty vehicles in future power systems

    A potential solution to reduce greenhouse gas (GHG) emissions in the transport sector is to use alternatively fueled vehicles (AFV). Heavy-duty vehicles (HDV) emit a large share of GHG emissions in the transport sector and are therefore the subject of growing attention from global regulators. Fuel cell and green hydrogen technologies are a promising option to decarbonize HDVs, as their fast refueling and long vehicle ranges are consistent with current logistic operational requirements. Moreover, the application of green hydrogen in transport could enable more effective integration of renewable energies (RE) across different energy sectors. This paper explores the interplay between HDV Hydrogen Refueling Stations (HRS) that produce hydrogen locally and the power system by combining an infrastructure location planning model with an electricity system optimization model that takes grid expansion options into account. Two scenarios – one sizing refueling stations to support the power system and one sizing them independently of it – are assessed regarding their impacts on the total annual electricity system costs, regional RE integration and the levelized cost of hydrogen (LCOH). The impacts are calculated based on locational marginal pricing for 2050. Depending on the integration scenario, we find an average LCOH of between 4.83 euro/kg and 5.36 euro/kg, for which nodal electricity prices are the main determining factor, as well as a pronounced difference in LCOH between northern and southern Germany. Adding HDV-HRS necessitates power transmission expansion as well as higher power supply costs, as the total power demand increases. From a system perspective, investing in HDV-HRS in symbiosis with the power system rather than independently promises cost savings of around seven billion euros per annum. We therefore conclude that the co-optimization of multiple energy sectors is important for investment planning and has the potential to exploit synergies.
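
    For reference, the levelized cost of hydrogen (LCOH) is conventionally defined as discounted lifetime costs divided by discounted hydrogen output; the abstract does not list the study's exact cost components, so the symbols below (capital and operating expenditure, electricity price c_t, electricity demand E_t, hydrogen output M_t, discount rate r over lifetime T) give the generic form rather than the paper's formula.

        \mathrm{LCOH} \;=\; \frac{\displaystyle\sum_{t=0}^{T} \frac{\mathrm{CAPEX}_t + \mathrm{OPEX}_t + c_t\,E_t}{(1+r)^t}}{\displaystyle\sum_{t=0}^{T} \frac{M_t}{(1+r)^t}}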

    A Pay-as-Bid Mechanism for Pricing Utility Computing

    Facing the increasing demand for high-performance computational resources in academic as well as commercial organisations, utility computing offers a solution by providing users with on-demand availability of requested computing services. Approaches to the fundamental issue of resource allocation include the use of technical scheduling mechanisms as well as introducing economic ideas into the allocation schemes. Technical scheduling mechanisms are often very simple (such as first-in-first-out) but fail to adequately prioritize jobs in times when demand exceeds supply. As empirical studies show, Grids (such as PlanetLab) are frequently characterized by huge excess demand for resources. This is where economic models such as markets come into play. Hitherto, market mechanisms have been either too simple or too complex for use in Grids. The contribution of this paper is threefold. Firstly, a mechanism is proposed that is still simple yet geared towards use in Grids. Secondly, the mechanism is embedded in the state-of-the-art Grid middleware Sun N1 Grid Engine 6. Thirdly, a numerical case study shows that this mechanism is superior to other commonly used mechanisms.
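
    The abstract does not spell the mechanism out, but a generic pay-as-bid allocation, which the title refers to, fits in a few lines of Python: bids are sorted by price, capacity is allocated greedily, and each accepted bidder pays exactly their own bid rather than a uniform clearing price. The data layout and the example bids are assumptions for illustration.

        def pay_as_bid(bids, capacity):
            """Greedy pay-as-bid allocation.

            bids: list of (user, quantity, price_per_unit) tuples.
            Returns (user, allocated_quantity, payment) triples; every winner
            pays their own bid price instead of a uniform market-clearing price."""
            allocation, remaining = [], capacity
            for user, qty, price in sorted(bids, key=lambda b: b[2], reverse=True):
                if remaining <= 0:
                    break
                granted = min(qty, remaining)
                allocation.append((user, granted, granted * price))
                remaining -= granted
            return allocation

        # example: 12 units of demand compete for 10 units of supply, so the lowest bid is rationed
        print(pay_as_bid([("a", 4, 2.0), ("b", 5, 3.5), ("c", 3, 1.0)], capacity=10))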

    Versatile Inverse Reinforcement Learning via Cumulative Rewards

    Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting, where there are various solutions to a problem and the experts show versatile behavior, this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
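
    One plausible reading of "a sum of iteratively trained discriminators", sketched below with made-up interfaces, is that each training iteration adds a discriminator's log-ratio score to the running reward; this is an illustration of the stated idea under that assumption, not the authors' algorithm.

        import numpy as np

        def cumulative_reward(discriminators, features):
            """Reward expressed as a sum over iteratively trained discriminators.
            Each discriminator maps state-action features to the probability of being
            'expert' data; its logit (log density ratio) is one additive reward term."""
            eps = 1e-8
            total = 0.0
            for d in discriminators:
                p = np.clip(d(features), eps, 1.0 - eps)
                total += np.log(p) - np.log(1.0 - p)
            return total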

    Expected Information Maximization: Using the I-Projection for Mixture Density Estimation

    Modelling highly multi-modal data is a challenging problem in machine learning. Most algorithms are based on maximizing the likelihood, which corresponds to the M(oment)-projection of the data distribution onto the model distribution. The M-projection forces the model to average over modes it cannot represent. In contrast, the I(nformation)-projection ignores such modes in the data and concentrates on the modes the model can represent. Such behavior is appealing whenever we deal with highly multi-modal data where modelling single modes correctly is more important than covering all the modes. Despite this advantage, the I-projection is rarely used in practice due to the lack of algorithms that can efficiently optimize it based on data. In this work, we present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection solely based on samples for general latent variable models, where we focus on Gaussian mixture models and Gaussian mixtures of experts. Our approach applies a variational upper bound to the I-projection objective which decomposes the original objective into single objectives for each mixture component as well as for the coefficients, allowing an efficient optimization. Similar to GANs, our approach employs discriminators but uses a more stable optimization procedure based on a tight upper bound. We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches, and we illustrate the effectiveness of our approach for modelling multi-modal behavior on two pedestrian and traffic prediction datasets.
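
    For readers unfamiliar with the terminology, the two projections differ only in the argument order of the Kullback-Leibler divergence; with data distribution p and model q_θ, maximum likelihood corresponds to the first, mode-averaging objective, while EIM optimizes the second, mode-seeking one:

        \text{M-projection:}\quad \arg\min_{\theta}\, \mathrm{KL}\big(p \,\|\, q_{\theta}\big) \;=\; \arg\min_{\theta}\, \mathbb{E}_{x \sim p}\!\left[\log \tfrac{p(x)}{q_{\theta}(x)}\right]

        \text{I-projection:}\quad \arg\min_{\theta}\, \mathrm{KL}\big(q_{\theta} \,\|\, p\big) \;=\; \arg\min_{\theta}\, \mathbb{E}_{x \sim q_{\theta}}\!\left[\log \tfrac{q_{\theta}(x)}{p(x)}\right]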