11 research outputs found

    Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

    Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we only provide a single sentence text prompt describing the desired task with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve performance by providing a second "baseline" prompt and projecting out parts of the CLIP embedding space irrelevant to distinguishing between goal and baseline. Further, we find a strong scaling effect for VLM-RMs: larger VLMs trained with more compute and data are better reward models. The failure modes of VLM-RMs we encountered are all related to known capability limitations of current VLMs, such as limited spatial reasoning ability or visually unrealistic environments that are far off-distribution for the VLM. We find that VLM-RMs are remarkably robust as long as the VLM is large enough. This suggests that future VLMs will become more and more useful reward models for a wide range of RL applications.
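
The goal/baseline idea in the abstract above can be sketched in a few lines. This is only one plausible reading, not the authors' implementation: the embeddings are plain Python lists standing in for CLIP outputs, and the function name `vlm_reward` and the mixing parameter `alpha` are assumptions introduced for illustration. The state embedding is partially projected onto the line through the baseline and goal embeddings, so that directions irrelevant to telling the two prompts apart contribute less to the reward.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def vlm_reward(state, goal, baseline, alpha):
    """Reward for a state embedding given goal and baseline prompt embeddings.

    All inputs are hypothetical stand-ins for CLIP embeddings.
    alpha = 0 gives plain cosine similarity between state and goal;
    alpha = 1 ignores everything orthogonal to the baseline-to-goal line.
    Assumes goal and baseline differ (otherwise the direction is zero).
    """
    s, g, b = normalize(state), normalize(goal), normalize(baseline)
    d = [gi - bi for gi, bi in zip(g, b)]            # baseline -> goal direction
    t = dot([si - bi for si, bi in zip(s, b)], d) / dot(d, d)
    proj = [bi + t * di for bi, di in zip(b, d)]     # projection of s onto the line
    mixed = [alpha * p + (1 - alpha) * si for p, si in zip(proj, s)]
    diff = [m - gi for m, gi in zip(mixed, g)]
    return 1.0 - 0.5 * dot(diff, diff)
```

With `alpha = 0` and unit vectors, `1 - 0.5 * ||s - g||^2` equals the cosine similarity of state and goal, so the sketch reduces to an unregularized similarity reward in that case.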

    Meta-Learning via Classifier(-free) Guidance

    State-of-the-art meta-learning techniques do not optimize for zero-shot adaptation to unseen tasks, a setting in which humans excel. On the contrary, meta-learning algorithms learn hyperparameters and weight initializations that explicitly optimize for few-shot learning performance. In this work, we take inspiration from recent advances in generative modeling and language-conditioned image synthesis to propose meta-learning techniques that use natural language guidance to achieve higher zero-shot performance compared to the state-of-the-art. We do so by recasting the meta-learning problem as a multi-modal generative modeling problem: given a task, we consider its adapted neural network weights and its natural language description as equivalent multi-modal task representations. We first train an unconditional generative hypernetwork model to produce neural network weights; then we train a second "guidance" model that, given a natural language task description, traverses the hypernetwork latent space to find high-performance task-adapted weights in a zero-shot manner. We explore two alternative approaches for latent space guidance: "HyperCLIP"-based classifier guidance and a conditional Hypernetwork Latent Diffusion Model ("HyperLDM"), which we show to benefit from the classifier-free guidance technique common in image generation. Finally, we demonstrate that our approaches outperform existing meta-learning methods with zero-shot learning experiments on our Meta-VQA dataset, which we specifically constructed to reflect the multi-modal meta-learning setting.

    Authenticated encryption of pmu data

    This paper presents the implementation of an encryption board that provides confidentiality, authenticity and integrity for data collected at any point in a power grid, as a potential solution to Smart Grid cyber security issues. The board consists of a Freescale microcontroller that connects a PMU (Phasor Measurement Unit) to a ZigBee transmitter. Encryption is done using the SHA256, HMAC-SHA256, KDF-SHA256 and AES256-CBC algorithms. The architecture reads voltage and current phasors, energy consumption, frequency, power, power factor and power outage measurements and transmits this information in real time to a data concentrator, where display and subsequent storage are possible.
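
The authentication half of such a scheme can be illustrated with Python's standard library. This is a hedged sketch, not the board's firmware: the abstract does not say how the algorithms are composed, so the encrypt-then-MAC frame layout, the function names (`derive_key`, `seal`, `open_frame`) and the label-based key derivation are all assumptions. AES256-CBC is not part of the standard library, so the ciphertext is treated here as opaque bytes produced elsewhere.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes

def derive_key(master_key: bytes, label: bytes) -> bytes:
    """Toy key derivation (an assumption, not the paper's KDF):
    one HMAC-SHA256 call keyed with the master key over a purpose label."""
    return hmac.new(master_key, label, hashlib.sha256).digest()

def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over IV || ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag

def open_frame(mac_key: bytes, frame: bytes, iv_len: int = 16):
    """Verify the tag, then return (iv, ciphertext); raise if tampered."""
    body, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed")
    return body[:iv_len], body[iv_len:]
```

Deriving separate encryption and MAC keys from one master secret, for example `derive_key(master, b"enc")` and `derive_key(master, b"mac")`, keeps the two roles independent; the receiver checks the tag before attempting any decryption.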

    Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes

    In Bayesian Optimization (BO) we study black-box function optimization with noisy point evaluations and Bayesian priors. Convergence of BO can be greatly sped up by batching, where multiple evaluations of the black-box function are performed in a single round. The main difficulty in this setting is to propose batches of evaluation points that are at the same time diverse and informative. In this work, we introduce DPP-Batch Bayesian Optimization (DPP-BBO), a universal framework for inducing batch diversity in sampling-based BO by leveraging the repulsive properties of Determinantal Point Processes (DPP) to naturally diversify the batch sampling procedure. We illustrate this framework by formulating DPP-Thompson Sampling (DPP-TS) as a variant of the popular Thompson Sampling (TS) algorithm and introducing a Markov Chain Monte Carlo procedure to sample from it. We then prove novel Bayesian simple regret bounds for both classical batched TS as well as our counterpart DPP-TS, with the latter bound being tighter. Our real-world, as well as synthetic, experiments demonstrate improved performance of DPP-BBO over classical batching methods with Gaussian process and Cox process models. ISSN: 2640-349
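
The repulsive property mentioned above comes from the determinant: in an L-ensemble DPP, the unnormalized probability of a set is the determinant of the kernel matrix restricted to that set, which shrinks as points become similar. The sketch below only illustrates that effect and is not the DPP-TS sampler; the RBF kernel, its lengthscale, and the function name `dpp_weight` are assumptions chosen for the example.

```python
import math

def rbf_kernel(x, y, lengthscale=1.0):
    """Squared-exponential similarity between two points (assumed kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * lengthscale ** 2))

def determinant(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-15:
            return 0.0                      # singular matrix
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            det = -det                      # a row swap flips the sign
        det *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return det

def dpp_weight(batch, lengthscale=1.0):
    """Unnormalized DPP probability of a batch: det of its kernel matrix."""
    k = [[rbf_kernel(x, y, lengthscale) for y in batch] for x in batch]
    return determinant(k)
```

For two points the weight is `1 - k(x, y)^2`, so a batch of near-duplicates scores close to zero while a spread-out batch scores close to one, which is the sense in which a DPP favors diverse batches.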

    Meta-Learning via Classifier(-free) Diffusion Guidance

    We introduce meta-learning algorithms that perform zero-shot weight-space adaptation of neural network models to unseen tasks. Our methods repurpose the popular generative image synthesis techniques of natural language guidance and diffusion models to generate neural network weights adapted for tasks. We first train an unconditional generative hypernetwork model to produce neural network weights; then we train a second "guidance" model that, given a natural language task description, traverses the hypernetwork latent space to find high-performance task-adapted weights in a zero-shot manner. We explore two alternative approaches for latent space guidance: "HyperCLIP"-based classifier guidance and a conditional Hypernetwork Latent Diffusion Model ("HyperLDM"), which we show to benefit from the classifier-free guidance technique common in image generation. Finally, we demonstrate that our approaches outperform existing multi-task and meta-learning methods in a series of zero-shot learning experiments on our Meta-VQA dataset. ISSN: 2835-885
