Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Reinforcement learning (RL) requires either manually specifying a reward
function, which is often infeasible, or learning a reward model from a large
amount of human feedback, which is often very expensive. We study a more
sample-efficient alternative: using pretrained vision-language models (VLMs) as
zero-shot reward models (RMs) to specify tasks via natural language. We propose
a natural and general approach to using VLMs as reward models, which we call
VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn
complex tasks without a manually specified reward function, such as kneeling,
doing the splits, and sitting in a lotus position. For each of these tasks, we
only provide a single sentence text prompt describing the desired task with
minimal prompt engineering. We provide videos of the trained agents at:
https://sites.google.com/view/vlm-rm. We can improve performance by providing a
second "baseline" prompt and projecting out parts of the CLIP embedding space
irrelevant to distinguishing between goal and baseline. Further, we find a strong
scaling effect for VLM-RMs: larger VLMs trained with more compute and data are
better reward models. The failure modes of VLM-RMs we encountered are all
related to known capability limitations of current VLMs, such as limited
spatial reasoning ability or visually unrealistic environments that are far
off-distribution for the VLM. We find that VLM-RMs are remarkably robust as
long as the VLM is large enough. This suggests that future VLMs will become
more and more useful reward models for a wide range of RL applications.
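The goal/baseline projection described in this abstract can be illustrated with a small sketch. The exact reward formula is not given in the abstract, so the form below (project the state embedding onto the line through the baseline and goal embeddings, blend with the original via a hypothetical `alpha`, then score by distance to the goal) is an assumption, and the toy vectors stand in for real CLIP embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_line(s, b, g):
    """Project s onto the line through b and g (requires b != g)."""
    d = [gi - bi for gi, bi in zip(g, b)]
    t = dot([si - bi for si, bi in zip(s, b)], d) / dot(d, d)
    return [bi + t * di for bi, di in zip(b, d)]

def vlm_reward(state_emb, goal_emb, baseline_emb, alpha=0.5):
    """Assumed reward: 1 - 0.5 * ||alpha * proj(s) + (1 - alpha) * s - g||^2.

    alpha = 0 reduces to cosine similarity between state and goal;
    alpha = 1 ignores every direction except the baseline-to-goal line.
    """
    s = normalize(state_emb)
    g = normalize(goal_emb)
    b = normalize(baseline_emb)
    p = project_onto_line(s, b, g)
    mixed = [alpha * pi + (1 - alpha) * si for pi, si in zip(p, s)]
    return 1.0 - 0.5 * sum((m - gi) ** 2 for m, gi in zip(mixed, g))

# Toy embeddings: the third axis carries information unrelated to the task.
goal, baseline = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
state = [1.0, 0.0, 1.0]
plain = vlm_reward(state, goal, baseline, alpha=0.0)
projected = vlm_reward(state, goal, baseline, alpha=1.0)
```

In this toy case the projected reward is higher than the plain cosine reward, because the task-irrelevant third axis no longer counts against the state.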
Meta-Learning via Classifier(-free) Guidance
State-of-the-art meta-learning techniques do not optimize for zero-shot adaptation to unseen tasks, a setting in which humans excel. On the contrary, meta-learning algorithms learn hyperparameters and weight initializations that explicitly optimize for few-shot learning performance. In this work, we take inspiration from recent advances in generative modeling and language-conditioned image synthesis to propose meta-learning techniques that use natural language guidance to achieve higher zero-shot performance compared to the state-of-the-art. We do so by recasting the meta-learning problem as a multi-modal generative modeling problem: given a task, we consider its adapted neural network weights and its natural language description as equivalent multi-modal task representations. We first train an unconditional generative hypernetwork model to produce neural network weights; then we train a second "guidance" model that, given a natural language task description, traverses the hypernetwork latent space to find high-performance task-adapted weights in a zero-shot manner. We explore two alternative approaches for latent space guidance: "HyperCLIP"-based classifier guidance and a conditional Hypernetwork Latent Diffusion Model ("HyperLDM"), which we show to benefit from the classifier-free guidance technique common in image generation. Finally, we demonstrate that our approaches outperform existing meta-learning methods with zero-shot learning experiments on our Meta-VQA dataset, which we specifically constructed to reflect the multi-modal meta-learning setting
Authenticated encryption of PMU data
This paper presents the implementation of an encryption board in order to provide confidentiality, authenticity and integrity of data collected at any point in a power grid, as a potential solution to the Smart Grid cyber security issues. This board consists of a Freescale microcontroller which enables the connection between a PMU (Phasor Measurement Unit) and a ZigBee transmitter. Encryption is done using the SHA256, HMAC-SHA256, KDF-SHA256 and AES256-CBC algorithms. This architecture reads and transmits voltage and current phasors, energy consumption, frequency, power, power factor and power outage measurements, and sends this information in real time to a data concentrator where display and subsequent storage are possible.
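As a rough illustration of how the listed primitives can fit together, the sketch below derives separate encryption and MAC keys with an HKDF-style SHA-256 construction and authenticates a frame with HMAC-SHA256. The abstract does not say which KDF or composition the board uses, so the HKDF construction, the encrypt-then-MAC layout, the `pmu-link` label, and all function names are assumptions. The AES256-CBC step is left out because the Python standard library has no AES; the ciphertext is treated as opaque bytes.

```python
import hashlib
import hmac

def hkdf_sha256(ikm, salt, info, length):
    """HKDF-style extract-and-expand using HMAC-SHA256 (assumed KDF)."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def derive_keys(master_secret, salt):
    """Derive a 32-byte AES-256 key and a separate 32-byte MAC key."""
    okm = hkdf_sha256(master_secret, salt, b"pmu-link", 64)
    return okm[:32], okm[32:]

def tag_frame(mac_key, iv, ciphertext):
    """Authenticate the IV and ciphertext together (encrypt-then-MAC)."""
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()

def verify_frame(mac_key, iv, ciphertext, tag):
    """Constant-time tag check; decrypt only if this returns True."""
    return hmac.compare_digest(tag_frame(mac_key, iv, ciphertext), tag)

# Hypothetical values; a real ciphertext would come from AES256-CBC.
enc_key, mac_key = derive_keys(b"shared master secret", b"session salt")
iv = bytes(16)
ciphertext = b"opaque AES-CBC output placeholder"
tag = tag_frame(mac_key, iv, ciphertext)
```

Keeping the encryption key and the MAC key separate means that a weakness in one primitive does not directly expose the other.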
Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes
In Bayesian Optimization (BO) we study black-box function optimization with noisy point evaluations and Bayesian priors. Convergence of BO can be greatly sped up by batching, where multiple evaluations of the black-box function are performed in a single round. The main difficulty in this setting is to propose at the same time diverse and informative batches of evaluation points. In this work, we introduce DPP-Batch Bayesian Optimization (DPP-BBO), a universal framework for inducing batch diversity in sampling-based BO by leveraging the repulsive properties of Determinantal Point Processes (DPP) to naturally diversify the batch sampling procedure. We illustrate this framework by formulating DPP-Thompson Sampling (DPP-TS) as a variant of the popular Thompson Sampling (TS) algorithm and introducing a Markov Chain Monte Carlo procedure to sample from it. We then prove novel Bayesian simple regret bounds for both classical batched TS as well as our counterpart DPP-TS, with the latter bound being tighter. Our real-world, as well as synthetic, experiments demonstrate improved performance of DPP-BBO over classical batching methods with Gaussian process and Cox process models. ISSN:2640-349
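The repulsive property mentioned in this abstract comes from the fact that a DPP weights a set of points by the determinant of their kernel matrix, which shrinks toward zero as points become similar. The sketch below only illustrates that weighting on one-dimensional inputs with an RBF kernel; the kernel choice, the lengthscale, and the function names are assumptions, and it does not implement the DPP-TS sampler or the MCMC procedure from the paper.

```python
import math

def rbf_kernel(x, y, lengthscale=1.0):
    """Similarity between two scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2 * lengthscale ** 2))

def gram(points, lengthscale=1.0):
    """Kernel matrix of a batch of scalar points."""
    return [[rbf_kernel(a, b, lengthscale) for b in points] for a in points]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-15:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            det = -det
        det *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return det

def dpp_weight(batch, lengthscale=1.0):
    """Unnormalized DPP weight of a batch: det of its kernel matrix."""
    return determinant(gram(batch, lengthscale))

clustered = dpp_weight([0.0, 0.1, 0.2])
spread = dpp_weight([0.0, 2.0, 4.0])
```

A batch of nearly identical points gets a weight close to zero, while a well-spread batch gets a weight close to one, which is why sampling in proportion to this weight favors diverse batches.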
Meta-Learning via Classifier(-free) Diffusion Guidance
We introduce meta-learning algorithms that perform zero-shot weight-space adaptation of neural network models to unseen tasks. Our methods repurpose the popular generative image synthesis techniques of natural language guidance and diffusion models to generate neural network weights adapted for tasks. We first train an unconditional generative hypernetwork model to produce neural network weights; then we train a second "guidance" model that, given a natural language task description, traverses the hypernetwork latent space to find high-performance task-adapted weights in a zero-shot manner. We explore two alternative approaches for latent space guidance: "HyperCLIP"-based classifier guidance and a conditional Hypernetwork Latent Diffusion Model ("HyperLDM"), which we show to benefit from the classifier-free guidance technique common in image generation. Finally, we demonstrate that our approaches outperform existing multi-task and meta-learning methods in a series of zero-shot learning experiments on our Meta-VQA dataset. ISSN:2835-885
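Classifier-free guidance, as commonly used in image generation, combines an unconditional and a conditional noise prediction at each denoising step. The sketch below shows one common parameterization of that combination; the function name and the `guidance_scale` parameter are illustrative assumptions, and the abstract does not state which parameterization HyperLDM uses.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions.

    guidance_scale = 0 gives the unconditional prediction,
    1 gives the conditional one, and values above 1 extrapolate
    further in the direction indicated by the condition.
    """
    return [u + guidance_scale * (c - u)
            for u, c in zip(eps_uncond, eps_cond)]

# Toy predictions over a two-dimensional latent.
guided = cfg_combine([0.0, 1.0], [1.0, 1.0], guidance_scale=2.0)
```

Here the first coordinate, where the two predictions disagree, is pushed past the conditional value, while the second coordinate, where they agree, is unchanged.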
Comparison of dust forecast (GEOS-5 and WRF-Chem), satellite observations and ground-based aerosol measurements in the Caribbean region during the 2020 Summer African dust season
North African dust reaches the Caribbean region every summer supplying mineral dust particles which play an important role in the regional weather and public health. During the African dust season of summer 2020 several events, including the "Godzilla" mega dust event, were identified over the Caribbean. Under the framework of the NASA-funded project Caribbean Air-quality Alert and Management Assistance System-Public Health (CALIMA-PH), we compare results of the dust forecast models with the ground-based and satellite observations for events that happened in parallel with large convective systems over the region during June-July 2020. The models used are the global dust forecast model Goddard Earth Observing System-5 (GEOS-5) and the regional dust forecast model Weather Research and Forecasting model coupled with Chemistry (WRF-Chem). Satellite observations are from the Visible Infrared Imaging Radiometer Suite (VIIRS), the Moderate Resolution Imaging Spectroradiometer (MODIS), and the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO). Ground-based observations (e.g., aerosol optical depth (AOD), depolarization ratio, particulate matter, scattering Angstrom exponent (SAE), dust surface concentration, height of dust layer) were performed at seven different locations (Cayenne, Martinique, Guadeloupe--French Territories, Barbados, Puerto Rico, Merida--Mexico and Miami--USA) over the Caribbean to provide a better understanding of African dust dispersal patterns over the region with a unique "Lagrangian" measurement, including the Godzilla mega dust event and tropical storms developed in the area. Results show that the dust forecast models were not always in agreement with the observations, and this was particularly the case during the presence of tropical storms like Cristobal and Gonzalo.
We will show the differences between the forecasts provided by both models and the result of another run after feeding the models available aerosol data such as AOD.
"Godzilla" African dust event of June 2020; impacts of air quality in the Greater Caribbean Basin, the Gulf of Mexico and the United States
On June 19, 2020, the Caribbean region started to feel the effects of an historic African (Saharan) dust plume that has been called "Godzilla" due to its large geographic extent and record amount of dust. This plume, with an area close to the size of the continental USA (8,080,464 km²), blanketed areas in the greater Caribbean Basin, the Gulf of Mexico and the southern United States. The occurrence and progression of this "Godzilla" event was predicted by several dust forecast models, among them, the global Goddard Earth Observing System-5 (GEOS-5) and the regional dust forecast model Weather Research and Forecasting model coupled with Chemistry (WRF-Chem). According to data from the NASA Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP Lidar), the dust plume extended from the Earth's surface up to about 5 km altitude. As part of the NASA-funded summer 2020 intensive field phase of the Caribbean Air-quality Alert and Management Assistance System-Public Health (CALIMA-PH) project, eight ground-based stations in the Greater Caribbean Basin (French Guiana, Trinidad and Tobago, Martinique, Guadeloupe, Puerto Rico, Merida-Mexico and Miami-USA) collected surface aerosol data (e.g., PM10 and PM2.5 mass concentrations, light scattering and absorption coefficients, visibility, dust concentrations) and column aerosol data (i.e., aerosol optical depth--AOD) during the event. Using these data, together with satellite observations from the Moderate Resolution Imaging Spectroradiometer (MODIS), the Visible Infrared Imaging Radiometer Suite (VIIRS), and CALIOP, we describe the movement of the dust plume through the region and assess its impact. The event reduced visibility in the atmosphere's boundary layer to less than 3 miles in some locations, showed record values for the aerosol optical properties, and exhibited exceedances in both the US EPA air quality standard and the World Health Organization (WHO) air quality guidelines.
For several days, the locations impacted by the "Godzilla" dust plume were exposed to air quality conditions ranging from "Unhealthy for sensitive groups" to "Hazardous", in cases reaching PM10 values ca. 500 µg/m³.