13 research outputs found
Multispectral Contrastive Learning with Viewmaker Networks
Contrastive learning methods have been applied to a range of domains and
modalities by training models to identify similar "views" of data points.
However, specialized scientific modalities pose a challenge for this paradigm,
as identifying good views for each scientific instrument is complex and
time-intensive. In this paper, we focus on applying contrastive learning
approaches to a variety of remote sensing datasets. We show that Viewmaker
networks, a recently proposed method for generating views, are promising for
producing views in this setting without requiring extensive domain knowledge
and trial and error. We apply Viewmaker to four multispectral imaging problems,
each with a different format, finding that Viewmaker can outperform cropping-
and reflection-based methods for contrastive learning in every case when
evaluated on downstream classification tasks. This provides additional evidence
that domain-agnostic methods can empower contrastive learning to scale to
real-world scientific domains. Open source code can be found at
https://github.com/jbayrooti/divmaker.
Comment: Appearing in CVPR-PBVS 202
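For readers unfamiliar with the underlying objective, view-based contrastive methods of this family optimize an InfoNCE-style loss over paired views. The sketch below is a generic version of that objective, not the paper's Viewmaker implementation (which additionally learns the view-generating network):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss between two batches of views.

    z1, z2: (N, D) embeddings of two views of the same N examples.
    Positive pairs are (z1[i], z2[i]); all other pairs act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # Cross-entropy with the diagonal (matched views) as the correct class
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

The loss is small when matched views embed close together and mismatched pairs are pushed apart, which is the property a view-generating method must preserve.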
Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies
When we transfer a pretrained language model to a new language, there are
many axes of variation that change at once. To disentangle the impact of
different factors like syntactic similarity and vocabulary similarity, we
propose a set of controlled transfer studies: we systematically transform the
language of the GLUE benchmark, altering one axis of crosslingual variation at
a time, and then measure the resulting drops in a pretrained model's downstream
performance. We find that models can largely recover from syntactic-style
shifts, but cannot recover from vocabulary misalignment and embedding matrix
re-initialization, even with continued pretraining on 15 million tokens.
Moreover, good-quality
tokenizers in the transfer language do not make vocabulary alignment easier.
Our experiments provide insights into the factors of cross-lingual transfer
that researchers should most focus on when designing language transfer
scenarios.
Comment: EMNLP 202
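One of the controlled transformations described above, vocabulary misalignment, can be illustrated with a toy remapping (our sketch, not the authors' code): pass every token id through a fixed random permutation, which preserves syntactic structure while destroying the alignment between the old and new vocabularies.

```python
import random

def permute_vocabulary(token_ids, vocab_size, seed=0):
    """Toy controlled transformation in the spirit of the paper's studies:
    remap each token id through a fixed random permutation of the vocabulary,
    simulating vocabulary misalignment while keeping syntax intact."""
    rng = random.Random(seed)
    perm = list(range(vocab_size))
    rng.shuffle(perm)                 # one fixed permutation for the corpus
    return [perm[t] for t in token_ids]
```

Because the permutation is fixed, repeated tokens still map to the same (new) id, so only the identity of each vocabulary item changes.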
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Understanding neural networks is challenging in part because of the dense,
continuous nature of their hidden states. We explore whether we can train
neural networks to have hidden states that are sparse, discrete, and more
interpretable by quantizing their continuous features into what we call
codebook features. Codebook features are produced by finetuning neural networks
with vector quantization bottlenecks at each layer, producing a network whose
hidden features are the sum of a small number of discrete vector codes chosen
from a larger codebook. Surprisingly, we find that neural networks can operate
under this extreme bottleneck with only modest degradation in performance. This
sparse, discrete bottleneck also provides an intuitive way of controlling
neural network behavior: first, find codes that activate when the desired
behavior is present, then activate those same codes during generation to elicit
that behavior. We validate our approach by training codebook Transformers on
several different datasets. First, we explore a finite state machine dataset
with far more hidden states than neurons. In this setting, our approach
overcomes the superposition problem by assigning states to distinct codes, and
we find that we can make the neural network behave as if it is in a different
state by activating the code for that state. Second, we train Transformer
language models with up to 410M parameters on two natural language datasets. We
identify codes in these models representing diverse, disentangled concepts
(ranging from negative emotions to months of the year) and find that we can
guide the model to generate different topics by activating the appropriate
codes during inference. Overall, codebook features appear to be a promising
unit of analysis and control for neural networks and interpretability. Our
codebase and models are open-sourced at
https://github.com/taufeeque9/codebook-features
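The bottleneck itself can be sketched in a few lines. This is a simplified, non-differentiable illustration of the idea (nearest-code selection and summation); the paper's layer is trained with vector-quantization gradients, which we omit:

```python
import numpy as np

def codebook_bottleneck(h, codebook, k=4):
    """Simplified codebook bottleneck: replace a continuous hidden state `h`
    with the sum of its k nearest codes from a (learned) codebook.

    h:        (D,) continuous hidden state
    codebook: (C, D) matrix of discrete code vectors, with C >> k
    Returns (quantized_state, indices_of_chosen_codes).
    """
    # Squared Euclidean distance from h to every code vector
    dists = ((codebook - h) ** 2).sum(axis=1)
    idx = np.argsort(dists)[:k]       # the k nearest codes
    return codebook[idx].sum(axis=0), idx
```

The returned indices are the sparse, discrete "unit of analysis": interventions amount to forcing particular codes into this sum during a forward pass.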
Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy
While maximizing expected return is the goal in most reinforcement learning
approaches, risk-sensitive objectives such as conditional value at risk (CVaR)
are more suitable for many high-stakes applications. However, relatively little
is known about how to explore to quickly learn policies with good CVaR. In this
paper, we present the first algorithm for sample-efficient learning of
CVaR-optimal policies in Markov decision processes based on the optimism in the
face of uncertainty principle. This method relies on a novel optimistic version
of the distributional Bellman operator that moves probability mass from the
lower to the upper tail of the return distribution. We prove asymptotic
convergence and optimism of this operator for the tabular policy evaluation
case. We further demonstrate that our algorithm finds CVaR-optimal policies
substantially faster than existing baselines in several simulated environments
with discrete and continuous state spaces.
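The two ingredients, CVaR of a discrete return distribution and an optimistic redistribution of probability mass toward the upper tail, can be illustrated with a toy sketch (our simplification of the idea, not the paper's exact operator):

```python
def cvar(values, probs, alpha):
    """CVaR_alpha of a discrete return distribution: the mean of the worst
    alpha-fraction of outcomes. `values` must be sorted ascending."""
    acc, taken = 0.0, 0.0
    for v, p in zip(values, probs):
        take = min(p, alpha - taken)
        if take <= 0:
            break
        acc += v * take
        taken += take
    return acc / alpha

def optimistic_shift(probs, bonus):
    """Toy version of the optimistic move: transfer `bonus` probability mass
    from the lowest-value atoms to the highest-value atom."""
    probs = list(probs)
    to_move = bonus
    for i in range(len(probs) - 1):   # drain the lower tail first
        take = min(probs[i], to_move)
        probs[i] -= take
        to_move -= take
    probs[-1] += bonus - to_move      # deposit the moved mass at the top
    return probs
```

Shifting mass upward can only raise the CVaR estimate, which is what makes the shifted distribution an optimistic (upper-bounding) evaluation.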
Eliciting Human Preferences with Language Models
Language models (LMs) can be directed to perform target tasks by using
labeled examples or natural language prompts. But selecting examples or writing
prompts can be challenging, especially in tasks that involve unusual edge
cases, demand precise articulation of nebulous preferences, or require an
accurate mental model of LM behavior. We propose to use *LMs themselves* to
guide the task specification process. In this paper, we introduce **Generative
Active Task Elicitation (GATE)**: a learning framework in which models elicit
and infer intended behavior through free-form, language-based interaction with
users. We study GATE in three domains: email validation, content
recommendation, and moral reasoning. In preregistered experiments, we show that
LMs prompted to perform GATE (e.g., by generating open-ended questions or
synthesizing informative edge cases) elicit responses that are often more
informative than user-written prompts or labels. Users report that interactive
task elicitation requires less effort than prompting or example labeling and
surfaces novel considerations not initially anticipated by users. Our findings
suggest that LM-driven elicitation can be a powerful tool for aligning models
to complex human preferences and values.
Comment: 26 pages, 15 figure
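The interaction pattern can be sketched as a simple loop. All function names here are ours, not an API from the paper; the stubs stand in for the prompted language model and the human user:

```python
def gate_loop(elicit_fn, user_fn, task_fn, inputs, n_rounds=3):
    """Sketch of the GATE interaction pattern: a model elicits preferences
    through free-form questions, then uses the resulting transcript to
    perform the task on held-out inputs.

    elicit_fn(transcript) -> question   (stands in for the prompted LM)
    user_fn(question)     -> answer     (stands in for the human user)
    task_fn(transcript, x) -> label     (LM conditioned on the transcript)
    """
    transcript = []
    for _ in range(n_rounds):
        question = elicit_fn(transcript)   # model asks an open-ended question
        answer = user_fn(question)         # user responds in free text
        transcript.append((question, answer))
    return [task_fn(transcript, x) for x in inputs]
```

In the paper's framing, the value comes from the elicitation step surfacing edge cases and preferences the user would not have thought to write into a prompt.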
Social Contract AI: Aligning AI Assistants with Implicit Group Norms
We explore the idea of aligning an AI assistant by inverting a model of
users' (unknown) preferences from observed interactions. To validate our
proposal, we run proof-of-concept simulations in the economic ultimatum game,
formalizing user preferences as policies that guide the actions of simulated
players. We find that the AI assistant accurately aligns its behavior to match
standard policies from the economic literature (e.g., selfish, altruistic).
However, the assistant's learned policies lack robustness and exhibit limited
generalization in an out-of-distribution setting when confronted with a
currency (e.g., grams of medicine) that was not included in the assistant's
training distribution. Additionally, we find that when there is inconsistency
in the relationship between language use and an unknown policy (e.g., an
altruistic policy combined with rude language), the assistant's learning of the
policy is slowed. Overall, our preliminary results suggest that developing
simulation frameworks in which AI assistants need to infer preferences from
diverse users can provide a valuable approach for studying practical alignment
questions.
Comment: SoLaR NeurIPS 2023 Workshop (https://solar-neurips.github.io/)
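The ultimatum game underlying these simulations can be reproduced as a toy payoff function (our simplification of the simulation environment, with policies expressed as the fraction of the pot offered):

```python
def ultimatum_payoffs(offer_fraction, accept_threshold, pot=10.0):
    """Toy ultimatum game: the proposer offers a fraction of the pot; the
    responder accepts iff the offer meets their threshold fraction.
    Returns (proposer_payoff, responder_payoff)."""
    offer = offer_fraction * pot
    if offer >= accept_threshold * pot:
        return pot - offer, offer
    return 0.0, 0.0                   # rejection: both players get nothing

# Standard policies from the economic literature, as offered fractions
SELFISH, ALTRUISTIC = 0.1, 0.5
```

An assistant that has inverted a user's policy should reproduce the corresponding offer fraction; the paper's out-of-distribution currencies test whether that mapping generalizes.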
Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data
Analysis of compressible turbulent flows is essential for applications
related to propulsion, energy generation, and the environment. Here, we present
BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples
from 34 high-fidelity direct numerical simulations, which addresses the current
limited availability of 3D high-fidelity reacting and non-reacting compressible
turbulent flow simulation data. With this data, we benchmark a total of 49
variations of five deep learning approaches for 3D super-resolution, which can
be applied for improving scientific imaging, simulations, turbulence models, as
well as in computer vision applications. We perform neural scaling analysis on
these models to examine the performance of different machine learning (ML)
approaches, including two scientific ML techniques. We demonstrate that (i)
predictive performance can scale with model size and cost, (ii) architecture
matters significantly, especially for smaller models, and (iii) the benefits of
physics-based losses can persist with increasing model size. The outcomes of
this benchmark study are anticipated to offer insights that can aid the design
of 3D super-resolution models, especially for turbulence models, while this
data is expected to foster ML methods for a broad range of flow physics
applications. This data is publicly available with download links and browsing
tools consolidated at https://blastnet.github.io.
Comment: Accepted in Advances in Neural Information Processing Systems 36
(NeurIPS 2023). 55 pages, 21 figures. v2: Corrected co-author name. Keywords:
Super-resolution, 3D, Neural Scaling, Physics-informed Loss, Computational
Fluid Dynamics, Partial Differential Equations, Turbulent Reacting Flows,
Direct Numerical Simulation, Fluid Mechanics, Combustio
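A neural scaling analysis of this kind amounts to fitting a power law, error ~ a * size^b, across model sizes. The generic log-log least-squares fit below is our sketch; the paper's exact fitting procedure may differ:

```python
import math

def fit_power_law(sizes, errors):
    """Fit error = a * size**b by ordinary least squares in log-log space.
    Returns (a, b); b < 0 indicates error falling with model size."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the log-log regression line is the scaling exponent b
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b
```

Comparing fitted exponents across architectures is one way to quantify the paper's finding that architecture matters most at smaller model sizes.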
Generative Active Learning
We study the ability of language models to clarify the underspecified intent of users by generating informative questions and/or edge cases. Users are asked to respond to these queries based on their preferences. These question-answer interactions are subsequently used to prompt the language model to perform the task on held-out samples. We will evaluate the benefits of this method over existing approaches, as detailed in the rest of this document.