A Surprising Failure? Multimodal LLMs and the NLVR Challenge
This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and
the open-source model IDEFICS -- on the compositional natural language vision
reasoning task NLVR. Given a human-written sentence paired with a synthetic
image, this task requires the model to determine the truth value of the
sentence with respect to the image. Despite the strong performance these models
demonstrate on other benchmarks, we observe that they perform poorly on NLVR,
which was constructed to require compositional and spatial reasoning and to be
robust to semantic and systematic biases.
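For illustration only, a minimal sketch of the evaluation loop this task setup
implies, assuming a hypothetical query_mllm wrapper around whichever model is
being tested (GPT-4V, Gemini Pro, or IDEFICS) and a simple example schema; this
is not the authors' evaluation code.

def query_mllm(image_path: str, sentence: str) -> str:
    """Hypothetical: return 'true' or 'false' for the image-sentence pair."""
    raise NotImplementedError

def evaluate_nlvr(examples):
    # Each example pairs a synthetic image with a human-written sentence
    # and a gold boolean label.
    correct = 0
    for ex in examples:
        answer = query_mllm(ex["image"], ex["sentence"]).strip().lower()
        predicted = answer.startswith("true")
        correct += int(predicted == ex["label"])
    return correct / len(examples)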
Reviewer2: Optimizing Review Generation Through Prompt Generation
Recent developments in LLMs offer new opportunities for assisting authors in
improving their work. In this paper, we envision a use case where authors can
receive LLM-generated reviews that uncover weak points in the current draft.
While initial methods for automated review generation already exist, these
methods tend to produce reviews that lack detail, and they do not cover the
range of opinions that human reviewers produce. To address this shortcoming, we
propose an efficient two-stage review generation framework called Reviewer2.
Unlike prior work, this approach explicitly models the distribution of possible
aspects that the review may address. We show that this leads to more detailed
reviews that better cover the range of aspects that human reviewers identify in
the draft. As part of the research, we generate a large-scale review dataset of
27k papers and 99k reviews that we annotate with aspect prompts, which we make
available as a resource for future research.
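A minimal sketch of a two-stage pipeline of this kind, assuming a generic
generate() call to an instruction-tuned LLM; the prompt wording and helper
names are hypothetical and not the authors' implementation.

def generate(prompt: str) -> str:
    """Hypothetical call to an instruction-tuned LLM."""
    raise NotImplementedError

def reviewer2(paper_text: str, num_aspects: int = 3) -> list[str]:
    # Stage 1: sample aspect prompts that model the distribution of
    # review foci (e.g. novelty, soundness, clarity).
    aspect_block = generate(
        f"List {num_aspects} aspects a reviewer might focus on:\n{paper_text}"
    )
    aspects = [a.strip("- ").strip() for a in aspect_block.splitlines() if a.strip()]
    # Stage 2: generate one detailed review per aspect prompt.
    return [
        generate(f"Review the paper below, focusing on {aspect}:\n{paper_text}")
        for aspect in aspects[:num_aspects]
    ]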
lilGym: Natural Language Visual Reasoning with Reinforcement Learning
We present lilGym, a new benchmark for language-conditioned reinforcement
learning in visual environments. lilGym is based on 2,661 highly-compositional
human-written natural language statements grounded in an interactive visual
environment. We introduce a new approach for exact reward computation in every
possible world state by annotating all statements with executable Python
programs. Each statement is paired with multiple start states and reward
functions to form thousands of distinct Markov Decision Processes of varying
difficulty. We experiment on lilGym with different models and learning
regimes. Our results and analysis show that while existing methods are able to
achieve non-trivial performance, lilGym forms a challenging open problem.
lilGym is available at https://lil.nlp.cornell.edu/lilgym/. (ACL 2023 Long Paper)
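A minimal sketch of the reward mechanism described above, in which each
statement is annotated with an executable Python program that checks its truth
in a world state; the state schema and the example statement are assumptions
for illustration, not lilGym's actual annotations.

def statement_program(state) -> bool:
    # Executable annotation for a statement such as
    # "there is exactly one black circle in the left box".
    # Illustrative schema: state["boxes"] maps a box name to a list of objects.
    left_box = state["boxes"]["left"]
    black_circles = [o for o in left_box
                     if o["shape"] == "circle" and o["color"] == "black"]
    return len(black_circles) == 1

def reward(state, target_truth: bool = True) -> float:
    # Exact reward: 1 when the statement's truth value in the current state
    # matches the episode's target, 0 otherwise (a simplified variant).
    return 1.0 if statement_program(state) == target_truth else 0.0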
Successor Feature Sets: Generalizing Successor Representations Across Policies
Successor-style representations have many advantages for reinforcement
learning: for example, they can help an agent generalize from past experience
to new goals, and they have been proposed as explanations of behavioral and
neural data from human and animal learners. They also form a natural bridge
between model-based and model-free RL methods: like the former they make
predictions about future experiences, and like the latter they allow efficient
prediction of total discounted rewards. However, successor-style
representations are not optimized to generalize across policies: typically, we
maintain a limited-length list of policies, and share information among them by
representation learning or generalized policy improvement (GPI). Successor-style
representations also typically
make no provision for gathering information or reasoning about latent
variables. To address these limitations, we bring together ideas from
predictive state representations, belief space value iteration, successor
features, and convex analysis: we develop a new, general successor-style
representation, together with a Bellman equation that connects multiple sources
of information within this representation, including different latent states,
policies, and reward functions. The new representation is highly expressive:
for example, it lets us efficiently read off an optimal policy for a new reward
function, or a policy that imitates a new demonstration. For this paper, we
focus on exact computation of the new representation in small, known
environments, since even this restricted setting offers plenty of interesting
questions. Our implementation does not scale to large, unknown environments --
nor would we expect it to, since it generalizes POMDP value iteration, which is
difficult to scale. However, we believe that future work will allow us to
extend our ideas to approximate reasoning in large, unknown environments.
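For background, the standard single-policy successor-features recursion that
this representation generalizes across policies, latent states, and reward
functions; this is textbook material, not the paper's new Bellman equation.
For a feature map \phi, policy \pi, and reward weights w:

\psi^{\pi}(s,a) = \phi(s,a)
  + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a),\, a' \sim \pi(\cdot \mid s')}
    \big[ \psi^{\pi}(s',a') \big],
\qquad
Q^{\pi}_{w}(s,a) = \psi^{\pi}(s,a)^{\top} w.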
Policy-Gradient Training of Language Models for Ranking
Text retrieval plays a crucial role in incorporating factual knowledge for
decision making into language processing pipelines, ranging from chat-based web
search to question answering systems. Current state-of-the-art text retrieval
models leverage pre-trained large language models (LLMs) to achieve competitive
performance, but training LLM-based retrievers via typical contrastive losses
requires intricate heuristics, including selecting hard negatives and using
additional supervision as learning signals. This reliance on heuristics stems
from the fact that the contrastive loss itself is heuristic and does not
directly optimize the downstream metrics of decision quality at the end of the
processing pipeline. To address this issue, we introduce Neural PG-RANK, a
novel training algorithm that learns to rank by instantiating an LLM as a
Plackett-Luce ranking policy. Neural PG-RANK provides a principled method for
end-to-end training of retrieval models as part of larger decision systems via
policy gradient, with little reliance on complex heuristics, and it effectively
unifies the training objective with downstream decision-making quality. We
conduct extensive experiments on various text retrieval benchmarks. The results
demonstrate that when the training objective aligns with the evaluation setup,
Neural PG-RANK yields remarkable in-domain performance improvement, with
substantial out-of-domain generalization to some critical datasets employed in
downstream question answering tasks.
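A minimal sketch of a Plackett-Luce ranking policy trained with a
REINFORCE-style policy gradient, as outlined above; the scoring model and the
utility function (e.g. NDCG or downstream QA quality) are placeholders, not
the authors' implementation.

import torch

def sample_ranking(scores: torch.Tensor):
    # Plackett-Luce: repeatedly sample the next item from a softmax over the
    # remaining candidates, accumulating the log-probability of the ranking.
    remaining = list(range(scores.shape[0]))
    ranking, log_prob = [], torch.tensor(0.0)
    while remaining:
        probs = torch.softmax(scores[remaining], dim=0)
        idx = torch.multinomial(probs, 1).item()
        log_prob = log_prob + torch.log(probs[idx])
        ranking.append(remaining.pop(idx))
    return ranking, log_prob

def policy_gradient_step(scores: torch.Tensor, utility_fn, optimizer):
    # scores: one relevance score per candidate, produced by the LLM retriever.
    ranking, log_prob = sample_ranking(scores)
    reward = utility_fn(ranking)   # downstream decision quality, e.g. NDCG
    loss = -reward * log_prob      # REINFORCE estimator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()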