CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning
To accelerate software development, much research has been performed to help
people understand and reuse the huge amount of available code resources. Two
important tasks have been widely studied: code retrieval, which aims to
retrieve code snippets relevant to a given natural language query from a code
base, and code annotation, where the goal is to annotate a code snippet with a
natural language description. Despite their advancement in recent years, the
two tasks are mostly explored separately. In this work, we investigate a novel
perspective of Code annotation for Code retrieval (hence called `CoaCor'),
where a code annotation model is trained to generate a natural language
annotation that can represent the semantic meaning of a given code snippet and
can be leveraged by a code retrieval model to better distinguish relevant code
snippets from others. To this end, we propose an effective framework based on
reinforcement learning, which explicitly encourages the code annotation model
to generate annotations that can be used for the retrieval task. Through
extensive experiments, we show that code annotations generated by our framework
are much more detailed and more useful for code retrieval, and they can further
improve the performance of existing code retrieval models significantly.

Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 2019.
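The core idea can be sketched with a toy example (this is an illustration of the reward signal, not the authors' implementation): a retrieval metric, here the reciprocal rank of the target snippet when the generated annotation is used as the query, serves as the RL reward for the annotation model.

```python
def reciprocal_rank_reward(scores, target_idx):
    """Reward = 1 / rank of the target code snippet when the generated
    annotation is used as the retrieval query."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 / (ranked.index(target_idx) + 1)

# Hypothetical similarity scores between one generated annotation and four
# snippets in the code base; snippet 2 is the one the annotation describes.
scores = [0.31, 0.58, 0.74, 0.12]
reward = reciprocal_rank_reward(scores, target_idx=2)  # ranked first -> 1.0
# A policy-gradient update would then scale the log-likelihood of the sampled
# annotation by (reward - baseline), pushing the model toward annotations
# that rank their own snippet highly.
```

Using a retrieval metric directly as the reward is what ties the two tasks together: the annotation model is optimized for retrieval usefulness rather than for resemblance to human-written descriptions.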
Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning
Large language models (LLMs) have achieved remarkable success across a wide
spectrum of tasks; however, they still face limitations in scenarios that
demand long-term planning and spatial reasoning. To facilitate this line of
research, in this work, we propose a new benchmark, termed Path Planning
from Natural Language (PPNL). Our benchmark evaluates LLMs' spatial-temporal
reasoning by
formulating ``path planning'' tasks that require an LLM to navigate to target
locations while avoiding obstacles and adhering to constraints. Leveraging this
benchmark, we systematically investigate LLMs including GPT-4 via different
few-shot prompting methodologies as well as BART and T5 of various sizes via
fine-tuning. Our experimental results show the promise of few-shot GPT-4 in
spatial reasoning when it is prompted to interleave reasoning and acting,
although it still fails to perform long-term temporal reasoning. In contrast,
while fine-tuned LLMs achieved impressive results on in-distribution reasoning
tasks, they struggled to generalize to larger environments or environments with
more obstacles.
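A toy illustration of the kind of check such path-planning tasks involve (the grid format and move names here are my own assumptions, not the benchmark's exact specification): verify that a proposed move sequence stays on the grid, avoids obstacles, and ends at the goal.

```python
def follows_constraints(start, goal, obstacles, moves, size=4):
    """Return True iff the move sequence stays on the grid, avoids all
    obstacle cells, and terminates exactly at the goal."""
    deltas = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    r, c = start
    for m in moves:
        dr, dc = deltas[m]
        r, c = r + dr, c + dc
        if not (0 <= r < size and 0 <= c < size) or (r, c) in obstacles:
            return False  # stepped off the grid or onto an obstacle
    return (r, c) == goal

# e.g. navigate a 4x4 grid from (0, 0) to (2, 2) while avoiding (1, 1)
plan = ["down", "down", "right", "right"]
ok = follows_constraints((0, 0), (2, 2), {(1, 1)}, plan)  # True
```

Checks like this make the tasks automatically verifiable: an LLM's plan either satisfies all spatial constraints or it does not.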
Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques
Compositional and domain generalization present significant challenges in
semantic parsing, even for state-of-the-art semantic parsers based on
pre-trained language models (LMs). In this study, we empirically investigate
improving an LM's generalization in semantic parsing with two simple
techniques: at the token level, we introduce a token preprocessing method to
preserve the semantic boundaries of tokens produced by LM tokenizers; at the
sequence level, we propose to use special tokens to mark the boundaries of
components aligned between input and output. Our experimental results on two
text-to-SQL semantic parsing datasets show that our token preprocessing,
although simple, can substantially improve the LM performance on both types of
generalization, and our component boundary marking method is particularly
helpful for compositional generalization.

Comment: 9 pages, to be published in ACL 2023.
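A hedged sketch of the token-level idea: normalize identifiers before tokenization so a subword tokenizer cannot split them across semantic boundaries. The specific rules below (exposing snake_case and camelCase boundaries) are my illustration of the principle, not necessarily the paper's exact procedure.

```python
import re

def preprocess_identifier(name):
    """Expose the semantic units inside a schema identifier so each unit
    becomes its own whitespace-delimited token."""
    name = name.replace("_", " _ ")                   # snake_case boundaries
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # camelCase boundaries
    return name.lower()

print(preprocess_identifier("song_releaseYear"))  # "song _ release year"
```

After this step, a tokenizer sees "release" and "year" as separate units rather than an arbitrary subword split of "releaseYear", which is the boundary-preservation property the token-level technique targets.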
Instance Needs More Care: Rewriting Prompts for Instances Yields Better Zero-Shot Performance
Enabling large language models (LLMs) to perform tasks zero-shot has been
an appealing goal because it saves labor (i.e., requires no task-specific
annotations); as such, zero-shot prompting approaches also enjoy better task
generalizability. To improve LLMs' zero-shot performance, prior work has
focused on devising more effective task instructions (e.g., ``let's think step
by step''). However, we argue that, for an LLM to solve a test instance
correctly in zero-shot, each individual instance needs a more carefully
designed and customized instruction. To this end, we propose PRoMPTd, an approach that
rewrites the task prompt for each individual test input to be more specific,
unambiguous, and complete, so as to provide better guidance to the task LLM. We
evaluated PRoMPTd on eight datasets covering tasks including arithmetic,
logical reasoning, and code generation, using GPT-4 as the task LLM. Notably,
PRoMPTd achieves an absolute improvement of around 10% on the complex MATH
dataset and 5% on the code generation task on HumanEval, outperforming
conventional zero-shot methods. In addition, we showed that the rewritten
prompt can provide better interpretability of how the LLM resolves each test
instance, which can potentially be leveraged as a defense mechanism against
adversarial prompting. The source code and dataset can be obtained from
https://github.com/salokr/PRoMPTd

Comment: Work in progress.
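The pipeline can be sketched schematically (the rewriter template and the stub below are my own illustration; see the repository above for the authors' actual prompts): one LLM call rewrites the raw prompt for the specific test input, and a second call solves the rewritten prompt.

```python
# Hypothetical meta-prompt asking a rewriter LLM to specialize the task
# prompt for one test instance.
REWRITE_TEMPLATE = (
    "Rewrite the following prompt so it is specific, unambiguous, and "
    "complete for the given test input:\n{prompt}\nInput: {instance}"
)

def promptd(llm, task_prompt, instance):
    """Two-call pipeline: rewrite the prompt for this instance, then solve."""
    rewritten = llm(REWRITE_TEMPLATE.format(prompt=task_prompt, instance=instance))
    answer = llm(rewritten)
    return rewritten, answer

# Deterministic stub standing in for the LLM; in practice both calls would
# go to a model such as GPT-4.
def stub_llm(text):
    return "Compute 2 + 3 exactly." if text.startswith("Rewrite") else "5"

rewritten, answer = promptd(stub_llm, "Solve the problem.", "2 + 3")
```

Because the rewritten prompt is produced per instance, it can be inspected directly, which is what enables the interpretability and adversarial-defense uses mentioned above.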