Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models
Despite recent success in large language model (LLM) reasoning, LLMs struggle
with hierarchical multi-step reasoning tasks like generating complex programs.
For these tasks, humans often start with a high-level algorithmic design and
implement each part gradually. We introduce Parsel, a framework enabling
automatic implementation and validation of complex algorithms with code LLMs,
taking hierarchical function descriptions in natural language as input. We show
that Parsel can be used across domains requiring hierarchical reasoning,
including program synthesis, robotic planning, and theorem proving. We show
that LLMs generating Parsel solve more competition-level problems in the APPS
dataset, resulting in pass rates that are over 75% higher than prior results
from directly sampling AlphaCode and Codex, while often using a smaller sample
budget. We also find that LLM-generated robotic plans using Parsel as an
intermediate language are more than twice as likely to be considered accurate
as directly generated plans. Lastly, we explore how Parsel addresses LLM
limitations and discuss how Parsel may be useful for human programmers.
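
As a rough illustration of the workflow the abstract describes (a sketch only,
not Parsel's actual implementation), the Python fragment below hard-codes the
"LLM" responses so it runs end to end: each natural-language function
description is turned into code and validated against its unit tests, and the
parent function calls the child it was decomposed into. Parsel additionally
searches over combinations of candidate implementations per function, which
this sketch omits.

    def ask_llm(description: str) -> str:
        """Hypothetical stand-in for a code LLM: map a natural-language
        function description to a Python implementation (canned here)."""
        canned = {
            "square(x): the square of a number x":
                "def square(x):\n    return x * x",
            "sum_of_squares(nums): the sum of the squares of a list of numbers":
                "def sum_of_squares(nums):\n    return sum(square(n) for n in nums)",
        }
        return canned[description]

    def implement_and_validate(spec):
        """spec maps each description to (function name, unit tests).
        Implement every description, then validate each function by
        running its tests."""
        namespace = {}
        for description in spec:
            exec(ask_llm(description), namespace)      # implement
        for name, tests in spec.values():              # validate
            for args, expected in tests:
                assert namespace[name](*args) == expected, f"{name} failed on {args}"
        return namespace

    spec = {
        "square(x): the square of a number x": ("square", [((3,), 9)]),
        "sum_of_squares(nums): the sum of the squares of a list of numbers":
            ("sum_of_squares", [(([1, 2, 3],), 14)]),
    }
    functions = implement_and_validate(spec)
    print(functions["sum_of_squares"]([1, 2, 3]))      # -> 14
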
Solving Math Word Problems by Combining Language Models With Symbolic Solvers
Automatically generating high-quality step-by-step solutions to math word
problems has many applications in education. Recently, combining large language
models (LLMs) with external tools to perform complex reasoning and calculation
has emerged as a promising direction for solving math word problems, but prior
approaches such as Program-Aided Language models (PAL) are biased towards simple
procedural problems and less effective for problems that require declarative
reasoning. We propose an approach that combines an LLM that can incrementally
formalize word problems as a set of variables and equations with an external
symbolic solver that can solve the equations. Our approach achieves comparable
accuracy to the original PAL on the GSM8K benchmark of math word problems and
outperforms PAL by an absolute 20% on ALGEBRA, a new dataset of more
challenging word problems extracted from Algebra textbooks. Our work highlights
the benefits of using declarative and incremental representations when
interfacing with an external tool for solving complex math word problems. Our
data and prompts are publicly available at
https://github.com/joyheyueya/declarative-math-word-problem
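
A minimal sketch of the declarative pipeline described above: here the
formalization that the LLM would produce incrementally is hard-coded, and
SymPy stands in for the external symbolic solver (the paper's own prompts and
setup are in the linked repository).

    from sympy import symbols, Eq, solve

    # Word problem: "Alice has twice as many apples as Bob. Together they
    # have 18 apples. How many apples does Bob have?"
    alice, bob = symbols("alice bob")

    # Declarative formalization (the kind of output an LLM would generate
    # incrementally, one variable or equation per step):
    equations = [
        Eq(alice, 2 * bob),       # "Alice has twice as many apples as Bob"
        Eq(alice + bob, 18),      # "Together they have 18 apples"
    ]

    solution = solve(equations, [alice, bob])
    print(solution[bob])          # -> 6
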
Hypothesis Search: Inductive Reasoning with Language Models
Inductive reasoning is a core problem-solving capacity: humans can identify
underlying principles from a few examples, which can then be robustly
generalized to novel scenarios. Recent work has evaluated large language models
(LLMs) on inductive reasoning tasks by directly prompting them, i.e., via
"in-context learning." This can work well for straightforward inductive tasks, but
performs very poorly on more complex tasks such as the Abstraction and
Reasoning Corpus (ARC). In this work, we propose to improve the inductive
reasoning ability of LLMs by generating explicit hypotheses at multiple levels
of abstraction: we prompt the LLM to propose multiple abstract hypotheses about
the problem, in natural language, then implement the natural language
hypotheses as concrete Python programs. These programs can be directly verified
by running on the observed examples and generalized to novel inputs. Because of
the prohibitive cost of generation with state-of-the-art LLMs, we consider a
middle step to filter the set of hypotheses that will be implemented into
programs: we either ask the LLM to summarize into a smaller set of hypotheses,
or ask human annotators to select a subset of the hypotheses. We verify our
pipeline's effectiveness on the ARC visual inductive reasoning benchmark, its
variant 1D-ARC, and the string transformation dataset SyGuS. On a random 40-problem
subset of ARC, our automated pipeline using LLM summaries achieves 27.5%
accuracy, significantly outperforming the direct prompting baseline (accuracy
of 12.5%). With the minimal human input of selecting from LLM-generated
candidates, the performance is boosted to 37.5%. (And we argue this is a lower
bound on the performance of our approach without filtering.) Our ablation
studies show that abstract hypothesis generation and concrete program
representations are both beneficial for LLMs to perform inductive reasoning
tasks.
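
A toy version of this generate-implement-verify loop (with canned candidates
standing in for LLM-generated hypotheses and programs) looks roughly like
this:

    # Observed input/output examples and a novel test input.
    examples = [([1, 2, 3], [3, 2, 1]), ([4, 5], [5, 4])]
    test_input = [7, 8, 9]

    # Candidate hypotheses paired with candidate Python implementations;
    # in the paper both come from an LLM, possibly after summarization or
    # human filtering.
    candidates = {
        "sort the list in ascending order": lambda xs: sorted(xs),
        "reverse the list":                 lambda xs: xs[::-1],
    }

    # Keep only the programs consistent with every observed example.
    verified = {h: f for h, f in candidates.items()
                if all(f(x) == y for x, y in examples)}

    # Apply a surviving hypothesis to the novel input.
    for hypothesis, program in verified.items():
        print(hypothesis, "->", program(test_input))   # reverse the list -> [9, 8, 7]
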
Pragmatic Code Autocomplete
Human language is ambiguous, with intended meanings recovered via pragmatic reasoning in context. Such reliance on context is essential for the efficiency of human communication. Programming languages, in stark contrast, are defined by unambiguous grammars. In this work, we aim to make programming languages more concise by allowing programmers to utilize a controlled level of ambiguity. Specifically, we allow single-character abbreviations for common keywords and identifiers. Our system first proposes a set of strings that can be abbreviated by the user. Using only 100 abbreviations, we observe that a large dataset of Python code can be compressed by 15%, a number that can be improved even further by specializing the abbreviations to a particular code base. We then use a contextualized sequence-to-sequence model to rank potential expansions of inputs that include abbreviations. In an offline reconstruction task, our model achieves accuracies ranging from 93% to 99%, depending on the programming language and user settings. The model is small enough to run on a commodity CPU in real time. We evaluate the usability of our system in a user study, integrating it into Microsoft VSCode, a popular code editor. We observe that our system performs well and is complementary to traditional autocomplete features.
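
The fragment below is not the paper's system, just an illustration of why a
contextual model is needed at all: with a plain lookup table, expanding
single-character abbreviations is ambiguous whenever an abbreviation collides
with an ordinary identifier. The abbreviation table is hypothetical.

    import re

    # Hypothetical abbreviation table (the real system proposes these
    # strings automatically and covers identifiers as well as keywords).
    ABBREV = {"return": "r", "import": "i", "def": "d", "self": "s"}
    EXPAND = {v: k for k, v in ABBREV.items()}

    def compress(code: str) -> str:
        pattern = r"\b(" + "|".join(ABBREV) + r")\b"
        return re.sub(pattern, lambda m: ABBREV[m.group(1)], code)

    def expand(code: str) -> str:
        pattern = r"\b(" + "|".join(EXPAND) + r")\b"
        return re.sub(pattern, lambda m: EXPAND[m.group(1)], code)

    src = "def area(self, r):\n    return 3.14 * r * r"
    short = compress(src)         # "d area(s, r): ..." -- keywords abbreviated
    print(expand(short) == src)   # False: "r" means both "return" and the
                                  # variable r; this ambiguity is what the
                                  # contextual sequence-to-sequence ranker resolves
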
Dynamic dispatch of context-sensitive optimizations
The compilers community has dedicated much time and effort to making context-sensitive analyses scalable, with great profit. However, the implementation of context-sensitive optimizations remains a challenge. The main problem that discourages compilers from implementing such optimizations is code size growth: with function cloning or inlining, two well-known techniques for implementing context-sensitive specializations, code size can grow exponentially in the worst case. Both techniques are based on creating specialized copies of the code for each context, and both must copy every function on the call path that leads to each optimization, even when that involves copying functions that are not themselves optimized. We propose a solution to this problem. Using a combination of dynamic dispatch and a state machine that controls the transitions between calling contexts at run time, our method implements fully context-sensitive optimizations while copying only the functions that are optimized, not the call path leading to them. We present our approach in Minilog, a minimal language with all the features needed to apply the proposed method, and prove its correctness. We implemented the method in the LLVM compiler infrastructure and used it to optimize programs with fully context-sensitive constant propagation. Experiments on the LLVM Test Suite and SPEC CPU2006 benchmarks show that our method scales significantly better in space than function cloning, producing binaries that are on average 2.7x smaller and adding on average 8.5x fewer bytes to implement the same optimizations, while the resulting binaries perform very similarly to those produced by traditional cloning. Moreover, using this still little-explored class of optimizations, we observed speed-ups of up to 20% over LLVM -O3 on some benchmarks.
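
The idea can be caricatured in a few lines of Python (a loose analogy only;
the actual method operates on LLVM IR): only the function being specialized
is cloned, and a small state machine selects the right clone at run time, so
intermediate callers never need to be copied.

    # State machine tracking the calling context (two states in this toy).
    context = {"state": "generic"}

    def scale_generic(x, factor):
        return x * factor

    def scale_times_two(x, _factor):
        # Clone specialized for the context where factor == 2:
        # constant-propagated and strength-reduced.
        return x << 1

    DISPATCH = {"generic": scale_generic, "times_two": scale_times_two}

    def scale(x, factor):
        # Dynamic dispatch on the current context.
        return DISPATCH[context["state"]](x, factor)

    def render(values):
        # Intermediate caller on the call path: NOT cloned.
        return [scale(v, 2) for v in values]

    context["state"] = "times_two"    # transition into the specialized context
    print(render([1, 2, 3]))          # -> [2, 4, 6]
    context["state"] = "generic"      # transition back out
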
Left to the Reader: Abstracting Solutions in Mathematical Reasoning
Formal mathematical reasoning is unique in its precision: any valid conclusion can be justified by a sequence of base axioms. But human-written proofs or solutions rarely operate at that level. Instead, obvious steps are skipped to provide a simple, lucid argument. This is especially important in an educational setting, where too many details in an example solution, or too few, can confuse a student. What are the key steps for humans in a given formal solution? We investigate several computational hypotheses in the context of equation solving. Specifically, we take a reinforcement learning agent that solves equations using low-level axioms, and propose a series of methods for abstracting its solutions by selecting key steps. We consider methods based on the semantic distance between subsequent steps, based on the steps with the highest uncertainty for the agent, and based on transitions between latent "high-level skills" learned from a large number of agent-produced solutions. In a human evaluation, we find that skill-based simplifications were judged most useful. These results suggest new directions for understanding human mathematical reasoning.
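
A toy sketch of the first of these heuristics, with token overlap standing in
for semantic distance and a hand-written solution trace standing in for the
agent's output, might look like this:

    steps = [
        "2x + 3 = 11",
        "2x + 3 - 3 = 11 - 3",
        "2x = 8",
        "2x / 2 = 8 / 2",
        "x = 4",
    ]

    def distance(a: str, b: str) -> float:
        # Crude stand-in for semantic distance: Jaccard distance over tokens.
        ta, tb = set(a.split()), set(b.split())
        return 1 - len(ta & tb) / len(ta | tb)

    def abstract_solution(steps, keep=3):
        # Keep the first and last steps plus the interior steps that differ
        # most from their predecessor, dropping the "obvious" manipulations.
        scores = [distance(prev, cur) for prev, cur in zip(steps, steps[1:])]
        interior = sorted(range(1, len(steps) - 1),
                          key=lambda i: -scores[i - 1])[:max(keep - 2, 0)]
        chosen = sorted({0, len(steps) - 1, *interior})
        return [steps[i] for i in chosen]

    print(abstract_solution(steps))   # -> ['2x + 3 = 11', '2x = 8', 'x = 4']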