7 research outputs found

    Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models

    Full text link
    Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, taking hierarchical function descriptions in natural language as input. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis, robotic planning, and theorem proving. We show that LLMs generating Parsel solve more competition-level problems in the APPS dataset, resulting in pass rates that are over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. We also find that LLM-generated robotic plans using Parsel as an intermediate language are more than twice as likely to be considered accurate as directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers.
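
    As a rough illustration of the decompose-then-verify idea described above, the sketch below (not the authors' code) composes candidate implementations of hierarchically specified functions and keeps a combination only if every function passes its tests; generate_candidates is a hypothetical stand-in for sampling from a code LLM.

# A minimal sketch of a Parsel-style implement-and-validate loop (not the
# authors' code). `generate_candidates` is a hypothetical placeholder for a
# code LLM; it returns hard-coded strings so the sketch runs end to end.
import itertools

def generate_candidates(description, n=2):
    return {
        "square": ["def square(x):\n    return x * x",
                   "def square(x):\n    return x + x"],   # second candidate is wrong
        "sum_of_squares": ["def sum_of_squares(xs):\n    return sum(square(x) for x in xs)"],
    }[description][:n]

def passes_tests(namespace, tests):
    # A candidate survives only if every unit test evaluates to its expected value.
    try:
        return all(eval(expr, namespace) == expected for expr, expected in tests)
    except Exception:
        return False

# Hierarchical spec: a leaf function, then a function composed from it,
# each with its own small test set.
spec = {
    "square": [("square(3)", 9)],
    "sum_of_squares": [("sum_of_squares([1, 2, 3])", 14)],
}

# Try combinations of candidates; keep the first composition in which every
# function passes its tests when the pieces are executed together.
candidate_sets = [generate_candidates(name) for name in spec]
for combo in itertools.product(*candidate_sets):
    ns = {}
    for code in combo:
        exec(code, ns)
    if all(passes_tests(ns, tests) for tests in spec.values()):
        print("found a working composition")
        break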

    Solving Math Word Problems by Combining Language Models With Symbolic Solvers

    Full text link
    Automatically generating high-quality step-by-step solutions to math word problems has many applications in education. Recently, combining large language models (LLMs) with external tools to perform complex reasoning and calculation has emerged as a promising direction for solving math word problems, but prior approaches such as the Program-Aided Language model (PAL) are biased towards simple procedural problems and less effective for problems that require declarative reasoning. We propose an approach that combines an LLM that can incrementally formalize word problems as a set of variables and equations with an external symbolic solver that can solve the equations. Our approach achieves comparable accuracy to the original PAL on the GSM8K benchmark of math word problems and outperforms PAL by an absolute 20% on ALGEBRA, a new dataset of more challenging word problems extracted from Algebra textbooks. Our work highlights the benefits of using declarative and incremental representations when interfacing with an external tool for solving complex math word problems. Our data and prompts are publicly available at https://github.com/joyheyueya/declarative-math-word-problem.
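
    A minimal sketch of the division of labor described above, assuming the LLM's output is a list of SymPy equations for a toy word problem; the equations here are hand-written stand-ins for what the model would emit incrementally.

# Sketch of the "LLM formalizes, symbolic solver solves" split (not the
# authors' released code). The equation list stands in for the LLM's output.
from sympy import symbols, Eq, solve

# Word problem: "Alice has 3 more apples than Bob. Together they have 11
# apples. How many apples does Bob have?"
a, b = symbols("a b")          # a = Alice's apples, b = Bob's apples
equations = [
    Eq(a, b + 3),              # "Alice has 3 more apples than Bob"
    Eq(a + b, 11),             # "Together they have 11 apples"
]

# The external symbolic solver handles the actual calculation.
solution = solve(equations, [a, b], dict=True)[0]
print(solution[b])             # -> 4

    Because the representation is declarative, the same formalization covers problems with no obvious step-by-step procedure, which is where PAL-style procedural programs struggle.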

    Hypothesis Search: Inductive Reasoning with Language Models

    Full text link
    Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which can then be robustly generalized to novel scenarios. Recent work has evaluated large language models (LLMs) on inductive reasoning tasks by directly prompting them, yielding "in-context learning." This can work well for straightforward inductive tasks, but performs very poorly on more complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem, in natural language, then implement the natural language hypotheses as concrete Python programs. These programs can be directly verified by running on the observed examples and generalized to novel inputs. Because of the prohibitive cost of generation with state-of-the-art LLMs, we consider a middle step to filter the set of hypotheses that will be implemented into programs: we either ask the LLM to summarize them into a smaller set of hypotheses, or ask human annotators to select a subset of the hypotheses. We verify our pipeline's effectiveness on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, and the string transformation dataset SyGuS. On a random 40-problem subset of ARC, our automated pipeline using LLM summaries achieves 27.5% accuracy, significantly outperforming the direct prompting baseline (accuracy of 12.5%). With the minimal human input of selecting from LLM-generated candidates, the performance is boosted to 37.5%. (We argue this is a lower bound on the performance of our approach without filtering.) Our ablation studies show that abstract hypothesis generation and concrete program representations are both beneficial for LLMs to perform inductive reasoning tasks.
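
    The sketch below illustrates only the verification step described above (not the authors' pipeline): candidate programs, standing in for LLM-implemented hypotheses, are kept only if they reproduce every observed input-output example, and the survivors are then applied to a novel input.

# Verify hypotheses-as-programs against observed examples (illustrative only).
train_examples = [("abc", "ABC"), ("hi there", "HI THERE")]
test_input = "arc"

# Each entry pairs a natural-language hypothesis with a concrete program;
# these stand in for LLM-generated candidates.
candidates = [
    ("reverse the string",   lambda s: s[::-1]),
    ("uppercase the string", lambda s: s.upper()),
    ("drop the last char",   lambda s: s[:-1]),
]

def consistent(program, examples):
    # A hypothesis survives only if its program reproduces every observed pair.
    try:
        return all(program(x) == y for x, y in examples)
    except Exception:
        return False

survivors = [(h, p) for h, p in candidates if consistent(p, train_examples)]
for hypothesis, program in survivors:
    print(hypothesis, "->", program(test_input))   # uppercase the string -> ARC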

    Pragmatic Code Autocomplete

    Full text link
    Human language is ambiguous, with intended meanings recovered via pragmatic reasoning in context. Such reliance on context is essential for the efficiency of human communication. Programming languages, in stark contrast, are defined by unambiguous grammars. In this work, we aim to make programming languages more concise by allowing programmers to utilize a controlled level of ambiguity. Specifically, we allow single-character abbreviations for common keywords and identifiers. Our system first proposes a set of strings that can be abbreviated by the user. Using only 100 abbreviations, we observe that a large dataset of Python code can be compressed by 15%, a number that can be improved even further by specializing the abbreviations to a particular code base. We then use a contextualized sequence-to-sequence model to rank potential expansions of inputs that include abbreviations. In an offline reconstruction task, our model achieves accuracies ranging from 93% to 99%, depending on the programming language and user settings. The model is small enough to run on a commodity CPU in real time. We evaluate the usability of our system in a user study, integrating it into Microsoft VSCode, a popular code editor. We observe that our system performs well and is complementary to traditional autocomplete features.
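
    As a rough sketch of the expansion step, the toy code below resolves single-character abbreviations by corpus frequency; the frequency table is a hypothetical stand-in for the contextualized sequence-to-sequence ranker described above, which ranks expansions in context rather than globally.

# Toy abbreviation expansion (not the paper's model).
from collections import Counter

# Abbreviation table: one character may map to several candidate expansions.
abbreviations = {
    "r": ["return", "range", "raise"],
    "d": ["def", "dict", "del"],
    "i": ["import", "if", "in"],
}

# Token frequencies from a (toy) code corpus; the real system ranks
# expansions with a learned contextual model instead.
corpus_counts = Counter({"return": 900, "range": 300, "raise": 60,
                         "def": 800, "dict": 150, "del": 20,
                         "import": 500, "if": 950, "in": 700})

def expand(token):
    candidates = abbreviations.get(token)
    if not candidates:
        return token                      # not abbreviated; keep as-is
    return max(candidates, key=lambda c: corpus_counts[c])

abbreviated_line = "d f(n): r n * 2"
print(" ".join(expand(t) for t in abbreviated_line.split()))
# -> "def f(n): return n * 2"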

    Dynamic dispatch of context-sensitive optimizations

    Full text link
    The compiler community has dedicated much time and effort, with great success, to making context-sensitive analyses scalable. However, implementing context-sensitive optimizations remains a challenge. The main problem that discourages compilers from implementing such optimizations is code-size growth. With function cloning or inlining, two well-known techniques for implementing context-sensitive specialization, code size can grow exponentially in the worst case. Both techniques are based on creating specialized copies of code for each context, and both must copy every function on the call path that leads to each optimization, even when that means copying functions that will not themselves be optimized. In this work, we propose a solution to this problem. Using a combination of dynamic dispatch and a state machine that controls transitions between calling contexts at run time, our method implements fully context-sensitive optimizations while copying only the functions that will be optimized, not the call paths to them. We present our approach in Minilog, a minimal language with all the features needed to apply the proposed method, and prove its correctness. We implemented our method in the LLVM compiler infrastructure and used it to optimize programs with fully context-sensitive constant propagation. Experiments on the LLVM Test Suite and SPEC CPU2006 benchmarks show that our method scales significantly better in space than function cloning, generating binaries that are on average 2.7x smaller and adding on average 8.5x fewer bytes to implement the same optimizations. Binaries generated with our technique performed very similarly to those generated with traditional cloning. Moreover, using this still under-explored class of optimizations, we obtained speed-ups of up to 20% on some benchmarks compared to LLVM -O3.
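
    The toy Python sketch below conveys only the high-level idea (the actual work is an LLVM transformation, not Python, and all names here are illustrative): a context flag acts as the state machine, and dynamic dispatch selects a specialized clone for the one function being optimized, so the functions on the call path to it need not be duplicated.

# Toy illustration of context-sensitive specialization via dynamic dispatch.
CONTEXT = "generic"            # tiny "state machine": which calling context we are in

def leaf_generic(x, scale):
    return x * scale

def leaf_specialized_scale2(x):
    # Clone specialized by constant propagation for the context scale == 2.
    return x << 1

def leaf(x, scale):
    # Dynamic dispatch: only this optimized function has a per-context clone.
    if CONTEXT == "scale2":
        return leaf_specialized_scale2(x)
    return leaf_generic(x, scale)

def middle(x, scale):
    # Lies on the call path but is NOT cloned per context.
    return leaf(x, scale) + 1

def entry_hot(x):
    global CONTEXT
    CONTEXT = "scale2"         # transition into the context that enables the clone
    try:
        return middle(x, 2)
    finally:
        CONTEXT = "generic"    # transition back when leaving the context

def entry_cold(x, scale):
    return middle(x, scale)

print(entry_hot(10), entry_cold(10, 3))   # -> 21 31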