The advent of Large Language Models (LLMs) has shown promise in augmenting
programming through natural interactions. However, while LLMs are proficient at
compiling common usage patterns into a programming language, e.g., Python, it
remains a challenge to edit and debug an LLM-generated program. We
introduce ANPL, a programming system that allows users to interactively
decompose user-specific tasks. In an ANPL program, a user can directly manipulate the sketch,
which specifies the data flow of the generated program. The user annotates the
modules, or holes, with natural language descriptions, offloading the expensive
task of generating functionalities to the LLM. Given an ANPL program, the ANPL
compiler generates a cohesive Python program that implements the
functionalities in the holes while respecting the data flow specified in the sketch. We
deploy ANPL on the Abstraction and Reasoning Corpus (ARC), a set of unique
tasks that are challenging for state-of-the-art AI systems, showing that it
outperforms baseline programming systems that (a) lack the ability to
decompose tasks interactively and (b) lack the guarantee that the modules
can be correctly composed together. We obtain a dataset consisting of 300/400
ARC tasks that were successfully decomposed and grounded in Python, providing
valuable insights into how humans decompose programmatic tasks. See the dataset
at https://iprc-dip.github.io/DARC
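
For illustration, here is a minimal sketch of the sketch-and-hole idea written in plain Python; the concrete syntax, task, and function names below are assumptions for exposition, not ANPL's actual interface. The user fixes the data flow in a top-level function, while each hole is a stub whose natural language description (here, a docstring) is handed to the LLM to implement.

```python
# Hypothetical illustration of an ANPL-style program (not the actual ANPL syntax).
# The sketch fixes the data flow; each hole is a stub carrying only a natural
# language description, which the compiler would ask an LLM to implement in Python.

def find_largest_rectangle(grid):
    """HOLE: return the bounding box (top, left, height, width) of the
    largest solid-colored rectangle in the grid."""
    raise NotImplementedError  # to be filled in by the LLM


def recolor_region(grid, box, color):
    """HOLE: return a copy of the grid with every cell inside `box`
    repainted with `color`."""
    raise NotImplementedError  # to be filled in by the LLM


def solve(grid):
    # SKETCH: the user-specified data flow composing the holes.
    box = find_largest_rectangle(grid)
    return recolor_region(grid, box, color=3)
```

Under this reading, editing and debugging stay local: the user can revise a hole's description, or split a hole into smaller ones, without disturbing the data flow fixed by the sketch.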