Existing approaches to automatic data transformation are insufficient to meet
the requirements in many real-world scenarios, such as the building sector.
First, they offer no convenient interface through which domain experts can
supply domain knowledge. Second, they incur significant overhead for collecting
training data. Third, their accuracy degrades under complicated schema changes. To
bridge this gap, we present a novel approach that leverages the unique
capabilities of large language models (LLMs) in coding, complex reasoning, and
zero-shot learning to generate SQL code that transforms the source datasets
into the target datasets. We demonstrate the viability of this approach by
designing an LLM-based framework, termed SQLMorpher, which comprises a prompt
generator that integrates the initial prompt with optional domain knowledge and
historical patterns in external databases. It also implements an iterative
prompt optimization mechanism that automatically improves the prompt based on
flaw detection. The key contributions of this work include (1) pioneering an
end-to-end LLM-based solution for data transformation, (2) developing a
benchmark dataset of 105 real-world building energy data transformation
problems, and (3) conducting an extensive empirical evaluation where our
approach achieved 96% accuracy across all 105 problems. SQLMorpher demonstrates the
effectiveness of LLMs on complex, domain-specific challenges,
highlighting their potential to drive sustainable solutions.

Comment: 10 pages, 7 figures
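
The prompt-generation and iterative-refinement loop described above can be illustrated with a minimal sketch. It is not SQLMorpher's implementation: the names `build_prompt` and `transform`, the prompt wording, and the `llm` callable are all hypothetical, SQLite stands in for whatever database the framework targets, and flaw detection is reduced here to catching SQL execution errors. Retrieval of historical patterns from external databases is omitted.

```python
import sqlite3
from typing import Callable, Optional


def build_prompt(source_schema: str, target_schema: str,
                 domain_knowledge: Optional[str] = None,
                 error_feedback: Optional[str] = None) -> str:
    """Assemble a prompt from the schemas plus optional hints (hypothetical format)."""
    parts = [
        "Write one SQL statement that transforms the source table into the target table.",
        f"Source schema: {source_schema}",
        f"Target schema: {target_schema}",
    ]
    if domain_knowledge:
        parts.append(f"Domain knowledge: {domain_knowledge}")
    if error_feedback:
        # Assumed flaw signal: the error raised by the previous attempt.
        parts.append(f"The previous attempt failed with: {error_feedback}")
    return "\n".join(parts)


def transform(conn: sqlite3.Connection,
              llm: Callable[[str], str],
              source_schema: str,
              target_schema: str,
              domain_knowledge: Optional[str] = None,
              max_rounds: int = 3) -> Optional[str]:
    """Ask the model for SQL, run it, and retry with error feedback on failure.

    Returns the SQL that executed successfully, or None if every round failed.
    """
    feedback = None
    for _ in range(max_rounds):
        prompt = build_prompt(source_schema, target_schema,
                              domain_knowledge, feedback)
        sql = llm(prompt)
        try:
            # The connection context manager commits on success
            # and rolls back if the statement raises.
            with conn:
                conn.execute(sql)
            return sql
        except sqlite3.Error as exc:
            feedback = str(exc)
    return None
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM API; a real system would likely also compare the transformed output against the expected target data rather than rely on execution errors alone.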