Data Layout Recommendation for Big Data Systems via Large Language Models

So, Justin Chun Hei

thesis

oai:yorkspace.library.yorku.ca:10315/43319

Authors: Justin Chun Hei So
Publication date: 11 November 2025
Abstract

The physical layout of data is critical to the performance of analytical queries, especially in column-store systems like IBM Db2. Among layout strategies, Z-ordering is a popular technique that maps multi-dimensional data to a one-dimensional space while preserving locality. However, tuning Z-order is challenging: users must manually select the columns to include, and most systems assign equal weight to each column, ignoring the varying impact of different columns on query performance. We present LayZ, an LLM-directed advisor for automated data layout tuning in IBM Db2. LayZ analyzes SQL workloads to extract query execution plan features and creates compact prompts that preserve layout-relevant information, thereby reducing inference cost when using large language models. LayZ generates ranked layout configurations, including weighted Z-orderings that adapt bit allocations based on workload characteristics. These configurations are evaluated using a cost model to identify the best candidate layout for the target workload. Our system supports both base tables and materialized views, enabling performance recovery in queries that regress under global physical design. Experimental results on the DSB workload show that LayZ outperforms heuristic and existing layout strategies, improving query performance by up to 90%.
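
The weighted Z-ordering idea in the abstract can be illustrated with a minimal Python sketch. It is not LayZ's or Db2's implementation: the function name `weighted_z_value`, the per-column bit budget `bits`, and the rule that a column with a larger budget contributes its extra bits at the most significant end of the key are all assumptions made for illustration. With equal budgets the function reduces to ordinary bit interleaving (a Morton code).

```python
def weighted_z_value(values, bits):
    """Interleave column values into one sort key (hypothetical scheme).

    values: non-negative integers, one per column (already bucketed/encoded).
    bits:   number of bits allotted to each column; a larger budget is
            assumed here to mean a higher weight for that column.
    """
    if len(values) != len(bits):
        raise ValueError("values and bits must have the same length")
    for v, b in zip(values, bits):
        if v < 0 or v >= (1 << b):
            raise ValueError(f"value {v} does not fit in {b} bits")

    key = 0
    # Walk bit levels from most to least significant. A column takes part
    # only at levels below its own budget, so columns with more bits start
    # contributing earlier and dominate the resulting order.
    for level in range(max(bits, default=0) - 1, -1, -1):
        for v, b in zip(values, bits):
            if level < b:
                key = (key << 1) | ((v >> level) & 1)
    return key


# Equal weights: standard interleaving of two 2-bit columns.
equal = weighted_z_value([3, 0], [2, 2])      # bits 1,0,1,0 -> 10
# Unequal weights: the first column gets 3 bits, the second only 1.
weighted = weighted_z_value([5, 1], [3, 1])   # bits 1,0,1,1 -> 11
```

Sorting rows by such a key keeps rows that are close in the heavily weighted column close together on disk, while lower-weighted columns only refine the order within those groups. How LayZ actually derives bit allocations from the workload is described in the thesis, not here.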

Last time updated on 30/12/2025

This paper was published in YorkSpace.
