Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation
Large Language Models (LLMs) have been shown to be effective for code
generation. Due to the complexity and opacity of LLMs, little is known about
how these models generate code. To deepen our understanding, we investigate
whether LLMs attend to the same parts of a natural language description as
human programmers during code generation. An analysis of five LLMs on a popular
benchmark, HumanEval, revealed a consistent misalignment between LLMs' and
programmers' attention. Furthermore, we found that there is no correlation
between the code generation accuracy of LLMs and their alignment with human
programmers. Through a quantitative experiment and a user study, we confirmed
that, among twelve different attention computation methods, attention computed
by the perturbation-based method is most aligned with human attention and is
consistently favored by human programmers. Our findings highlight the need for
human-aligned LLMs for better interpretability and programmer trust.

Comment: 13 pages, 8 figures, 7 tables
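The abstract does not spell out how perturbation-based attention is computed. As a rough illustration of the general occlusion idea, the sketch below masks each word of the prompt in turn and measures how much a model score drops; the `score_fn` callable and the `[MASK]` token are assumptions for illustration, not the paper's actual implementation.

```python
from typing import Callable, List, Tuple


def perturbation_attention(
    prompt: str,
    score_fn: Callable[[str], float],
    mask_token: str = "[MASK]",
) -> List[Tuple[str, float]]:
    """Occlusion-style importance: mask each prompt word and record the score drop.

    `score_fn` is a hypothetical callable, e.g. the model's log-likelihood of its
    generated code given the (possibly perturbed) prompt.
    """
    words = prompt.split()
    baseline = score_fn(prompt)  # score with the unmodified prompt
    importances = []
    for i, word in enumerate(words):
        perturbed = words[:i] + [mask_token] + words[i + 1:]
        drop = baseline - score_fn(" ".join(perturbed))
        importances.append((word, drop))  # larger drop => word mattered more
    return importances
```

In this reading, the words whose removal most degrades the model's score are taken as the parts of the natural language description the model "attends" to, which can then be compared against the words human programmers mark as important.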