Context: AI-assisted code generation tools have become increasingly prevalent
in software engineering, offering the ability to generate code from natural
language prompts or partial code inputs. Notable examples of these tools
include GitHub Copilot, Amazon CodeWhisperer, and OpenAI's ChatGPT.
Objective: This study aims to compare the performance of these prominent code
generation tools in terms of code quality metrics, such as Code Validity, Code
Correctness, Code Security, Code Reliability, and Code Maintainability, to
identify their strengths and shortcomings.
Method: We assess the code generation capabilities of GitHub Copilot, Amazon
CodeWhisperer, and ChatGPT on the HumanEval benchmark dataset. The generated
code is then evaluated against the proposed code quality metrics.
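To make the evaluation step concrete, the sketch below shows one way to check functional correctness of a generated completion against HumanEval-style unit tests. It is a minimal illustration, not the authors' exact pipeline: the field names follow the public HumanEval JSONL schema, and `generate_code` / `completions` are hypothetical placeholders for output from any of the three tools.

```python
# Minimal sketch: functional-correctness check for a HumanEval-style problem.
# A problem record has "prompt", "test", and "entry_point" fields; the tool
# supplies the function body ("completion"). Real harnesses run this in an
# isolated sandbox with a timeout rather than calling exec() directly.
import json


def passes_tests(problem: dict, completion: str) -> bool:
    """Return True if prompt + completion satisfies the problem's unit tests."""
    program = (
        problem["prompt"]            # function signature and docstring
        + completion                 # tool-generated body
        + "\n" + problem["test"]     # unit tests defining check(candidate)
        + f"\ncheck({problem['entry_point']})\n"
    )
    try:
        exec(program, {"__name__": "__main__"})  # raises AssertionError on failure
        return True
    except Exception:
        return False


# Hypothetical usage over a HumanEval JSONL file and pre-generated completions:
# problems = [json.loads(line) for line in open("HumanEval.jsonl")]
# completions = {...}  # task_id -> code produced by the tool under test
# correct = sum(passes_tests(p, completions[p["task_id"]]) for p in problems)
# print(f"Code correctness: {correct / len(problems):.1%}")
```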
Results: Our analysis reveals that the latest versions of ChatGPT, GitHub
Copilot, and Amazon CodeWhisperer generate correct code 65.2%, 46.3%, and 31.1%
of the time, respectively. Compared with their earlier versions, the newer
releases of GitHub Copilot and Amazon CodeWhisperer showed improvements of 18%
and 7%, respectively. The average technical debt attributable to code smells
was 8.9 minutes for ChatGPT, 9.1 minutes for GitHub Copilot, and 5.6 minutes
for Amazon CodeWhisperer.
Conclusions: This study highlights the strengths and weaknesses of some of
the most popular code generation tools, providing valuable insights for
practitioners. By comparing these generators, our results can help them select
the most suitable tool for a given task and strengthen their decision-making
process.