This study explores the capabilities of OpenAI's ChatGPT in solving different
types of physics problems. ChatGPT (with GPT-4) was queried to solve a total of
40 problems from a college-level engineering physics course. These problems
ranged from well-specified problems, where all data required for solving the
problem were provided, to under-specified, real-world problems where not all
necessary data were given. Our findings show that ChatGPT successfully solved
62.5% of the well-specified problems, but its accuracy dropped to 8.3% for
under-specified problems. Analysis of the model's incorrect solutions revealed
three distinct failure modes: 1) failure to construct accurate models of the
physical world, 2) failure to make reasonable assumptions about missing data,
and 3) calculation errors. The study offers implications for how
LLM-augmented instructional materials can be used to enhance STEM education. The insights
also contribute to the broader discourse on AI's strengths and limitations,
serving both educators aiming to leverage the technology and researchers
investigating human-AI collaboration frameworks for problem-solving and
decision-making.

Comment: 12 pages, 2 figures