In recent years, Large Language Models (LLMs) have demonstrated remarkable
potential across various downstream tasks. LLM-integrated frameworks, which
serve as the essential infrastructure, have given rise to many LLM-integrated
web apps. However, some of these frameworks suffer from Remote Code Execution
(RCE) vulnerabilities, allowing attackers to execute arbitrary code on apps'
servers remotely via prompt injections. Despite the severity of these
vulnerabilities, no prior work has systematically investigated them. This leaves
open the challenge of detecting such vulnerabilities both in frameworks and in
real-world LLM-integrated apps.
To fill this gap, we present two novel strategies: 1) a static
analysis-based tool called LLMSmith that scans framework source code to
detect potential RCE vulnerabilities, and 2) a prompt-based automated testing
approach that verifies these vulnerabilities in LLM-integrated web apps. We
discovered 13 vulnerabilities in 6 frameworks, including 12 RCE vulnerabilities
and 1 arbitrary file read/write vulnerability. Eleven of them were confirmed by the
framework developers, resulting in the assignment of 7 CVE IDs. After testing
51 apps, we found vulnerabilities in 17 apps, 16 of which are vulnerable to RCE
and 1 to SQL injection. We responsibly reported all 17 issues to the
corresponding developers and received acknowledgments. Furthermore, we amplify
the attack impact beyond RCE by enabling attackers to exploit other app users
(e.g., app response hijacking and user API key leakage) without any direct
interaction between the attacker and the victim. Finally, we propose mitigation
strategies that raise the security awareness of both framework and app
developers and help them mitigate these risks effectively.