OWASP (Open Worldwide Application Security Project) has created numerous security-related Top10 lists that classify the top risks for various areas of technology. While the most well-known standard is the OWASP Top10 for web applications, there are several other lists that deserve attention. These include the OWASP Top10 for CI/CD (https://owasp.org/www-project-top-10-ci-cd-security-risks/), which focuses on security risks associated with continuous integration and continuous deployment (CI/CD) processes. Additionally, the OWASP Top10 for API (https://owasp.org/www-project-api-security/) highlights the top vulnerabilities that can be found in application programming interfaces (APIs). Lastly, the OWASP Top10 for Mobile Apps (https://owasp.org/www-project-mobile-top-10/) addresses the specific security risks faced by mobile applications.
More recently, in the July of 2023, OWASP released an addition to their collection of Top10 lists. This new document focuses on vulnerabilities related to LLM Applications (Large Language Model Applications). (https://owasp.org/www-project-top-10-for-large-language-model-applications/ ). In this post, I will delve into the details of the Top10 LLM-related vulnerabilities, providing examples, observations, and commentary. So sit back, grab a cup of coffee, and enjoy this read 🙂
I tried to write this post in such a way, that it serves as a supplement to the original document – I tried building upon its content rather than duplicating it, but in some cases the vulnerabilities are so novel and niche, that everything I could have done was just recreating a description of vulnerability – probably we will see them in the wild soon.
LLM01: Prompt Injection
Prompt injection is the most characteristic attack related to Large Language Models. The result of successful prompt injection can be exposing sensitive information, tricking LLM into producing offensive content, using LLM out-of-scope (let’s say you have product-related informational chat and you’ll trick it into producing malware code) etc.
Prompt injection can be classified as one of two types of this attack:
Direct prompt injection has place, if an attacker has a direct access to LLM, and prompts it to produce a specific output
Indirect prompt injection which is a more advanced, but on the other hand less controllable approach, in which prompt injection payloads are delivered through third-party sources, such as websites which can be accessed by LLMs.
This vulnerability is nothing else, but a vector for Cross Site Scripting vulnerability (and similar vulnerabilities) caused by the LLM. Once the user prompts LLM with appropriate prompt, then LLM may break its own website, i.e. by rendering the XSS payload in the website context.
Example of Insecure Output Handling
OWASP does not provide any specific examples of XSS here, but recently I’ve found this kind of vulnerability in Chatsonic by Writesonic: https://writesonic.com/chat
In case of XSS attacks caused by the LLMs, we are working on the intersection of web application security and LLM security – first of all, we should sanitize the output from the model, and we should treat it as we usually treat the user controlled input.
If you automatically deploy code from LLM on your server, you should introduce some measures for verifying the security of the code.
LLM03: Training Data Poisoning
This vulnerability is older than the LLMs itself. It occurs, when AI model is learning on the data polluted with data, that should not be in the dataset: – fake news – incorrectly classified images – hate speech Etc.
One of the most notable examples (not related to LLMs) is poisoning the deep learning visual classification dataset with the road signs:
(source: Ruixiang Tang, Mengnan Du, Ninghao Liu, Fan Yang, Xia Hu. 2020. An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks. In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discoveryand Data Mining (KDD ’20), August 23–27, 2020, https://doi.org/10.1145/3394486.3403064)
Examples of training data poisoning
Interesting example slightly this vulnerability may be this paper:
The curse of recursion: training on generated data makes models forget (https://arxiv.org/pdf/2305.17493v2.pdf by Shumailov et al.) in which authors demonstrate, that in the future learning LLM models on the data generated by another LLM models may lead to an effect called model collapse.
I am aware this risk is more “philosophical” at the current stage of LLM development, but it should be kept in mind, because in the era of LLM generated blog posts and articles, it may stop the development of LLMs.
In this case, data provided by the users made chatbot sexist, racist and anti-semitic.
How to mitigate training data poisoning?
OWASP Top10 for LLMs mentions techniques such as verifying supply chain integrity, verifying legitimacy of data sources, ensuring sufficient sandboxing to prevent models from scraping malicious data sources or using Reinforcement Learning techniques.
Data should be sanitized and access to the training data should be properly limited.
This vulnerability is well-known also from other OWASP Top10 lists. Attackers may cause unavailability of the model through running multiple queries that are complicated and require a lot of computational power.
The same measures, as used in APIs and web applications are used i.e. rate limiting of queries that the user can perform. Another approach that comes to my mind when thinking about that vulnerability is just detecting adversarial prompts that may lead to DoS, similar to the Prompt Injection case.
LLM05: Supply Chain
This vulnerability is a huge topic, supply chain related vulnerabilities are emerging both in AI and “regular” software development. In this case, set of vulnerabilities is extended by threats such as:
Applying transfer learning
Re-use of models
Re-use of data
Examples of supply chain issues in AI / LLM development
New vulnerabilities related to plugins may occur once other LLM companies introduce plugins in their solutions.
LLM07: Data leakage
Data leakage refers to all of the situations, in which LLM reveals sensitive information, proprietary algorithms, secrets, architecture details etc. It can also be applied to the situation in which a company that delivers LLM uses the data supplied by the users and lets the model learn on this data.
That was even done with GPT-2, so large language models are also vulnerable to this kind of attacks: extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. (source: https://arxiv.org/abs/2012.07805)
These are the vulnerabilities that may occur, when LLM gets direct access to another system. That may lead to some undesirable actions and results. Due to the current “hype” and lack of wide adoption of systems, in which LLMs directly interact with other systems, it’s hard to find examples. I will skip this vulnerability for now – maybe an update will come soon 🙂
Code generated by LLMs such as ChatGPT may also contain vulnerabilities – it’s up to developers to verify, if the code contains vulnerabilities. LLM output should never be trusted as vulnerability-free.
Examples of overreliance Overreliance takes place when i.e. model is hallucinating and the one is following it’s instructions blindly. Below are two examples of model hallucination.
You should always install plugins from a trusted source and avoid assigning wide permissions to the plugins that you use, so you can make an attack surface smaller.
It is important to remember that Large Language Models are still in their early stages of development and there are still many vulnerabilities that need to be addressed. Security specialists should stay up to date with the latest research and best practices to ensure security of Large Language Models. Follow this blog and my Twitter (https://twitter.com/m1k0ww) if you want to get more information about LLMs security.