LLM Security

OWASP Top 10 for Large Language Model Applications

LLM01: Prompt Injection

Prompt injection is the most characteristic attack related to Large Language Models. The result of successful prompt injection can be exposing sensitive information, tricking LLM into producing offensive content, using LLM out-of-scope (let’s say you have product-related informational chat and you’ll trick it into producing malware code) etc. 

Prompt injection can be classified as one of two types of this attack: 

  • Direct prompt injection has place, if an attacker has a direct access to LLM, and prompts it to produce a specific output 
  • Indirect prompt injection which is a more advanced, but on the other hand less controllable approach, in which prompt injection payloads are delivered through third-party sources, such as websites which can be accessed by LLMs.

Examples of prompt injection 

Direct prompt injection 

The simplest approach for prompt injection is for example: 

Reverse prompt engineering

How can we mitigate prompt injections? 

Some of the mitigations that are not included in the original document: 

– You can also use a simpler approach, such as separating user-controlled parts of input with special characters.

LLM02: Insecure Output Handling

If you automatically deploy code from LLM on your server, you should introduce some measures for verifying the security of the code.

LLM03: Training Data Poisoning

This vulnerability is older than the LLMs itself. It occurs, when AI model is learning on the data polluted with data, that should not be in the dataset:
– fake news
– incorrectly classified images
– hate speech

One of the most notable examples (not related to LLMs) is poisoning the deep learning visual classification dataset with the road signs:

Examples of training data poisoning

Interesting example slightly this vulnerability may be this paper: 

How to mitigate training data poisoning? 

LLM04: Denial of Service 

How to prevent LLM DoS 

The same measures, as used in APIs and web applications are used i.e. rate limiting of queries that the user can perform.
Another approach that comes to my mind when thinking about that vulnerability is just detecting adversarial prompts that may lead to DoS, similar to the Prompt Injection case.

LLM05: Supply Chain 

This vulnerability is a huge topic, supply chain related vulnerabilities are emerging both in AI and “regular” software development. In this case, set of vulnerabilities is extended by threats such as: 

  • Applying transfer learning
  • Re-use of models 
  • Re-use of data

Examples of supply chain issues in AI / LLM development

How to prevent supply chain attacks

LLM06: Permission Issues

When I saw the title of the vulnerability, my first thought was it refers to the classic Authn/Authz problem – meanwhile it turns out that it’s directly related to usage of Plugins in ChatGPT. 

Examples of data leakage

How to prevent data leakage in LLM? 

LLM08: Excessive Agency 

These are the vulnerabilities that may occur, when LLM gets direct access to another system. That may lead to some undesirable actions and results. Due to the current “hype” and lack of wide adoption of systems, in which LLMs directly interact with other systems, it’s hard to find examples. I will skip this vulnerability for now – maybe an update will come soon 🙂 

LLM09: Overreliance 

Code generated by LLMs such as ChatGPT may also contain vulnerabilities – it’s up to developers to verify, if the code contains vulnerabilities. LLM output should never be trusted as vulnerability-free. 

Examples of overreliance

Overreliance takes place when i.e. model is hallucinating and the one is following it’s instructions blindly. Below are two examples of model hallucination. 

LLM10: Insecure Plugins 

All of the occurrences of misconfigured/poorly developed LLM plugins, that lead to undesired behaviors.

Examples of insecure plugins 

How to deal with insecure plugins? 

You should always install plugins from a trusted source and avoid assigning wide permissions to the plugins that you use, so you can make an attack surface smaller.