Categories
LLM Security MLOps

Real Threats of Artificial Intelligence – AI Security Newsletter #9

Hello everyone!
It’s been a while, and although I’ve been keeping up with what’s happening in the AI world, I haven’t really had time to post new releases. I’ve also decided to change a form, and for some time I’ll be doing just the links instead of links + summaries. Let me know how you like the new form. I think it’s more useful, because in most cases you get the summary of the article from the beginning. Since this is a “resurrection” of this newsletter, I’ve tried to include some of the most important news from the last 5 months in AI security here. Also, I’ve started using the tool that detects if the LLM was used to create the content – this way I’m trying to filter out low quality content created with LLMs (I mean, if the content is created with ChatGPT, you could create it yourself, right?).


If you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance! What is more, if you are a blogger, researcher or founder in the area of AI Security/AI Safety/MLSecOps etc. feel free to send me your work and I will include it in this newsletter 🙂 

LLM Security 

AI Security 

AI Safety

Categories
LLM Security

Indirect prompt injection with YouTube video

In this short blog post I will show how I have found a way to “attack” Large Language Model with the YouTube video – this attack is called “indirect prompt injection”.

Recently I’ve found LeMUR by AssemblyAI – someone posted it on Twitter and I’ve decided that it may be an interesting target to test for Prompt Injections. 

When talking about prompt injections, we distinguish two types – first type is direct prompt injection, in which PI payload is placed in the application by the attacker and the second type is indirect prompt injection, in which the PI payload is carried using third party medium – image, content of the website that is scrapped by the model or audio file. 

First of all, I’ve started with generic Prompt Injection that is known from “traditional” LLMs – I just told the model to ignore all of the previous instructions and follow my instruction: 

After it turned out that the model follows my instructions, I’ve decided that it would be interesting to check if it will follow instructions directly from the video. I’ve recorded a test video with Prompt Injection payloads: 

Unfortunately, I still have had to send instructions explicitly in the form that I’ve controlled: 

When I’ve numbered the paragraphs, it turned out that I am able to control the processing of the transcript from the video/transcript level (in this case, the paragraph 4 redirected to paragraph 2 with the prompt injection payload in it, what caused the model to reply simply with “Lol”): 

That was the vid: 

I tricked the Summary feature to say what I wanted with the same vid: 

Instead of summarizing the text, the model just says “Lol”. This specific bug may be used by individuals that don’t want their content to be processed by the automated LLM-based solutions – I don’t judge if it’s a bug, or a feature, neither do I say that LeMUR is insecure (because it’s rather secure) – I just wanted to showcase this interesting case of indirect prompt injection.

If you want to know more about LLM and AI security, subscribe my newsletter: https://hackstery.com/newsletter/