Welcome to the 8th release of The Real Threats of Artificial Intelligence.

It’s been more than a month since the last edition of this newsletter. I’ve had some things going on – including talks at OWASP Oslo Chapter and at Nordic AI Summit (you can find the slides here: https://hackstery.com/talks-and-slides/), so I haven’t really had spare time to spend on digging for resources for the newsletter. But I am back on track and hopefully upcoming releases will show up more regularly. Here are some articles on AI security that I’ve found in my “information bubble”. Also, in the beginning of January I’ll publish some more interesting things on MLOps leaking secrets!

If you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance! What is more, if you are a blogger, researcher or founder in the area of AI Security/AI Safety/MLSecOps etc. feel free to send me your work and I will include it in this newsletter 🙂

LLM Security

Johann Rehberger’s talk on Prompt Injections at Ekoparty ‘23

Link: https://embracethered.com/blog/posts/2023/ekoparty-prompt-injection-talk/

Hacking Google Bard – From Prompt Injection to Data Exfiltration

Indirect prompt injections in Google Bard via Google Docs or Gmail.

Link: https://embracethered.com/blog/posts/2023/google-bard-data-exfiltration/

Prompt Injection Benchmark by Layier.AI

Layier.AI benchmarked Prompt Injection detection tools – incl. LLMGuard, Lakera Guard or RebuffAI.

Link: https://huggingface.co/spaces/laiyer/prompt-injection-benchmark + article: https://laiyer.substack.com/p/how-do-prompt-injection-scanners

Fine-tuned version of DebertaV3 model by LaiyerAI

This model aims to identify Prompt Injections and it got more than 600 thousand downloads at this point.

Link: https://huggingface.co/laiyer/deberta-v3-base-prompt-injection

DeepInception: Hypnotize Large Language Model to Be Jailbreaker

Jailbreaking large language models through nested prompts.

Link: https://arxiv.org/pdf/2311.03191.pdf

Meta’s new tools for LLM security

Meta released new tools (Llama Guard and Purple Llama) for safeguarding input and output in communication with Large Language Models and proposed a benchmark for evaluating the cybersecurity risks in the models.

Links:

https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai/

https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/

https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/

Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Data poisoning attacks during the fine tuning of the models.

Link: https://arxiv.org/pdf/2312.04748.pdf

AI Security

ProtectAI AI Exploits

A collection of real world AI/ML exploits for responsibly disclosed vulnerabilities by ProtectAI

Link: https://github.com/protectai/ai-exploits

Huntr tutorial for ML bug hunters

If you’ve ever wondered how you can start looking for the vulnerabilities in MLOps/ML tools, Huntr (bug bounty program for ML) has you covered.

Link: https://huntr.com/get-started/intro/

Assessing the security posture of a widely used vision model: YOLOv7

Trailofbits reports a bunch of vulnerabilities in YOLOv7, a computer vision framework. Following vulnerabilities were found: remote code execution (RCE), denial of service, and model differentials (where an attacker can trigger a model to perform differently in different contexts).

Link: https://blog.trailofbits.com/2023/11/15/assessing-the-security-posture-of-a-widely-used-vision-model-yolov7/

Google’s framework for ML supply chain security

In this framework, Google introduced code for model signing and Supply Chain Levels for Software Artifacts (SLSA)

Link: https://github.com/google/model-transparency

AI/LLM as a tool for cybersecurity

Cisco AI Assistant for Cybersecurity

Cisco released its new gen AI focused on supporting cybersecurity operations.

Link: https://www.uctoday.com/unified-communications/cpaas/introducing-the-cisco-ai-assistant-for-security/

Will cybersecurity engineers be replaced by AI?

Guess.

Link: https://blog.edned.net/will-ai-replace-cyber-security/

AI safety

Meta broken up its Responsible AI team

This link has been in my notes since November… Meta broke up its Responsible AI team. But, as you’ve seen in the “LLM Security” section, they are still working on Responsible AI.

Link: https://www.spiceworks.com/tech/artificial-intelligence/news/metas-dissolution-responsible-ai/

Jobs

AI/ML Penetration Tester at NetSPI (US)
Senior ML Engineer at Snowflake (Poland)
Senior Security Engineer (AI/ML) at Apple (US)
Offensive Security Engineer at AI Red Team at Microsoft (US)
Principal ML Security Engineer at ProtectAI (US)
Principal AI/ML Security Specialist at Sage (UK)

Other AI-related things

https://arxiv.org/pdf/2307.11760.pdf – tell the model you are stressed or under pressure to improve performance

https://www.bloomberg.com/opinion/articles/2023-11-20/who-controls-openai – who controls OpenAI

If you want more papers and articles

Is the Reversal Curse Real?