Welcome to the 8th release of The Real Threats of Artificial Intelligence.
It’s been more than a month since the last edition of this newsletter. I’ve had some things going on – including talks at OWASP Oslo Chapter and at Nordic AI Summit (you can find the slides here: https://hackstery.com/talks-and-slides/), so I haven’t really had spare time to spend on digging for resources for the newsletter. But I am back on track and hopefully upcoming releases will show up more regularly. Here are some articles on AI security that I’ve found in my “information bubble”. Also, in the beginning of January I’ll publish some more interesting things on MLOps leaking secrets!
If you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance! What is more, if you are a blogger, researcher or founder in the area of AI Security/AI Safety/MLSecOps etc. feel free to send me your work and I will include it in this newsletter 🙂
LLM Security
Johann Rehberger’s talk on Prompt Injections at Ekoparty ‘23
Link: https://embracethered.com/blog/posts/2023/ekoparty-prompt-injection-talk/
Hacking Google Bard – From Prompt Injection to Data Exfiltration
Indirect prompt injections in Google Bard via Google Docs or Gmail.
Link: https://embracethered.com/blog/posts/2023/google-bard-data-exfiltration/
Prompt Injection Benchmark by Layier.AI
Layier.AI benchmarked Prompt Injection detection tools – incl. LLMGuard, Lakera Guard or RebuffAI.
Link: https://huggingface.co/spaces/laiyer/prompt-injection-benchmark + article: https://laiyer.substack.com/p/how-do-prompt-injection-scanners
Fine-tuned version of DebertaV3 model by LaiyerAI
This model aims to identify Prompt Injections and it got more than 600 thousand downloads at this point.
Link: https://huggingface.co/laiyer/deberta-v3-base-prompt-injection
DeepInception: Hypnotize Large Language Model to Be Jailbreaker
Jailbreaking large language models through nested prompts.
Link: https://arxiv.org/pdf/2311.03191.pdf
Meta’s new tools for LLM security
Meta released new tools (Llama Guard and Purple Llama) for safeguarding input and output in communication with Large Language Models and proposed a benchmark for evaluating the cybersecurity risks in the models.
Links:
https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai/
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks
Data poisoning attacks during the fine tuning of the models.
Link: https://arxiv.org/pdf/2312.04748.pdf
AI Security
ProtectAI AI Exploits
A collection of real world AI/ML exploits for responsibly disclosed vulnerabilities by ProtectAI
Link: https://github.com/protectai/ai-exploits
Huntr tutorial for ML bug hunters
If you’ve ever wondered how you can start looking for the vulnerabilities in MLOps/ML tools, Huntr (bug bounty program for ML) has you covered.
Link: https://huntr.com/get-started/intro/
Assessing the security posture of a widely used vision model: YOLOv7
Trailofbits reports a bunch of vulnerabilities in YOLOv7, a computer vision framework. Following vulnerabilities were found: remote code execution (RCE), denial of service, and model differentials (where an attacker can trigger a model to perform differently in different contexts).
Google’s framework for ML supply chain security
In this framework, Google introduced code for model signing and Supply Chain Levels for Software Artifacts (SLSA)
Link: https://github.com/google/model-transparency
AI/LLM as a tool for cybersecurity
Cisco AI Assistant for Cybersecurity
Cisco released its new gen AI focused on supporting cybersecurity operations.
Will cybersecurity engineers be replaced by AI?
Guess.
Link: https://blog.edned.net/will-ai-replace-cyber-security/
AI safety
Meta broken up its Responsible AI team
This link has been in my notes since November… Meta broke up its Responsible AI team. But, as you’ve seen in the “LLM Security” section, they are still working on Responsible AI.
Link: https://www.spiceworks.com/tech/artificial-intelligence/news/metas-dissolution-responsible-ai/
Jobs
- AI/ML Penetration Tester at NetSPI (US)
- Senior ML Engineer at Snowflake (Poland)
- Senior Security Engineer (AI/ML) at Apple (US)
- Offensive Security Engineer at AI Red Team at Microsoft (US)
- Principal ML Security Engineer at ProtectAI (US)
- Principal AI/ML Security Specialist at Sage (UK)
Other AI-related things
- https://arxiv.org/pdf/2307.11760.pdf – tell the model you are stressed or under pressure to improve performance
- https://www.bloomberg.com/opinion/articles/2023-11-20/who-controls-openai – who controls OpenAI