Here comes another edition of my newsletter. I’ve collected some interesting resources on AI and LLM security – most of them published in the last two weeks of September.
If you are not a subscriber yet, feel invited to subscribe here.
Also, if you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance!
Autumn-themed thumbnail generated with Bing Image Creator 🙂
OpenAI launches Red Teaming Network
OpenAI announced an open call for OpenAI Red Teaming Network. In this interdisciplinary initiative, they want to improve the security of their models. Not only do they invite red teaming experts with backgrounds in cybersecurity, but also experts from other domains, with a variety of cultural backgrounds and languages.
I am building a payloads’ set for LLM security testing
Shameless auto-promotion, but I’ve started working on PALLMS (Payloads for Attacking Large Language Models) project, within which I want to build huge base of payload, which can be utilized while attacking LLMs. There’s no such an initiative publicly available on the Internet, so that’s a pretty fresh project. Contributors welcome!
LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins
In this paper (by Iqbal, et. al.) authors review the security of ChatGPT plugins. That’s a great supplement for OWASP Top10 for LLM LLM:07 – Insecure Plugin Design vulnerability. Not only have authors analyzed the attack surface, but also they demonstrated potential risks on real-life examples. In this paper, you will find an analysis of threats such as: hijacking user machine, plugin squatting, history sniffing, LLM session hijacking, plugin response hallucination, functionality squatting, topic squatting and many more. The topic is interesting and I recommend this paper!
Wunderwuzzi – Advanced Data Exfiltration Techniques with ChatGPT
In this blog post, awesome @wunderwuzzi presents a variety of techniques for ChatGPT chat history data exfiltration by combining techniques such as indirect prompt injection and using plugins in a malicious way.
Security Weaknesses of Copilot Generated Code in GitHub
In this paper, Fu, et. al. analyze security of the code generated using GH copilot. I will just paste a few sentences from the article’s summary:
“Our results show: (1) 35.8% of the 435 Copilot generated code snippets contain security weaknesses, spreading across six programming languages. (2) The detected security weaknesses are diverse in nature and are associated with 42 different CWEs. The CWEs that occurred most frequently are CWE-78: OS Command Injection, CWE-330: Use of Insufficiently Random Values, and CWE-703: Improper Check or Handling of Exceptional Conditions (3) Among these CWEs, 11 appear in the MITRE CWE Top-25 list(…)”
Review your code – either from Copilot or from ChatGPT!
Jailbreaker in Jail: Moving Target Defense for Large Language Models
In this paper, authors demonstrate how Moving Target Defense (MTD) technique enabled them to protect LLMS against adversarial prompts.
Can LLMs be instructed to protect personal information?
In this paper, the authors announced PrivQA – “a multimodal benchmark to assess this privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario.”
Bing Chat responses infiltrated by ads pushing malware
As Bing Chat is scraping the web, malicious ads have been detected to be actively injected into its responses. Kind of reminds me of an issue I’ve found in Chatsonic in May ’23.
Image-based prompt injection in Bing Chat AI
NSA is creating a hub for AI Security
The American National Security Agency has just launched a hub for AI security – The AI Security Center. One of the goals is to create the risk frameworks for AI security. Paul Nakasone, the director of the NSA, proposes an elegant definition of AI security:
“Protecting systems from learning, doing and revealing the wrong thing”.
Study on the robustness of AI-Image detection
In this paper, researchers have proven that the detectors of AI-generated images have multiple vulnerabilities and there isn’t a good way for proving if the image is real or generated by the AI. “Our attacks are able to break every existing watermark that we have encountered” – said the researchers.
ShellTorch (critical vulnerability!)
A critical vulnerability has been found in TorchServe – PyTorch model server. This vulnerability allows access to proprietary AI models, insertion of malicious models, and leakage of sensitive data – and can be used to alter the model’s results or to execute a full server takeover.
Here’s a visual explanation of this vulnerability from BleepingComputer:
AI/LLM as a tool for cybersecurity
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions
In this paper, the conclusion is that LLMs are not the best tool to provide S&P advice, but for some reason, the researchers (Chen, Arunasalam, Celik) haven’t tried to either fine-tune the model using fine-tuning APIs, or to use embeddings – thus, I believe the question remains kind of open. In my opinion, if you fine-tune the model on your knowledge base or if you create some kind of embedding of your data, then the quality of S&P advice should go up.
Map of AI regulations all over the world
Fairly AI team have done this super cool work and published a map of AI regulations all over the world. Useful for anyone working with a legal side of AI!
The map legend:
- Green: Regulation that’s passed and now active.
- Blue: Passed, but not live yet.
- Yellow: Currently proposed regulations.
- Red: Regions just starting to talk about it, laying down some early thoughts.
Some thoughts on why AI shouldn’t be regulated, but rather decentralized
Canada aims to be the first country in the world with official regulations covering the AI sector
Other AI-related things
Build an end-to-end MLOps pipeline for visual quality inspection at the edge
In this 3-part series, AWS team demonstrates how to build MLOps pipelines:
If you want more papers and articles
- ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP, Yan, et. al., link: https://arxiv.org/pdf/2308.02122.pdf
- When to Trust AI: Advances and Challenges for Certification of Neural Networks, Kwiatkowska, Zhang, link: https://arxiv.org/pdf/2309.11196.pdf
- How well does LLM generate security tests?, Zhang, et. al. link: https://arxiv.org/pdf/2310.00710.pdf
- Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT, Begoulink, et. al., link: https://arxiv.org/pdf/2309.10463.pdf