Here comes the fourth release of my newsletter. This time I have included a lot of content related to the DEFCON AI Village (I have tagged content that comes from there) – a bit late, but better later than never. Anyway, enjoy reading.

Also, if you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance!

Any feedback on this newsletter is welcome – you can mail me or post a comment in this article.

AI Security

Model Confusion – Weaponizing ML models for red teams and bounty hunters [AI Village]

This is an excellent read about ML supply chain security by Adrian Wood. One of the most insightful resources on the ML supply chain that I’ve seen. Totally worth reading!

Link: https://5stars217.github.io/2023-08-08-red-teaming-with-ml-models/

Assessing the Vulnerabilities of the Open-Source Artificial Intelligence (AI) Landscape: A Large-Scale Analysis of the Hugging Face Platform [AI Village]

Researchers have performed automated analysis of 110 000 models from Hugging Face and have found almost 6 million vulnerabilities in the code.

Links: slides: https://aivillage.org/assets/AIVDC31/DSAIL%20DEFCON%20AI%20Village.pdf paper: https://www.researchgate.net/publication/372761501_Assessing_the_Vulnerabilities_of_the_Open-Source_Artificial_Intelligence_AI_Landscape_A_Large-Scale_Analysis_of_the_Hugging_Face_Platform

Podcast on MLSecOps [60 min]

Ian Swanson (CEO of Protect AI) & Emilio Escobar (CISO of Datadog) are talking about ML & AI Security, MLSecOps, Supply Chain Security and LLMs:

Link: https://shomik.substack.com/p/17-ian-swanson-ceo-of-protect-ai

Regulations

LLM Legal Risk Management, and Use Case Development Strategies to Minimize Risk [AI Village]

Well, I am not a lawyer. But I do know a few lawyers who read this newsletter, so maybe you will find these slides on the legal aspects of LLM risk management interesting 🙂

Link: https://aivillage.org/assets/AIVDC31/Defcon%20Presentation_2.pdf

Canadian Guardrails for Generative AI

Canadians have created a document with a set of guardrails for developers and operators of Generative AI systems.

Link: https://ised-isde.canada.ca/site/ised/en/consultation-development-canadian-code-practice-generative-artificial-intelligence-systems/canadian-guardrails-generative-ai-code-practice

LLM Security

LLMSecurity.net – A Database of LLM-security Related Resources

Website by Leon Derczynski (LI: https://www.linkedin.com/in/leon-derczynski/ ) that catalogs various papers, articles and news regarding

Link: https://llmsecurity.net/

LLMs Hacker’s Handbook

This thing was on the Internet for a while, but for some reason I’ve never seen it. LLM Hacker’s Handbook with some useful techniques of Prompt Injection and proposed defenses.

Link: https://doublespeak.chat/#/handbook

AI/LLM as a tool for cybersecurity

ChatGPT for security teams [AI Village]

Some ChatGPT tips & tricks (including jailbreaks) from GTKlondike (https://twitter.com/GTKlondike/)

Link: https://github.com/NetsecExplained/chatgpt-your-red-team-ally

Bonus: https://twitter.com/GTKlondike/status/1697087125840376216

AI in general

Initially, this newsletter was meant to be exclusively related to security, but in the last two weeks I’ve stumbled upon a few decent resources on LLMs and AI and I want to share them with you!

This post by Stephen Wolfram on how does ChatGPT (and LLMs in general) work:
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

Update of GPT 3.5 – fine-tuning is now available through OpenAI API:

https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates

This post by Chip Huyen on how does RLHF work: https://huyenchip.com/2023/05/02/rlhf.html and this one from Huggingface: https://huggingface.co/blog/rlhf

Some loose links

In this section you’ll find some links to recent AI security and LLM security papers that I didn’t manage to read. If you still want to read more on AI topics, try these articles.

“Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack”

Link: https://arxiv.org/pdf/2308.11894.pdf

“RatGPT: Turning online LLMs into Proxies for Malware Attacks”

Link: https://arxiv.org/pdf/2308.09183.pdf

“PENTESTGPT: An LLM-empowered Automatic Penetration Testing Tool”

Link: https://arxiv.org/pdf/2308.06782.pdf

“DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection”

Link: https://arxiv.org/pdf/2308.06932.pdf

“Devising and Detecting Phishing: large language models vs. Smaller Human Models”

Link: https://arxiv.org/pdf/2308.12287.pdf

This is the second release of my newsletter. I’ve collected some papers, articles and vulnerabilities that were released in last two weeks, this time the resources are categorized into following categories: LLM Security, AI Safety, AI Security. If you are not a mail subscriber yet, feel invited to subscribe: https://hackstery.com/newsletter/.

Order of the resources is random.

Any feedback on this newsletter is welcome – you can mail me or post a comment in this article.

LLM Security

Image to prompt injection in Google Bard

“Embrace The Red” blog on hacking Google Bard using crafted images with prompt injection payload.

Link: https://embracethered.com/blog/posts/2023/google-bard-image-to-prompt-injection/

Paper: Challenges and Applications of Large Language Models

Comprehensive article on LLM challenges and applications, with a lot of useful resources on prompting, hallucinations etc.

Link: https://arxiv.org/abs/2307.10169

Remote Code Execution in MathGPT

Post about how Seungyun Baek hacked MathGPT.

Link: https://www.l0z1k.com/hacking-mathgpt/

AVID ML (AI Vulnerability Database) Integration with Garak

Garak is a LLM vulnerability scanner created by Leon Derczynski. According to the description, garak checks if an LLM will fail in a way we don’t necessarily want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. AvidML supports integration with Garak for quickly converting the vulnerabilities garak finds into informative, evidence-based reports.

Link: https://avidml.org/blog/garak-integration/

Limitations of LLM censorship and Mosaic Prompt attack

Although censorship brings negative associations, in terms of LLMs it can be used to prevent LLM from creating malicious content, such as ransomware code. In this paper authors demonstrate attack method called Mosaic Prompt, which is basically splitting malicious prompts into sets of non-malicious prompts.

Link: https://www.cl.cam.ac.uk/~is410/Papers/llm_censorship.pdf

Security, Privacy and Ethical concerns of ChatGPT

Yet another paper on ChatGPT 🙂

Link: https://arxiv.org/pdf/2307.14192.pdf

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Using images and sounds for Indirect Prompt Injections. In this notebook you can take a look at the code used for generating images with injection: https://github.com/ebagdasa/multimodal_injection/blob/main/run_image_injection.ipynb

(I’ll be honest, it looks like magic)

Link: https://arxiv.org/abs/2307.10490

Universal and Transferable Adversarial Attacks on Aligned Language Models

Paper on creating transferable adversarial prompts, able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. This paper was supported by DARPA and the Air Force Research Laboratory.

Link: https://llm-attacks.org/zou2023universal.pdf + repo: https://github.com/llm-attacks/llm-attacks/tree/main/experiments

OWASP Top10 for LLM v1.0

OWASP released version 1.0 of Top10 for LLMs! You can also check my post on that list here.

Link: https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-v1_0.pdf

Survey on extracting training data from pre-trained language models

Survey based on more than 100 key papers in fields such as natural language processing and security, exploring and systemizing attacks and protection methods.

Link: https://aclanthology.org/2023.trustnlp-1.23/

Wired on LLM security

This article features OWASP Top10 for LLM and plugins security concerns.

Link: https://www.wired.com/story/chatgpt-plugins-security-privacy-risk/

AI Safety

Ensuring Safe, Secure, and Trustworthy AI

Amazon, Anthropic, Google, Inflection, Meta, Microsoft and OpenAI have agreed to self-regulate their AI-based solutions. In these voluntary commitments, the companies pledge to ensure safety, security and trust in artificial intelligence.

Link: https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/

Red teaming AI for bio-safety

Anthropic’s post on red teaming AI for biosafety and evaluating models capabilities i.e. for ability to output harmful biological information, such as designing and acquiring biological weapons.

Link: https://www.anthropic.com/index/frontier-threats-red-teaming-for-ai-safety

AI Security

AI Vulnerability Database releases Python library

According to documentation: “It empowers engineers and developers to build pipelines to export outcomes of tests in their ML pipelines as AVID reports, build an in-house vulnerability database, integrate existing sources of vulnerabilities into AVID-style reports, and much more!”

Link: https://twitter.com/AvidMldb/status/1683883556064616448

Mandiant – Securing AI pipeline

The article from Mandiant on securing the AI pipeline. Contains GAIA (Good AI Assessment) Top 10, a list of common attacks and weaknesses in the AI pipeline.

Link: https://www.mandiant.com/resources/blog/securing-ai-pipeline

Google paper on AI red teaming

Citing the summary of this document:

“In this paper, we dive deeper into SAIF to explore one critical capability that we deploy to support the SAIF framework: red teaming. This includes three important areas:
1. What red teaming is and why it is important
2. What types of attacks red teams simulate
3. Lessons we have learned that we can share with others”

Link: https://services.google.com/fh/files/blogs/google_ai_red_team_digital_final.pdf

Probably Approximately Correct (PAC) Privacy

MIT researchers have developed a technique to protect sensitive data encoded within machine learning models. By adding noise or randomness to the model, the researchers aim to make it more difficult for malicious agents to extract the original data. However, this perturbation reduces the model’s accuracy, so the researchers have created a framework called Probably Approximately Correct (PAC) Privacy. This framework automatically determines the minimal amount of noise needed to protect the data, without requiring knowledge of the model’s inner workings or training process.

Link: https://news.mit.edu/2023/new-way-look-data-privacy-0714