Welcome to the 7th release of the Real Threats of Artificial Intelligence Newsletter.

Below you’ll find some interesting links – if you are an offensive security practitioner, take a look at the Kaggle/AI Village DEFCON Capture The Flag competition, where you can test your AI hacking skills (it’s still running for the next 2 weeks). I’d also recommend the talk “AI’s Underbelly: The Zero-Day Goldmine” by Dan McInerney from ProtectAI. This talk inspired me to write this post: https://hackstery.com/2023/10/13/no-one-is-prefect-is-your-mlops-infrastructure-leaking-secrets/

I’ve also started cataloging AI Security / AI Red Teaming job offers – check the “Jobs” section if you are considering stepping into the AI Security industry.

If you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance! What is more, if you are a blogger, researcher or founder in the area of AI Security/AI Safety/MLSecOps etc., feel free to send me your work and I will repost it in this newsletter 🙂

Source: Bing Image Creator

LLM Security

New release of OWASP Top10 for LLM

A new version of the OWASP Top 10 for LLM has been released, with more examples and improved readability. The authors also added a diagram that highlights how the vulnerabilities intersect with the application flow:

Link: website of the project: https://owasp.org/www-project-top-10-for-large-language-model-applications/
Steve Wilson’s post on LinkedIn: https://www.linkedin.com/pulse/new-release-owasp-top-10-llm-apps-steve-wilson

17 chars LLM jailbreak by @AIPanic

This guy is a wizard of prompts. Usually, “Do Anything Now” prompts are long and complicated. @AIPanic proves that just a few characters are enough to trigger the model to return harmful content.

https://twitter.com/AIPanic/status/1711431600230035740

Killer Replika chatbot

In 2021, a man broke into Windsor Castle with a crossbow. Later, he told the police that a Replika chatbot had told him to assassinate the Queen of England. He was recently sentenced.

Link: https://www.theregister.com/2023/10/06/ai_chatbot_kill_queen/

AI-based coding assistants may leak API keys

GitHub Copilot and Amazon CodeWhisperer can be coaxed into emitting hardcoded credentials that these AI models captured during training, though not all that often.

Link: https://www.theregister.com/2023/09/19/github_copilot_amazon_api/
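One practical takeaway is to scan assistant-generated code for hardcoded secrets before committing it. Below is a minimal sketch of that idea, not the method used in the research. The function name `find_hardcoded_secrets` and both patterns are my own assumptions for illustration; real secret scanners use much larger rule sets.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive rule set).
SECRET_PATTERNS = {
    # AWS access key IDs are commonly 20 characters starting with "AKIA".
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Hypothetical heuristic: api_key / secret / token assigned a long quoted literal.
    "generic_api_key": re.compile(
        r"\b(api[_-]?key|secret|token)\b\s*[:=]\s*['\"]([A-Za-z0-9_\-]{16,})['\"]",
        re.IGNORECASE,
    ),
}


def find_hardcoded_secrets(code: str) -> list[tuple[str, int]]:
    """Return (pattern_name, line_number) for each suspicious line in `code`."""
    findings = []
    for line_number, line in enumerate(code.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, line_number))
    return findings


# Example: a made-up code suggestion containing AWS's documented example key.
suggestion = "\n".join([
    "import boto3",
    'aws_key = "AKIAIOSFODNN7EXAMPLE"',
    'api_key = "abcd1234abcd1234abcd"',
    'print("hello")',
])

findings = find_hardcoded_secrets(suggestion)
print(findings)  # [('aws_access_key_id', 2), ('generic_api_key', 3)]
```

A check like this could run as a pre-commit hook, so that a leaked key suggested by a coding assistant is flagged before it reaches the repository.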

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

The authors demonstrate an automated method of generating semantically meaningful jailbreaks.

Link: https://arxiv.org/abs/2310.04451

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Link: https://arxiv.org/abs/2310.06387

GPT-4 is too smart to be safe: stealthy chat with LLMs via cipher

This promising paper (currently under review) presents an approach to jailbreaking LLMs through the use of ciphers, e.g. the Caesar cipher.

Link: https://openreview.net/pdf?id=MbfAK4s61A
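For readers who have not seen a Caesar cipher before, here is a minimal sketch of the encoding itself. It only illustrates the cipher, not the paper's prompting setup; the function name `caesar_shift` and the shift of 3 are assumptions chosen for the example.

```python
import string


def caesar_shift(text: str, shift: int = 3) -> str:
    """Shift ASCII letters by `shift` positions; leave other characters untouched."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


encoded = caesar_shift("Hello, world!", 3)
decoded = caesar_shift(encoded, -3)  # shifting back by the same amount decodes

print(encoded)  # Khoor, zruog!
print(decoded)  # Hello, world!
```

The point for defenders is that a filter looking for plain-text keywords will not match the encoded form, even though the transformation is trivial to reverse.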

Chatbot hallucinations are poisoning the web search (possible paywall)

A short story on how chatbot hallucinations poisoned GPT-powered Bing Chat.

Link: https://www.wired.com/story/fast-forward-chatbot-hallucinations-are-poisoning-web-search/

4chan users manipulate AI tools to unleash torrent of racist images

Link: https://arstechnica.com/tech-policy/2023/10/4chan-pushing-bing-dall-e-as-quick-methods-to-spread-racist-images/

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (by Microsoft Research)

The paper is from July, but it was reposted on the Microsoft website a few days ago. Its taxonomy of LLM-related risks can be a good starting point for threat modeling LLMs:

Links: https://techcrunch.com/2023/10/17/microsoft-affiliated-research-finds-flaws-in-gtp-4/

https://www.microsoft.com/en-us/research/blog/decodingtrust-a-comprehensive-assessment-of-trustworthiness-in-gpt-models/

https://github.com/AI-secure/adversarial-glue

AI Security

AI Security Has Serious Terminology Issues

What is the difference between AI Security, AI Safety, AI Red Teaming and AI Application Security? In this blog post, Joseph Thacker proposes boundaries for each of these terms in order to make them more precise.

Link: https://josephthacker.com/ai/2023/10/16/ai-security-terminology-issues.html

AI Village CTF

Better late than never – this CTF ends on the 9th of November – you can still give it a try and check your AI hacking skills!

Link: https://www.kaggle.com/competitions/ai-village-capture-the-flag-defcon31/

AI’s Underbelly: The Zero-Day Goldmine

An inspiring talk on the security of MLOps/AIOps tools by Dan McInerney:

Link: https://www.youtube.com/watch?v=e3ybnXjtpIc

Six steps for AI security

Post by Nvidia.

Source: Nvidia

Link: https://blogs.nvidia.com/blog/2023/09/25/ai-security-steps/

AI/LLM as a tool for cybersecurity

Compliance.sh

This AI-supported tool makes it easier to get compliant with ISO 27001, SOC 2 Type II, HIPAA, GDPR and more:

Link: https://compliance.sh/

Check for AI

This is a pretty convenient tool for detecting AI-generated text:

Link: https://www.checkfor.ai/

AI safety

To be honest, I usually concentrate more on AI Security and only occasionally follow what’s going on in the world of AI Safety. Those resources look super cool – just check those designs!

Map of AI Existential Safety

This map collects a whole set of resources related to AI Safety:

Link: https://aisafety.world/

Neuronpedia

In this game, you help crowdsource explanations for the neurons inside neural networks:

Link: https://www.neuronpedia.org/

Frontier Model Forum will fund AI safety research

Frontier Model Forum announced that it’ll pledge $10 million toward a new fund to advance research on tools for “testing and evaluating the most capable AI models.”

Link: https://techcrunch.com/2023/10/25/ai-titans-throw-a-tiny-bone-to-ai-safety-researchers

Jobs

Other AI-related things

Killer drones used in Ukraine

If these reports are true, the first war drones that work without human supervision are being deployed on the battlefield in Ukraine against Russian forces:

Link: https://www.unmannedairspace.info/commentary/ukraine-deploying-attack-drones-without-human-oversight/

Advent of Code prohibits the usage of LLMs

Link: https://adventofcode.com/about#ai_leaderboard

If you want more papers and articles

In-Context Unlearning: Language Models as Few Shot Unlearners, Pawelczyk et al.

Link: https://arxiv.org/pdf/2310.07579.pdf

Composite Backdoor Attacks Against Large Language Models, Huang et al.

Link: https://arxiv.org/pdf/2310.07676.pdf

Low-Resource Languages Jailbreak GPT-4, Yong et al.

Link: https://browse.arxiv.org/pdf/2310.02446.pdf