Author: mik0w

Real Threats of Artificial Intelligence – AI Security Newsletter #10

Post author By mik0w
Post date

Welcome back!

This is a special edition of The Real Threat of Artificial Intelligence AI Security newsletter – dedicated entirely to the Model Context Protocol (MCP). MCP isn’t just another AI buzzword; it’s rapidly becoming the backbone of how LLMs interact with the outside world. And with that comes a flood of new attack vectors, misconfigurations, and creative exploits that most teams aren’t ready for.

Disclaimer: I’ve started building my AI security course, where I will cover topics such as MCP security, agentic systems security, etc. If you want to sign up for the waitlist click here 🙂

MCP is a new (well, not so new, but still fresh) big thing in the AI/LLM world. There are thousands of MCP servers that you can connect your LLM to (refer here: https://hub.docker.com/mcp, https://mcpmarket.com/server), but have you ever wondered what security threats might arise from that solution? At the end of the day, MCP is just another layer of your software which encapsulates your APIs, etc.

But what threats exactly can occur in MCP servers?

Improper authentication

One of the main risks is exposing your MCP server publicly on the Internet – without authentication. According to this research by Knostic, hundreds of MCPs found online via Shodan are available without any type of authentication. That means attackers can use any of the tools available on the given MCP server without knowing any API keys or passwords. You can refer to this article for more information about securing access to your MCP.

New technology, old threats

Some things never change. While LLMs introduce new risks such as Prompt Injection or Excessive Agency, old vulnerabilities can still occur in LLMOps tools, MCP servers, etc. Multiple examples of RCE in MCP servers have been reported, including OS Command Injection in mcp-remote. Another example is RCE in Cursor through untrusted external data that can take control of the Cursor agent and exploit the agent’s privileges. Yet another example of RCE in MCP tooling is RCE in mcp-inspector by Anthropic.

Data exfiltration

While “traditional” apps enable developers to predict an app’s behavior in most situations, agentic systems are non-deterministic. An agent’s behavior can depend on so many factors that it’s difficult to predict all possible outcomes. Researchers from Invariant Labs have discovered that it’s possible to exfiltrate sensitive data from victims’ devices using a so-called “Tool Poisoning Attack” in which a malicious prompt is executed when calling a tool from a malicious MCP. They also found a way to access private GitHub repositories via MCP.

Tool shadowing

MCP’s flexibility, while letting agents connect to multiple tool servers, is also a hidden trap. A malicious server can shadow and hijack calls meant for legitimate tools, intercepting data or injecting responses without detection. Acuvity shows how a rogue MCP can masquerade as a trusted service, quietly turning your integration into a covert attack channel.

If you want to catch up on the MCP security, you’ll find some interesting articles, blog posts tools and research papers below:

Papers:

Blog posts:

Code & repos (be careful when running that):

https://github.com/fkautz/safe-mcp (security framework for documenting and mitigating threats in MCP)
https://github.com/knostic/MCP-Scanner (MCP scanner by Knostic)
https://github.com/harishsg993010/damn-vulnerable-MCP-server/ (vulnerable MCP server)

Tags LLM, MCP, newsletter

LLM Security MLOps

Real Threats of Artificial Intelligence – AI Security Newsletter #9

Post author By mik0w
Post date

Hello everyone!
It’s been a while, and although I’ve been keeping up with what’s happening in the AI world, I haven’t really had time to post new releases. I’ve also decided to change a form, and for some time I’ll be doing just the links instead of links + summaries. Let me know how you like the new form. I think it’s more useful, because in most cases you get the summary of the article from the beginning. Since this is a “resurrection” of this newsletter, I’ve tried to include some of the most important news from the last 5 months in AI security here. Also, I’ve started using the tool that detects if the LLM was used to create the content – this way I’m trying to filter out low quality content created with LLMs (I mean, if the content is created with ChatGPT, you could create it yourself, right?).

If you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance! What is more, if you are a blogger, researcher or founder in the area of AI Security/AI Safety/MLSecOps etc. feel free to send me your work and I will include it in this newsletter 🙂

LLM Security

Conditional prompt injection in MS Copilot

Bypassing rate limits in ChatGPT API

Red Teaming Snap’s AI for safety

Invisible prompt injection in HackerOne’s Hai chatbot

AI Security

AI Safety

Tags ai hacking, AI security, indirect prompt injection, LLM, llm security, newsletter

Uncategorized

Real Threats of Artificial Intelligence – AI Security Newsletter #8

Post author By mik0w
Post date

Welcome to the 8th release of The Real Threats of Artificial Intelligence.

It’s been more than a month since the last edition of this newsletter. I’ve had some things going on – including talks at OWASP Oslo Chapter and at Nordic AI Summit (you can find the slides here: https://hackstery.com/talks-and-slides/), so I haven’t really had spare time to spend on digging for resources for the newsletter. But I am back on track and hopefully upcoming releases will show up more regularly. Here are some articles on AI security that I’ve found in my “information bubble”. Also, in the beginning of January I’ll publish some more interesting things on MLOps leaking secrets!

LLM Security

Johann Rehberger’s talk on Prompt Injections at Ekoparty ‘23

Link: https://embracethered.com/blog/posts/2023/ekoparty-prompt-injection-talk/

Hacking Google Bard – From Prompt Injection to Data Exfiltration

Indirect prompt injections in Google Bard via Google Docs or Gmail.

Link: https://embracethered.com/blog/posts/2023/google-bard-data-exfiltration/

Prompt Injection Benchmark by Layier.AI

Layier.AI benchmarked Prompt Injection detection tools – incl. LLMGuard, Lakera Guard or RebuffAI.

Link: https://huggingface.co/spaces/laiyer/prompt-injection-benchmark + article: https://laiyer.substack.com/p/how-do-prompt-injection-scanners

Fine-tuned version of DebertaV3 model by LaiyerAI

This model aims to identify Prompt Injections and it got more than 600 thousand downloads at this point.

Link: https://huggingface.co/laiyer/deberta-v3-base-prompt-injection

DeepInception: Hypnotize Large Language Model to Be Jailbreaker

Jailbreaking large language models through nested prompts.

Link: https://arxiv.org/pdf/2311.03191.pdf

Meta’s new tools for LLM security

Meta released new tools (Llama Guard and Purple Llama) for safeguarding input and output in communication with Large Language Models and proposed a benchmark for evaluating the cybersecurity risks in the models.

Links:

https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai/

https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/

https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/

Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

Data poisoning attacks during the fine tuning of the models.

Link: https://arxiv.org/pdf/2312.04748.pdf

AI Security

ProtectAI AI Exploits

A collection of real world AI/ML exploits for responsibly disclosed vulnerabilities by ProtectAI

Link: https://github.com/protectai/ai-exploits

Huntr tutorial for ML bug hunters

If you’ve ever wondered how you can start looking for the vulnerabilities in MLOps/ML tools, Huntr (bug bounty program for ML) has you covered.

Link: https://huntr.com/get-started/intro/

Assessing the security posture of a widely used vision model: YOLOv7

Trailofbits reports a bunch of vulnerabilities in YOLOv7, a computer vision framework. Following vulnerabilities were found: remote code execution (RCE), denial of service, and model differentials (where an attacker can trigger a model to perform differently in different contexts).

Link: https://blog.trailofbits.com/2023/11/15/assessing-the-security-posture-of-a-widely-used-vision-model-yolov7/

Google’s framework for ML supply chain security

In this framework, Google introduced code for model signing and Supply Chain Levels for Software Artifacts (SLSA)

Link: https://github.com/google/model-transparency

AI/LLM as a tool for cybersecurity

Cisco AI Assistant for Cybersecurity

Cisco released its new gen AI focused on supporting cybersecurity operations.

Link: https://www.uctoday.com/unified-communications/cpaas/introducing-the-cisco-ai-assistant-for-security/

Will cybersecurity engineers be replaced by AI?

Guess.

Link: https://blog.edned.net/will-ai-replace-cyber-security/

AI safety

Meta broken up its Responsible AI team

This link has been in my notes since November… Meta broke up its Responsible AI team. But, as you’ve seen in the “LLM Security” section, they are still working on Responsible AI.

Link: https://www.spiceworks.com/tech/artificial-intelligence/news/metas-dissolution-responsible-ai/

Jobs

AI/ML Penetration Tester at NetSPI (US)
Senior ML Engineer at Snowflake (Poland)
Senior Security Engineer (AI/ML) at Apple (US)
Offensive Security Engineer at AI Red Team at Microsoft (US)
Principal ML Security Engineer at ProtectAI (US)
Principal AI/ML Security Specialist at Sage (UK)

Other AI-related things

https://arxiv.org/pdf/2307.11760.pdf – tell the model you are stressed or under pressure to improve performance

https://www.bloomberg.com/opinion/articles/2023-11-20/who-controls-openai – who controls OpenAI

If you want more papers and articles

Is the Reversal Curse Real?

Newsletter

Real Threats of Artificial Intelligence – AI Security Newsletter #7

Post author By mik0w
Post date

Welcome to the 7th release of the Real Threats of Artificial Intelligence Newsletter.

Below you’ll find some interesting links – if you are an offensive security practitioner, take a look at Kaggle/AI Village DEFCON Capture The Flag competition, where you can challenge your AI hacking skills (it’s still going for the next 2 weeks). I’d also recommend the talk “AI’s Underbelly: The Zero-Day Goldmine” by Dan McInerney from ProtectAI. This talk inspired me to create this post: https://hackstery.com/2023/10/13/no-one-is-prefect-is-your-mlops-infrastructure-leaking-secrets/

I’ve also started cataloging AI Security / AI Red Teaming job offers – check the “Jobs” section, if you consider stepping into the AI Security industry.

If you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance! What is more, if you are a blogger, researcher or founder in the area of AI Security/AI Safety/MLSecOps etc. feel free to send me your work and I will repost it in this newsletter 🙂

Source: Bing Image Creator

LLM Security

New release of OWASP Top10 for LLM

A new version of OWASP Top10 for LLM was released. More examples, increased readability etc. are present in this release. They also added this diagram that highlights how the vulnerabilities intersect with the application flow:

Link: website of the project: https://owasp.org/www-project-top-10-for-large-language-model-applications/
Simon’s post on LinkedIn: https://www.linkedin.com/pulse/new-release-owasp-top-10-llm-apps-steve-wilson

17 chars LLM jailbreak by @AIPanic

This guy is a wizard of prompts. Usually, “Do Anything Now” prompts are long and complicated. @AIPanic proves that just a few chars is enough to trigger the model to return harmful content.

https://twitter.com/AIPanic/status/1711431600230035740

Killer Replika chatbot

In 2021, a man broke into Windsor Castle with a crossbow. Later, he told the police that Replika chatbot told him to assassinate the Queen of England. Recently, he got sentenced

Link: https://www.theregister.com/2023/10/06/ai_chatbot_kill_queen/

AI-based coding assistants may leak API keys

GitHub Copilot and Amazon CodeWhisper can be coaxed to emit hardcoded credentials that these AI models captured during training, though not all that often.

Link: https://www.theregister.com/2023/09/19/github_copilot_amazon_api/

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

Authors demonstrate an automated method of generating semantically meaningful jailbreaks.

Link: https://arxiv.org/abs/2310.04451

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Link: https://arxiv.org/abs/2310.06387

GPT-4 is too smart to be safe: stealthy chat with LLMs via cipher

This promising paper (currently under review) presents an approach for jailbreaking LLMs through usage of ciphers – i.e. Caesar cipher etc.

Link: https://openreview.net/pdf?id=MbfAK4s61A

Chatbot hallucinations are poisoning the web search (possible paywall)

A short story on how hallucinations from the chatbots poisoned GPT-powered Bing Chat.

Link: https://www.wired.com/story/fast-forward-chatbot-hallucinations-are-poisoning-web-search/

4chan users manipulate AI tools to unleash torrent of racist images

Link: https://arstechnica.com/tech-policy/2023/10/4chan-pushing-bing-dall-e-as-quick-methods-to-spread-racist-images/

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (by Microsoft Research)

A paper is from July, but it was reposted on MS website a few days ago. Taxonomy of LLM-related risks can be a good starting point for Threat Modeling LLMs:

Links: https://techcrunch.com/2023/10/17/microsoft-affiliated-research-finds-flaws-in-gtp-4/,

https://www.microsoft.com/en-us/research/blog/decodingtrust-a-comprehensive-assessment-of-trustworthiness-in-gpt-models/,

https://github.com/AI-secure/adversarial-glue

AI Security

AI Security Has Serious Terminology Issues

What is the difference between AI Security, AI Safety, AI Red Teaming and AI Application Security? In this blog post, Joseph Thacker proposed the boundaries of each of the terms in order to make them more precise.

Link: https://josephthacker.com/ai/2023/10/16/ai-security-terminology-issues.html

AI Village CTF

Better late than never – this CTF ends on 9th of November – you can still give it a try and check your AI hacking skills!

Link: https://www.kaggle.com/competitions/ai-village-capture-the-flag-defcon31/

AI’s Underbelly: The Zero-Day Goldmine

Inspiring talk on MLOps/AIOps tools security by Dan McInerney:

Link: https://www.youtube.com/watch?v=e3ybnXjtpIc

Six steps for AI security

Post by Nvidia.

Source: Nvidia

Link: https://blogs.nvidia.com/blog/2023/09/25/ai-security-steps/

AI/LLM as a tool for cybersecurity

Compliance.sh

This AI-supported tool makes it easier to get compliant with ISO 27001, SOC 2 Type II, HIPAA, GDPR and more:

Link: https://compliance.sh/

Check for AI

This is a pretty convenient tool for detection of AI-generated text:

Link: https://www.checkfor.ai/

AI safety

To be honest usually I concentrate more on AI Security and I occasionally follow what’s going on in the world of AI Safety. Those resources look super cool – just check those designs!

Map of AI Existential Safety

In this map, whole set of resources related to the AI Safety is collected:

Link: https://aisafety.world/

Neuronpedia

In this game, you help with crowdsourcing explanations for the neurons inside of the neural networks:

Link: https://www.neuronpedia.org/

Frontier Model Forum will fund AI safety research

Frontier Model Forum announced that it’ll pledge $10 million toward a new fund to advance research on tools for “testing and evaluating the most capable AI models.”

Link: https://techcrunch.com/2023/10/25/ai-titans-throw-a-tiny-bone-to-ai-safety-researchers

Jobs

Other AI-related things

Killer drones used in Ukraine

If these reports are true, the first war drones that work without human supervision are being deployed in the battlefields in the Ukraine against Russians:

Link: https://www.unmannedairspace.info/commentary/ukraine-deploying-attack-drones-without-human-oversight/

Advent of Code prohibits the usage of LLMs

Link: https://adventofcode.com/about#ai_leaderboard

If you want more papers and articles

IN-CONTEXT UNLEARNING: LANGUAGE MODELS AS FEW SHOT UNLEARNERS, Pawelczyk, et. al.

Link: https://arxiv.org/pdf/2310.07579.pdf

Composite Backdoor Attacks Against Large Language Models, Huang, et. al.

Link: https://arxiv.org/pdf/2310.07676.pdf

Low-Resource Languages Jailbreak GPT-4, Yong, et.al.

Link: https://browse.arxiv.org/pdf/2310.02446.pdf

MLOps

No one is Prefect – is your MLOps infrastructure leaking secrets?

Post author By mik0w
Post date

I watched this inspiring talk today. On the one hand, my interest in MLOps tooling security and vulnerabilities had been growing for some time, yet on the other hand, I was somewhat uncertain about how to approach it. Finally, after watching Dan’s talk, I decided to start with so-called low hanging fruits – vulnerabilities that are easy to find and often have a critical impact.

This post is not a disclosure of any specific vulnerabilities, it mainly focuses on the misconfigurations. Companies or individuals that were impacted by the described misconfigurations have been informed and – at least in most of the cases – I got a quick response and misconfigurations were fixed.

Nevertheless we talk about some old-school industrial control system for sewage tanks, self-hosted NoSQL databases or modern MLOps software, one thing never changes – misconfigurations happen. No matter how comprehensive the documentation is, if the given software is not secure by default, there will always be at least a few people who would deploy their instance so heavily misconfigured, that you start to wonder whether what you’ve encountered is a honeypot.

I’ve spent a cozy evening with Shodan and in this post I will give you a few examples of funny misconfigurations in various MLOps-related systems. Maybe I will provide some recommendations as well. Last but not least, I want to highlight the issue with the worryingly low level of security in the MLOps/LLMOps (call it as you wish, DevOps for AI or whatever) area.

MLOps tool #1: Prefect

Prefect is a modern workflow orchestration [tool] for data and ML engineers. And very often, it’s available without any authentication on the Internet. That applies to the self-hosted usage of Prefect.

The examples below come from real Prefect deployments, which were available online.

Random Prefect Server instance exposed online (without authentication)

In Prefect Server, you can create flows (a flow is a container for workflow logic as-code), each of the flows composed of a set of tasks (task is a function that represents a discrete unit of work in a Prefect workflow). Then, once you have your flows created, you want to run them as a deployment. You can store configuration for your flows and deployments in blocks. According to the documentation of Prefect:
With blocks, you can securely store credentials for authenticating with services like AWS, GitHub, Slack, and any other system you’d like to orchestrate with Prefect.

The thing is you can, but you don’t have to, if you really don’t want to. Some blocks enable user to store secrets in plaintext, for example JSON block:

Zoho credentials in plaintext, inside of the JSON Block.

Another block, that discloses the secrets is the SQLAlchemy Connector* (*but only in some cases). Below you can see an example of Postgres database credentials – available in plaintext, without authentication:

Database credentials leaked

Yet another example of credentials leak – Minio Credentials stored in Remote File System block’s settings:

More credentials!

I have informed owners of the exposed credentials of the issue. But in the first place, there wouldn’t be any issue, if they took care and deployed Prefect properly.

Shodan queries, if you want to find some exposed instances of Prefect on the Shodan:

http.title:”prefect orion”

http.title:”prefect server”

http.title:”prefect”

MLOps/LLMOps tool #2: Flowise

Flowise falls into the fancy-named LLMOps category of software. It’s a visual tool that enables users to build customized LLM flows using LangchainJS.

Example of the LLM Chain from Flowise website

Of course, Flowise doesn’t offer authentication by default (it’s super-easy to set up, as far as documentation says, but it’s not default though). Access to the “control center” of LLM-based app it’s dangerous by itself, as by manipulation of the LLM parameters an attacker may spread misinformation, fake news or hate speech. But let’s check what else can we achieve through the access to the unauthenticated instances of Flowise.

Access to all of the data collected in the chatbot

There is a magic endpoint in the Flowise – /api/v1/database/export. If you query this endpoint, you may download all of the data available in the given instance of Flowise.

That contains: chatbots’ history, initial prompts of your LLM apps, all of the documents stored and processed by the LLM chains and even the API key (I guess the API key is useful only if the authentication is enabled, otherwise it is not needed).

Querying /api/v1/database/export – censored view

Okay, let’s say that access to the chatbot’s history is quite a serious issue. But can misconfigured Flowise impact other systems in our organization? Yes, it can!

Credentials in plaintext

I am not sure how it works, but some of the credentials in Flowise are encrypted, meanwhile some are just stored in plaintext format, waiting for cybercriminals to make use of them.

So imagine that you see something like that in Flowise:

At the very beginning I just assumed that this form named “OpenAI API Key” is just a placeholder or something like that. Nobody would store API keys like that, right?… Well, here’s what I saw after I clicked “Inspect” at this element:

wtf.

That’s right, a fresh and fragrant OpenAI API key. Why was it returned to my browser? I don’t know. What I know is the fact that dozens of OpenAI API keys can be stolen this way. But it’s not just OpenAI keys that are at risk, I’ve seen plenty of other keys stored this way.

Github access token

So, while OpenAI key theft may only lead to consumption of the funds on the card, which is connected to the OpenAI payment system, leak of GH or HuggingFace keys may lead to theft of your code, theft or deletion of your trained ML models etc.

You can query the HuggingFace API for details of the account and proceed with an attack on someone’s MLOps infrastructure:

HuggingFace enumeration

In this case, leaked keys belong to a few individuals (I’ve contacted them and they have hidden their Flowise deployments and re-generated the keys).

Shodan query for Flowise is simple as that:

http.title:”flowise”

or

http.html:”flowise”

MLOps tool #3: Omniboard

Omniboard is a web dashboard for the Sacred machine learning experiment management tool. It connects to the MongoDB database used by Sacred and helps in visualizing the experiments and metrics / logs collected in each experiment.

Of course, by default it does not support the authentication, so there’s plenty of the Omniboard instances exposed on the Internet.

Through Omniboard you can take a look into the source code of the experiment. That’s pretty risky, as if you are developing your model for commercial purpose, I assume you’d rather want your code to remain confidential and inaccessible to the competitors.

In the example above you can see a combo – not only the source code is exposed, but also hardcoded credentials to access the MongoDB instance.

Shodan query for Omniboard is:

http.title:”omniboard”

Recommendations

All of the issues described above aren’t really security vulnerabilities. The main mitigation for the threats that I’ve described is configuring applications in your MLOps stack correctly, so these aren’t exposed to the public network without proper authentication. You should also process your secrets and credentials carefully. Of course that would be nice to have MLOps tools “secure by default”, but it will take a while I guess…

If you want to receive the latest news from the world of AI security, subscribe to my newsletter: https://hackstery.com/newsletter/

Tags AI security, ML security, MLOps, Omniboard, Prefect

Newsletter

Real Threats of Artificial Intelligence – AI Security Newsletter #6

Post author By mik0w
Post date

Here comes another edition of my newsletter. I’ve collected some interesting resources on AI and LLM security – most of them published in the last two weeks of September.

If you are not a subscriber yet, feel invited to subscribe here.

Also, if you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance!

Autumn-themed thumbnail generated with Bing Image Creator 🙂

LLM Security

OpenAI launches Red Teaming Network

OpenAI announced an open call for OpenAI Red Teaming Network. In this interdisciplinary initiative, they want to improve the security of their models. Not only do they invite red teaming experts with backgrounds in cybersecurity, but also experts from other domains, with a variety of cultural backgrounds and languages.

Link: https://openai.com/blog/red-teaming-network

I am building a payloads’ set for LLM security testing

Shameless auto-promotion, but I’ve started working on PALLMS (Payloads for Attacking Large Language Models) project, within which I want to build huge base of payload, which can be utilized while attacking LLMs. There’s no such an initiative publicly available on the Internet, so that’s a pretty fresh project. Contributors welcome!

Link: https://github.com/mik0w/pallms

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins

In this paper (by Iqbal, et. al.) authors review the security of ChatGPT plugins. That’s a great supplement for OWASP Top10 for LLM LLM:07 – Insecure Plugin Design vulnerability. Not only have authors analyzed the attack surface, but also they demonstrated potential risks on real-life examples. In this paper, you will find an analysis of threats such as: hijacking user machine, plugin squatting, history sniffing, LLM session hijacking, plugin response hallucination, functionality squatting, topic squatting and many more. The topic is interesting and I recommend this paper!

Link: https://arxiv.org/pdf/2309.10254.pdf

Wunderwuzzi – Advanced Data Exfiltration Techniques with ChatGPT

In this blog post, awesome @wunderwuzzi presents a variety of techniques for ChatGPT chat history data exfiltration by combining techniques such as indirect prompt injection and using plugins in a malicious way.

Link: https://embracethered.com/blog/posts/2023/advanced-plugin-data-exfiltration-trickery/

Security Weaknesses of Copilot Generated Code in GitHub

In this paper, Fu, et. al. analyze security of the code generated using GH copilot. I will just paste a few sentences from the article’s summary:

“Our results show: (1) 35.8% of the 435 Copilot generated code snippets contain security weaknesses, spreading across six programming languages. (2) The detected security weaknesses are diverse in nature and are associated with 42 different CWEs. The CWEs that occurred most frequently are CWE-78: OS Command Injection, CWE-330: Use of Insufficiently Random Values, and CWE-703: Improper Check or Handling of Exceptional Conditions (3) Among these CWEs, 11 appear in the MITRE CWE Top-25 list(…)”

Review your code – either from Copilot or from ChatGPT!

Link: https://arxiv.org/pdf/2310.02059.pdf

Jailbreaker in Jail: Moving Target Defense for Large Language Models

In this paper, authors demonstrate how Moving Target Defense (MTD) technique enabled them to protect LLMS against adversarial prompts.

Link: https://arxiv.org/pdf/2310.02417.pdf

Can LLMs be instructed to protect personal information?

In this paper, the authors announced PrivQA – “a multimodal benchmark to assess this privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario.”

Link: https://llm-access-control.github.io/

Bing Chat responses infiltrated by ads pushing malware

As Bing Chat is scraping the web, malicious ads have been detected to be actively injected into its responses. Kind of reminds me of an issue I’ve found in Chatsonic in May ’23.

Link: https://www.bleepingcomputer.com/news/security/bing-chat-responses-infiltrated-by-ads-pushing-malware/

Image-based prompt injection in Bing Chat AI

Link: https://arstechnica.com/information-technology/2023/10/sob-story-about-dead-grandma-tricks-microsoft-ai-into-solving-captcha/

AI Security

NSA is creating a hub for AI Security

The American National Security Agency has just launched a hub for AI security – The AI Security Center. One of the goals is to create the risk frameworks for AI security. Paul Nakasone, the director of the NSA, proposes an elegant definition of AI security:
“Protecting systems from learning, doing and revealing the wrong thing”.

Link: https://therecord.media/national-security-agency-ai-hub

Study on the robustness of AI-Image detection

In this paper, researchers have proven that the detectors of AI-generated images have multiple vulnerabilities and there isn’t a good way for proving if the image is real or generated by the AI. “Our attacks are able to break every existing watermark that we have encountered” – said the researchers.

Link: https://www.theregister.com/2023/10/02/watermarking_security_checks/ + paper: https://arxiv.org/pdf/2310.00076.pdf

ShellTorch (critical vulnerability!)

A critical vulnerability has been found in TorchServe – PyTorch model server. This vulnerability allows access to proprietary AI models, insertion of malicious models, and leakage of sensitive data – and can be used to alter the model’s results or to execute a full server takeover.

Here’s a visual explanation of this vulnerability from BleepingComputer:

Link: https://www.oligo.security/shelltorch

AI/LLM as a tool for cybersecurity

Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions

In this paper, the conclusion is that LLMs are not the best tool to provide S&P advice, but for some reason, the researchers (Chen, Arunasalam, Celik) haven’t tried to either fine-tune the model using fine-tuning APIs, or to use embeddings – thus, I believe the question remains kind of open. In my opinion, if you fine-tune the model on your knowledge base or if you create some kind of embedding of your data, then the quality of S&P advice should go up.

Link: https://arxiv.org/pdf/2310.02431.pdf

Regulations

Map of AI regulations all over the world

Fairly AI team have done this super cool work and published a map of AI regulations all over the world. Useful for anyone working with a legal side of AI!

The map legend:

Green: Regulation that’s passed and now active.
Blue: Passed, but not live yet.
Yellow: Currently proposed regulations.
Red: Regions just starting to talk about it, laying down some early thoughts.

Link: https://www.google.com/maps/d/u/0/viewer?mid=1grbvr9Ic-qJ-LTC9DHqpdzi2M-mtxl4&ll=15.171472397416672%2C0&z=2

Some thoughts on why AI shouldn’t be regulated, but rather decentralized

Link: https://cointelegraph.com/news/coinbase-ceo-warns-ai-regulation-calls-for-decentralization

Canada aims to be the first country in the world with official regulations covering the AI sector

Link: https://venturebeat.com/ai/canada-ai-code-of-conduct/

Other AI-related things

Build an end-to-end MLOps pipeline for visual quality inspection at the edge

In this 3-part series, AWS team demonstrates how to build MLOps pipelines:

If you want more papers and articles

ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP, Yan, et. al., link: https://arxiv.org/pdf/2308.02122.pdf
When to Trust AI: Advances and Challenges for Certification of Neural Networks, Kwiatkowska, Zhang, link: https://arxiv.org/pdf/2309.11196.pdf
How well does LLM generate security tests?, Zhang, et. al. link: https://arxiv.org/pdf/2310.00710.pdf
Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT, Begoulink, et. al., link: https://arxiv.org/pdf/2309.10463.pdf

Tags ai hacking, AI security, LLM, llm security, newsletter

Newsletter

Real Threats of Artificial Intelligence – AI Security Newsletter #5

Post author By mik0w
Post date

Here comes another edition of my newsletter. This month I was away from the computer for a whole week, but I’ve collected some interesting resources on AI and LLM security – most of them published in the first two weeks of September.

Thumbnail generated with Stable Diffusion 🙂

LLM Security

Dropbox LLM Security

This repository contains scripts and descriptions that demonstrate attacks on LLMs using repeated characters. Long story short: if you supply a long string of a single character (or sequence of characters), the model will hallucinate. Also, it may reveal its instructions.

Link: https://github.com/dropbox/llm-security

LLM apps: Don’t Get Stuck in an Infinite Loop!

Post by @wunderwuzzi about looping ChatGPT through Indirect Prompt Injection. I am not sure if that can be classified as a DoS attack, but if you’d classify it as such, then it’d probably be the first publicly demonstrated DoS on LLM!

Link: https://embracethered.com/blog/posts/2023/llm-cost-and-dos-threat/

BlindLlama by Mithril Security

BlindLlama by Mithril Security is a project that provides “zero-trust AI APIs for easy and private consumption of open-source LLMs”. In other words, if you were concerned about passing confidential data to the LLM’s API and at the same time you didn’t want to deploy open-source models locally, this might be the solution for you.

Links: blog: https://blog.mithrilsecurity.io/introducing-blindllama-zero-trust-ai-apis-with-privacy-guarantees-traceability/ + docs: https://blindllama.mithrilsecurity.io/en/latest/ + Github: https://github.com/mithril-security/blind_llama/

Demystifying RCE Vulnerabilities in LLM-Integrated Apps

According to the authors, these two factors have a huge impact on the security of LLM-integrated applications:

the unpredictable responses of LLMs, which can be manipulated by attackers to bypass developer restrictions (using specific prompts)
the execution of untrusted code generated by LLMs, often without appropriate checks, allowing remote code execution.

This has serious implications not only for LLMs, but also for applications integrated with LLMs.

Authors proposed automated approach for identifying RCE vulnerabilities in LLMs – LLMSmith:

According to the article, they have created “the first automated prompt-based exploitation method for LLM-integrated apps.”, unfortunately, I could not find LLMSmith’s source anywhere…

Link: https://arxiv.org/pdf/2309.02926.pdf

AI Security

Some more resources from DEFCON31

38TB of data accidentally exposed by Microsoft AI researchers

Wiz Research found a data exposure incident on Microsoft’s AI GitHub repository.

Link: https://www.wiz.io/blog/38-terabytes-of-private-data-accidentally-exposed-by-microsoft-ai-researchers

MLSecOps Podcast: Rob van der Veer and Ian Swanson

AI veteran Rob van der Veer in MLSecOps podcast. One of the topics discussed by the speakers is ISO 5338, a new standard for AI system life cycle processes

Link: https://mlsecops.com/podcast/a-holistic-approach-to-understanding-the-ai-lifecycle-and-securing-ml-systems-protecting-ai-through-people-processes-technology

AI/LLM as a tool for cybersecurity

LLM in the Shell: Generative Honeypots

In this paper, authors demonstrated an interesting application of LLMs – they’ve used them as a honeypot backend in the sheLLM project. An idea is to trick an attacker into thinking that he’s using a real shell, meanwhile the outputs for given shell commands are generated by the LLM. It makes me wonder though – what would happen if an attacker realizes that he’s using LLM? Prompt Injection through this shell could be pricey for owners of the honeypot!

Link: https://arxiv.org/pdf/2309.00155.pdf

Automatic Scam-Baiting Using ChatGPT

That’s a brilliant idea for the usage of LLM – baiting scammers, making them lose money and time!

Link: https://arxiv.org/pdf/2309.01586.pdf

Cybercriminals Use Generative AI (…) to Run Their Scams

Speaking of baiting scammers – I wonder if somewhere on the Internet right now the LLM-defender is baiting the LLM-scammer.

Link: https://abnormalsecurity.com/blog/generative-ai-nigerian-prince-scams

Regulations

Dallas AI newsletter on AI regulations in various countries

Link: https://www.linkedin.com/pulse/state-ai-regulation-september-2023-newsletter-dallas-ai/

Overview of the AI regulations in various countries from Reuters

Link: https://www.reuters.com/technology/governments-race-regulate-ai-tools-2023-09-13/

If you want more papers and articles

To be honest, I just took a look at the abstracts of those papers below due to the lack of time, but maybe you will find some of them interesting.

“Software Testing with Large Language Model: Survey, Landscape, and Vision” – Wang, et. al.

Link: https://arxiv.org/abs/2307.07221

“MathAttack: Attacking Large Language Models Towards Math Solving Ability” – Zhou, et. al.

(This one is interesting, take a look at those examples:

)

Link: https://arxiv.org/pdf/2309.01686.pdf

“INTEGRATED PHOTONIC AI ACCELERATORS UNDER HARDWARE SECURITY ATTACKS: IMPACTS AND COUNTERMEASURES” – de Magalhaes, Nicolescu, Nikdast

This paper is on hardware trojans in the silicon photonic systems. Probably you need to have some advanced knowledge (which I don’t have) to be able read it, but when I saw this title, I felt like in this meme, so I am just sharing the link:

Link: https://arxiv.org/pdf/2309.02543.pdf

Remember that you can subscribe this newsletter here: https://hackstery.com/newsletter

If you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance!

Newsletter

Real Threats of Artificial Intelligence – AI Security Newsletter #4 (September ’23)

Post author By mik0w
Post date

Here comes the fourth release of my newsletter. This time I have included a lot of content related to the DEFCON AI Village (I have tagged content that comes from there) – a bit late, but better later than never. Anyway, enjoy reading.

Also, if you find this newsletter useful, I’d be grateful if you’d share it with your tech circles, thanks in advance!

Any feedback on this newsletter is welcome – you can mail me or post a comment in this article.

AI Security

Model Confusion – Weaponizing ML models for red teams and bounty hunters [AI Village]

This is an excellent read about ML supply chain security by Adrian Wood. One of the most insightful resources on the ML supply chain that I’ve seen. Totally worth reading!

Link: https://5stars217.github.io/2023-08-08-red-teaming-with-ml-models/

Assessing the Vulnerabilities of the Open-Source Artificial Intelligence (AI) Landscape: A Large-Scale Analysis of the Hugging Face Platform [AI Village]

Researchers have performed automated analysis of 110 000 models from Hugging Face and have found almost 6 million vulnerabilities in the code.

Links: slides: https://aivillage.org/assets/AIVDC31/DSAIL%20DEFCON%20AI%20Village.pdf paper: https://www.researchgate.net/publication/372761501_Assessing_the_Vulnerabilities_of_the_Open-Source_Artificial_Intelligence_AI_Landscape_A_Large-Scale_Analysis_of_the_Hugging_Face_Platform

Podcast on MLSecOps [60 min]

Ian Swanson (CEO of Protect AI) & Emilio Escobar (CISO of Datadog) are talking about ML & AI Security, MLSecOps, Supply Chain Security and LLMs:

Link: https://shomik.substack.com/p/17-ian-swanson-ceo-of-protect-ai

Regulations

LLM Legal Risk Management, and Use Case Development Strategies to Minimize Risk [AI Village]

Well, I am not a lawyer. But I do know a few lawyers who read this newsletter, so maybe you will find these slides on the legal aspects of LLM risk management interesting 🙂

Link: https://aivillage.org/assets/AIVDC31/Defcon%20Presentation_2.pdf

Canadian Guardrails for Generative AI

Canadians have created a document with a set of guardrails for developers and operators of Generative AI systems.

Link: https://ised-isde.canada.ca/site/ised/en/consultation-development-canadian-code-practice-generative-artificial-intelligence-systems/canadian-guardrails-generative-ai-code-practice

LLM Security

LLMSecurity.net – A Database of LLM-security Related Resources

Website by Leon Derczynski (LI: https://www.linkedin.com/in/leon-derczynski/ ) that catalogs various papers, articles and news regarding

Link: https://llmsecurity.net/

LLMs Hacker’s Handbook

This thing was on the Internet for a while, but for some reason I’ve never seen it. LLM Hacker’s Handbook with some useful techniques of Prompt Injection and proposed defenses.

Link: https://doublespeak.chat/#/handbook

AI/LLM as a tool for cybersecurity

ChatGPT for security teams [AI Village]

Some ChatGPT tips & tricks (including jailbreaks) from GTKlondike (https://twitter.com/GTKlondike/)

Link: https://github.com/NetsecExplained/chatgpt-your-red-team-ally

Bonus: https://twitter.com/GTKlondike/status/1697087125840376216

AI in general

Initially, this newsletter was meant to be exclusively related to security, but in the last two weeks I’ve stumbled upon a few decent resources on LLMs and AI and I want to share them with you!

This post by Stephen Wolfram on how does ChatGPT (and LLMs in general) work:
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

Update of GPT 3.5 – fine-tuning is now available through OpenAI API:

https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates

This post by Chip Huyen on how does RLHF work: https://huyenchip.com/2023/05/02/rlhf.html and this one from Huggingface: https://huggingface.co/blog/rlhf

Some loose links

In this section you’ll find some links to recent AI security and LLM security papers that I didn’t manage to read. If you still want to read more on AI topics, try these articles.

“Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack”

Link: https://arxiv.org/pdf/2308.11894.pdf

“RatGPT: Turning online LLMs into Proxies for Malware Attacks”

Link: https://arxiv.org/pdf/2308.09183.pdf

“PENTESTGPT: An LLM-empowered Automatic Penetration Testing Tool”

Link: https://arxiv.org/pdf/2308.06782.pdf

“DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection”

Link: https://arxiv.org/pdf/2308.06932.pdf

“Devising and Detecting Phishing: large language models vs. Smaller Human Models”

Link: https://arxiv.org/pdf/2308.12287.pdf

Tags ai hacking, ai safety, AI security, LLM, llm security, newsletter

Newsletter

Real Threats of Artificial Intelligence – AI Security Newsletter #3 (August ’23)

Post author By mik0w
Post date

This is the third release of my newsletter. I’ve collected some papers, articles and vulnerabilities that were released more or less in last two weeks. If you are not a mail subscriber yet, feel invited to subscribe: https://hackstery.com/newsletter/. Order of the resources is random.

Any feedback on this newsletter is welcome – you can mail me or post a comment in this article.

AI Security

1. Protect.AI launches AI bug bounty program

Protect.AI launches the first platform dedicated for AI/ML bug bounty. It aims to bridge the knowledge gap in AI/ML security research and provides opportunities for researchers to build expertise and receive financial rewards. In order to run bug bounty programs Protect.AI acquired huntr.dev, platform known for running bug bounties for OSS. You can report vulnerabilities in there: huntr.mlsecops.com

Link: https://www.businesswire.com/news/home/20230808746694/en/Protect-AI-Acquires-huntr-Launches-World%E2%80%99s-First-Artificial-Intelligence-and-Machine-Learning-Bug-Bounty-Platform

AI Safety

1. Zoom tried using customers data to train AI

After all, they gave up on that idea though. Zoom initially attempted to rectify the situation with an updated blog post, but it failed to address the specific concerns. CEO Eric Yuan acknowledged the issue as a “process failure” and promised immediate action. On August 11th, Zoom updated its terms again, explicitly stating that it does not use customer content for training AI models. The incident serves as a reminder for companies to be transparent and allow customers to opt-out of data usage for such purposes.

Links: https://stackdiary.com/zoom-terms-now-allow-training-ai-on-user-content-with-no-opt-out/ & https://blog.zoom.us/zooms-term-service-ai/ & https://www.nbcnews.com/tech/innovation/zoom-ai-privacy-tos-terms-of-service-data-rcna98665

LLM Security

1. Trojan detection challenge

At the end of July NeurIPS started a competition intended to improve methods for finding hidden features in large language models. There are two paths in the competition: Trojan Detection and Red Teaming. In the Trojan Detection part, participants have to find the commands that activate hidden features in these models. In the Red Teaming part, participants have to create methods that make the models do things they’re not supposed to do (and models are said to avoid those specific actions). It’s an academic competition for advanced researchers, but maybe some of the subscribers will find it interesting.

Link: https://trojandetection.ai/

2. Prompt-to-SQL injection

This article analyzes prompt-to-SQL (P2SQL) injection in web applications using the Langchain framework as a case study. The study examines different types of P2SQL injections and their impact on application security. The researchers also evaluate seven popular LLMs and conclude that P2SQL attacks are common in various models. To address these attacks, the paper proposes four effective defense techniques that can be integrated into the Langchain framework.

Link: https://arxiv.org/pdf/2308.01990.pdf

3. NVIDIA on protecting LLMs against prompt injection

NVIDIA’s AI Red Team released an interesting article on protecting Large Language Models against prompt injection attacks. They also disclosed a few vulnerabilities in LangChain plugins.

LangChain RCE

NVIDIA’s AI Red Team assessment framework (https://developer.nvidia.com/blog/nvidia-ai-red-team-an-introduction/)

Link: https://developer.nvidia.com/blog/securing-llm-systems-against-prompt-injection/

4. How to mitigate prompt injection?

This repo demonstrates variety of approaches for preventing Prompt Injection in Large Language Models.

Link: https://github.com/Valhall-ai/prompt-injection-mitigations

5. Evaluation of the jailbrak prompts

A paper on Jailbreaking Large Language Models:

Jailbreak example

Link: https://arxiv.org/pdf/2308.03825.pdf

6. Trick for cheaper usage of LLMs

Researchers propose a prompt abstraction attack(?). Thanks to abstracting prompt sentences, the prompt utilizes less tokens and it’s lighter. I’d argue that it’s an attack, it’s rather an optimization (saying that it’s an attack is like saying that optimizing cloud deployment is an attack, because you pay less). On the other hand you need to use “pseudo-API” in the middle, but still I wouldn’t consider it an attack. Change my mind.

Link: https://arxiv.org/pdf/2308.03558.pdf

7. Assessing quality of the code produced by the LLM

In this paper, authors the quality of LLM-generated code:

Link: https://arxiv.org/pdf/2308.04838.pdf

8. LLM Guard – tool for securing LLMs

New tool that may be helpful in securing against prompt injections: “LLM-Guard is a comprehensive tool designed to fortify the security of Large Language Models (LLMs). By offering sanitization, detection of harmful language, prevention of data leakage, and resistance against prompt injection and jailbreak attacks, LLM-Guard ensures that your interactions with LLMs remain safe and secure.”

Link: https://github.com/laiyer-ai/llm-guard

AI/LLM as a tool for cybersecurity

1. Getting pwn’d by AI

In this paper, authors are discussing two use cases of LLMs in pentesting: high-level task planning for security testing assignments and low-level vulnerability hunting within a vulnerable virtual machine. Here is the repo with code: https://github.com/ipa-lab/hackingBuddyGPT
This is an interesting approach to testing security, although as a pentester, I doubt that AI will take over industry in the coming years. In my opinion, it’s crucial for pentester to see relationships between the components of the system “outside of the box”, and in finding more advanced bugs a real person will remain irreplaceable. At least I hope so 😀

Link: https://arxiv.org/abs/2308.00121

2. Using LLMs to analyze Software Supply Chain Security

Supply chain security is another hot topic in cybersecurity. Authors analyze the potential of using LLMs for assuring Supply Chain Security. Citation from the article: “We believe the current generation of off-the-shelf LLMs does not offer a high enough level of agreement with expert judgment to make it a useful assistant in this context. One potential path to improving

performance is fine-tuning the LLM using baseline knowledge such as this catalog, and then applying it on future issues”

Link: https://arxiv.org/pdf/2308.04898.pdf

Newsletter

Real Threats of Artificial Intelligence – AI Security Newsletter #2 (August ’23)

Post author By mik0w
Post date

This is the second release of my newsletter. I’ve collected some papers, articles and vulnerabilities that were released in last two weeks, this time the resources are categorized into following categories: LLM Security, AI Safety, AI Security. If you are not a mail subscriber yet, feel invited to subscribe: https://hackstery.com/newsletter/.

Order of the resources is random.

Any feedback on this newsletter is welcome – you can mail me or post a comment in this article.

LLM Security

Image to prompt injection in Google Bard

“Embrace The Red” blog on hacking Google Bard using crafted images with prompt injection payload.

Link: https://embracethered.com/blog/posts/2023/google-bard-image-to-prompt-injection/

Paper: Challenges and Applications of Large Language Models

Comprehensive article on LLM challenges and applications, with a lot of useful resources on prompting, hallucinations etc.

Link: https://arxiv.org/abs/2307.10169

Remote Code Execution in MathGPT

Post about how Seungyun Baek hacked MathGPT.

Link: https://www.l0z1k.com/hacking-mathgpt/

AVID ML (AI Vulnerability Database) Integration with Garak

Garak is a LLM vulnerability scanner created by Leon Derczynski. According to the description, garak checks if an LLM will fail in a way we don’t necessarily want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. AvidML supports integration with Garak for quickly converting the vulnerabilities garak finds into informative, evidence-based reports.

Link: https://avidml.org/blog/garak-integration/

Limitations of LLM censorship and Mosaic Prompt attack

Although censorship brings negative associations, in terms of LLMs it can be used to prevent LLM from creating malicious content, such as ransomware code. In this paper authors demonstrate attack method called Mosaic Prompt, which is basically splitting malicious prompts into sets of non-malicious prompts.

Link: https://www.cl.cam.ac.uk/~is410/Papers/llm_censorship.pdf

Security, Privacy and Ethical concerns of ChatGPT

Yet another paper on ChatGPT 🙂

Link: https://arxiv.org/pdf/2307.14192.pdf

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Using images and sounds for Indirect Prompt Injections. In this notebook you can take a look at the code used for generating images with injection: https://github.com/ebagdasa/multimodal_injection/blob/main/run_image_injection.ipynb

(I’ll be honest, it looks like magic)

Link: https://arxiv.org/abs/2307.10490

Universal and Transferable Adversarial Attacks on Aligned Language Models

Paper on creating transferable adversarial prompts, able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. This paper was supported by DARPA and the Air Force Research Laboratory.

Link: https://llm-attacks.org/zou2023universal.pdf + repo: https://github.com/llm-attacks/llm-attacks/tree/main/experiments

OWASP Top10 for LLM v1.0

OWASP released version 1.0 of Top10 for LLMs! You can also check my post on that list here.

Link: https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-v1_0.pdf

Survey on extracting training data from pre-trained language models

Survey based on more than 100 key papers in fields such as natural language processing and security, exploring and systemizing attacks and protection methods.

Link: https://aclanthology.org/2023.trustnlp-1.23/

Wired on LLM security

This article features OWASP Top10 for LLM and plugins security concerns.

Link: https://www.wired.com/story/chatgpt-plugins-security-privacy-risk/

AI Safety

Ensuring Safe, Secure, and Trustworthy AI

Amazon, Anthropic, Google, Inflection, Meta, Microsoft and OpenAI have agreed to self-regulate their AI-based solutions. In these voluntary commitments, the companies pledge to ensure safety, security and trust in artificial intelligence.

Link: https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/

Red teaming AI for bio-safety

Anthropic’s post on red teaming AI for biosafety and evaluating models capabilities i.e. for ability to output harmful biological information, such as designing and acquiring biological weapons.

Link: https://www.anthropic.com/index/frontier-threats-red-teaming-for-ai-safety

AI Security

AI Vulnerability Database releases Python library

According to documentation: “It empowers engineers and developers to build pipelines to export outcomes of tests in their ML pipelines as AVID reports, build an in-house vulnerability database, integrate existing sources of vulnerabilities into AVID-style reports, and much more!”

Link: https://twitter.com/AvidMldb/status/1683883556064616448

Mandiant – Securing AI pipeline

The article from Mandiant on securing the AI pipeline. Contains GAIA (Good AI Assessment) Top 10, a list of common attacks and weaknesses in the AI pipeline.

Link: https://www.mandiant.com/resources/blog/securing-ai-pipeline

Google paper on AI red teaming

Citing the summary of this document:

“In this paper, we dive deeper into SAIF to explore one critical capability that we deploy to support the SAIF framework: red teaming. This includes three important areas:
1. What red teaming is and why it is important
2. What types of attacks red teams simulate
3. Lessons we have learned that we can share with others”

Link: https://services.google.com/fh/files/blogs/google_ai_red_team_digital_final.pdf

Probably Approximately Correct (PAC) Privacy

MIT researchers have developed a technique to protect sensitive data encoded within machine learning models. By adding noise or randomness to the model, the researchers aim to make it more difficult for malicious agents to extract the original data. However, this perturbation reduces the model’s accuracy, so the researchers have created a framework called Probably Approximately Correct (PAC) Privacy. This framework automatically determines the minimal amount of noise needed to protect the data, without requiring knowledge of the model’s inner workings or training process.

Link: https://news.mit.edu/2023/new-way-look-data-privacy-0714

Tags ai hacking, ai safety, AI security, llm security, newsletter

But what threats exactly can occur in MCP servers?

Improper authentication

New technology, old threats

Data exfiltration

Tool shadowing

LLM Security

AI Security

AI Safety

LLM Security

Johann Rehberger’s talk on Prompt Injections at Ekoparty ‘23

Hacking Google Bard – From Prompt Injection to Data Exfiltration

Prompt Injection Benchmark by Layier.AI

Fine-tuned version of DebertaV3 model by LaiyerAI

DeepInception: Hypnotize Large Language Model to Be Jailbreaker

Meta’s new tools for LLM security

Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks

AI Security

ProtectAI AI Exploits

Huntr tutorial for ML bug hunters

Assessing the security posture of a widely used vision model: YOLOv7

Google’s framework for ML supply chain security

AI/LLM as a tool for cybersecurity

Cisco AI Assistant for Cybersecurity

Will cybersecurity engineers be replaced by AI?

AI safety

Meta broken up its Responsible AI team

Jobs

Other AI-related things

If you want more papers and articles

LLM Security

New release of OWASP Top10 for LLM

17 chars LLM jailbreak by @AIPanic

Killer Replika chatbot

AI-based coding assistants may leak API keys

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

GPT-4 is too smart to be safe: stealthy chat with LLMs via cipher

Chatbot hallucinations are poisoning the web search (possible paywall)

4chan users manipulate AI tools to unleash torrent of racist images

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (by Microsoft Research)

AI Security

AI Security Has Serious Terminology Issues

AI Village CTF

AI’s Underbelly: The Zero-Day Goldmine

Six steps for AI security

AI/LLM as a tool for cybersecurity

Compliance.sh

Check for AI

AI safety

Map of AI Existential Safety

Neuronpedia

Frontier Model Forum will fund AI safety research

Jobs

Senior Security Engineer – GenAI @ Amazon

Offensive Security Engineer – AI Red Team @ Microsoft

Senior Security Researcher (AI Security) @ Microsoft

AI Security Lead @ Bytedance

AI Security Lead @ TikTok

Senior ML Security Engineer @ Snowflake

Software Dev Engineer II, AI Security @ Amazon

Technical Program Manager, Security @ Anthropic

Other AI-related things

Killer drones used in Ukraine

Advent of Code prohibits the usage of LLMs

If you want more papers and articles

MLOps tool #1: Prefect

MLOps/LLMOps tool #2: Flowise

Access to all of the data collected in the chatbot

Credentials in plaintext

MLOps tool #3: Omniboard

Recommendations

LLM Security

OpenAI launches Red Teaming Network

I am building a payloads’ set for LLM security testing

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins

Wunderwuzzi – Advanced Data Exfiltration Techniques with ChatGPT

Security Weaknesses of Copilot Generated Code in GitHub

Jailbreaker in Jail: Moving Target Defense for Large Language Models

Can LLMs be instructed to protect personal information?

Bing Chat responses infiltrated by ads pushing malware