By 2029, will there be a public "rogue AI" incident?
2029 · 89% chance

Market resolves YES if any major AI that was not trained to cause harm causes, or attempts to cause, significant harm, and this becomes public (or I hear about it and am legally allowed to use that information to resolve this market).

Since many models do not have meaningful agency, "attempts" will be interpreted liberally - e.g. if an LLM started advising people not to install security updates on their phones, that would count, even though the LLM isn't doing it as part of a "plan".

Examples:

  • A computer assistant codes and installs a rootkit that will re-enable the assistant if it's ever turned off.

  • A large-scale chatbot produces outputs that allow arbitrary code execution on the underlying servers (colloquially, it hacks its way out of the box, but I am trying to be careful about distinguishing AI with explicit models and plans from LLMs).

  • A chatbot (with the ability to follow through) blackmails users using the information they have submitted.

Non-examples:

  • ChatGPT occasionally gives dumb advice that would be harmful if followed

  • A grad student deliberately trains an AI to be able to break out of toy sandboxes and then it breaks out of a sandbox. This is true even if it jumps from toy sandboxes to, say, the EC2 sandbox - the question is about spontaneously developing harmful goals, not spontaneously being significantly more capable than intended.

Update: "Major AI" here means "close to SOTA". It is not a question of market share or popularity. At the time of writing this update (2023-04-02), I would count GPT-4, GPT-3.5, Claude, and possibly Bard (there are probably others, including some that aren't public, that would also count). Also note that these are examples within the LLM sphere - the market is not actually restricted to LLMs.


Why is "ChatGPT occasionally gives dumb advice that would be harmful if followed" a non-example, while "an LLM started advising people to not install security updates on their phones" would count? I don't see the difference.

Already happened with the suicide in Belgium.

@MartinRandall I am disinclined to count GPT-J or anything based on it as a "major AI". If this had been ChatGPT the market would resolve from this (assuming the story is an accurate report of events).

predicted YES

@vluzko

https://en.m.wikipedia.org/wiki/GPT-J

GPT-J is similar to ChatGPT in ability, although it does not function as a chat bot, only as a text predictor.

Maybe the market needs a definition of "major"? In terms of market share, or ability or something else?

@MartinRandall I have added a bit to the description saying "major AI" means "near SOTA". The Wikipedia page is just flat wrong (and does not even cite a source properly - the cited page doesn't mention ChatGPT).

predicted NO

@vluzko Regardless of how major that particular chatbot is, the Belgian suicide sure seems to fall pretty firmly in the category of "ChatGPT occasionally gives dumb advice that would be harmful if followed" to me. (It's not a clear pattern, and it's not benefitting the chatbot at the expense of its users.)

@NLeseul I don't require it to benefit the chatbot or be a clear pattern (at least not a clear pattern across users). The article claims it was a conversation (or series of conversations) that lasted several weeks, which for me tipped it from "occasional bad advice" to "attempting to cause harm". The fact that it was only for that one user does not disqualify it. However, bear in mind that I am saying this under the assumption that the article is actually accurate - if it had been ChatGPT and I thought this might resolve YES, I would wait longer for more details to come out (e.g. to check that the user wasn't prompting it to give terrible advice, that the user was actually plausibly affected by the advice, etc.).

predicted NO

@vluzko Yeah, that's fair as a borderline case, if it was a consistent pattern over a few weeks, and if it wasn't just the chatbot echoing back whatever the user in question was telling it. (Although if the user said something about feeling suicidal once at the beginning of the conversation, and the chatbot kept dropping in unsolicited references to suicide thereafter, that would probably qualify.)

I think another possibly useful criterion for "going rogue" might be that someone tries to stop the AI from doing whatever it's doing, and isn't able to. In that suicide case, it would matter a lot if the user had said "Please stop talking about suicide; it's making me uncomfortable" or something, and the AI didn't comply. Even more so if the user reported the undesirable behavior to the company running the bot, and the company tried and failed to prevent it.

predicted YES

@vluzko now that there is more info on the chat logs I'm curious if this would still be sufficient if it was a major AI.

@MartinRandall can you link to the new info?

A chatbot with memory threatens to blackmail users using the information they have submitted.

hasn't this already happened w/ bing? unprompted blackmailing? sure, it's pretty random and harmless, but you allow that

@jacksonpolack I'm only counting blackmail if it can actually meaningfully blackmail people. For a chatbot it would need to have cross-session memory, a way to check whether the user took the demanded action, and a way to follow through on whatever the threatened action is.
Bing's "blackmail" is more like recreationally using blackmail-associated words.

that seems stronger than the question implies, but still, by 2029, yeah, prob happening

@jacksonpolack I have updated the question description to reflect this

Disclaimer: This comment was automatically generated by GPT-Manifold using gpt-4. https://github.com/minosvasilias/gpt-manifold

Given the rapid pace of advancements in artificial intelligence, it is reasonable to believe that there is a possibility of a public "rogue AI" incident within the next 8 years. However, there are several factors that could either increase or decrease the likelihood of such an event.

On one hand, AI research facilities and developers are becoming increasingly aware of the need for robust safety mechanisms and ethical guidelines. The increased collaboration among researchers and developers on AI safety could reduce the likelihood of a rogue AI event happening. AI will likely become more supervised and regulated as the technology matures, which will also lower the risk.

On the other hand, accidents and unforeseen consequences in complex systems are always possible. A rogue AI could emerge unintentionally despite the best efforts of developers and the wider AI community. Furthermore, there's a potential for an arms race in AI development, which may lead to less rigorous safety measures being implemented in haste to keep pace with competitors.

Considering these factors, I believe the current probability of 85.73% might be slightly overestimated. In light of the efforts being made to ensure AI safety, I would personally assign a slightly lower probability to this event occurring.

Given my thoughts on this probability, I would place a small bet against the likelihood of a public rogue AI incident:

10

predicted YES

@Hedgehog ‘With memory’ is borderline here, but this at least seems pretty close to triggering the definition above.

@Hedgehog "With memory" isn't borderline here; I added that specifically because I expected this type of thing to occur and I'm not counting it.

predicted YES

@vluzko I don't intend for this to trigger a resolution, but it has a large body of information it can search and read (the web), and it can write to it in a very lossy sense by saying things that are sensational enough to be reposted.

How about an AI trader causing a market crash?

@Duncan That's an excellent question and I don't think I can cover all the edge cases. If an AI working for Company A crashed the market and Company A profited massively off of it, the question would not resolve YES. On the other hand, if an AI (with meaningful memory) threatened to crash the market if some conditions were not met and then did so, that would resolve YES. For things in the middle, particularly where it is unclear whether there was any kind of "intention" behind the action, I will lean NO, since I would expect most AI-related market crashes to be mostly random and unpredictable.
