Will I think all AI hell broke loose in 2024?
5% chance

Will I, at the start of 2025, think that all hell broke loose the previous year?

This criterion feels worryingly subjective to me, and I'm not expecting people to trade it much before a bunch of potential traders have asked me a bunch of questions.

But for example, if you don't expect 2024 to encompass more change than happened in 2022-2023 inclusive, you can go ahead and bet NO. That would definitely be an insufficient amount of total change to resolve YES; I'm looking for something more than merely "change twice as fast as the average of the last two years".

Conversely if you already expect that, by the end of 2024, there will be tank battles inside the USA, between tanks driven by different factions of US soldiers who previously downloaded ChatGPT or Claude and have now been persuaded by their apps to fight in those battles (with or without OpenAI/Anthropic having sponsored this on purpose), you should go ahead and bet YES. This is sufficient (but not necessary) to count for me as all hell breaking loose.

In the event that all human beings on Earth are dead at the end of 2024, this market should be resolved N/A, not YES or NO. The intent is to forecast the case of nonfatal hell only, so as not to introduce issues about the relative future valuation of Manifold mana.

ADDED 1: Sufficiently huge positive changes also count as YES; all heaven breaking loose is a special case of all hell breaking loose.


Would you count the AI war crimes scenarios I mention in this market description?

Does the question resolve "yes" if more posts on Twitter demonstrate that 2024 models somehow picked up mysterious self-reflection capabilities? Like the "compress this sentence such that you can decompress it" trick, but much more widespread and seemingly generalizing to more self-reflection tasks people think up?

That feels like the kinda update that'd make you go "welp, this is the fast part of the downward slope."

Also, would it resolve yes if GPT-5 has a "branch" library that contains a series of "online learning" models that get updated via every interaction with them?

Ditto on the above "ah, well, that's it" update.

How about if someone comes up with a scheme that extracts a whole GitHub-branch-style library of algorithms straight from the weights of "open weights" models, translated into Python, and then has other models work at generating use cases and updates to those algorithms? Like, a giant team-exercise project at synthetic data, where the synthetic data is generated by and for extracting and refining weird algorithms that appear in models?

This is kinda getting at "are there going to be people being really ambitious and creative with AI in 2024, instead of just doing the simple 'business as usual' product development stuff" - such that you'd update hard in the "damn, this is going to be a mess really soon" direction.

Part of my brain is telling me Eliezer might call me an idiot for trying to give specific details about creative things I think you can do with LLMs...

So, I'm going to use an RNG to tell me whether to delete this comment each day. Also, I'll delete it if someone who seems "on the ball" calls me an idiot about it in my DMs. (The point of the RNG is to not give people info about whether smart people think these are dangerous ideas; if it gets deleted, we don't want randos with screenshots to know that something in here was interesting.)

I can't really evaluate for sure whether I'm being too much of an idiot, and Eliezer does kinda seem to be probing for people to ask questions about specific models of the future where things have gone to hell.

@NevinWetherill It doesn't resolve YES based on that level of algorithmic innovation without other visible larger impact.

@EliezerYudkowsky Ah, okay, I was trying to distinguish... intuitively rapidly escalating temperature from visible flashes & flares I guess?

My very fuzzy basic intuitive picture of the next year or two is "there is a visible increase in the temperature of the field as it becomes clear that there's a lot of low-hanging fruit in the small stuff & derivatives of the big stuff" - since it seems like we might have to wait longer for GPT-6 scale models to start training, and innovations on models at GPT-4+ level are going to remain pretty silo'd in relatively few organizations - for now.

I think the part where the temperature increases and there are more powerful tools than people can properly keep up with was something that seemed plausibly like "all hell breaking loose" through the eyes of someone with a strategic perspective here.

Well, I expect you'd nod grimly, but Paul Christiano might start making different moves in the approach he's taking.

That being said... I'll put more thought into whether I expect >4% "visible loud flares" in 2024.... Maybe deanonymizing tech happens?

Like how they caught the Unabomber, but doesn't require someone's brother to tell you that this specific Y wrote all this X online.

That seems like something that might make you go "ah, that's a good chunk of the hell we will likely live through."

Just to cash out more of the space of scenarios I'm picturing that may not quite be a "warning shot" that everyone agrees on but would possibly make you say "ah, yes, these are the cracks that open upon the fiery deep as the story comes to the tiny pinch of pages before the abrupt end."

And no need to reply to this stuff, unless some of it seems potentially useful for others in the market who might want to move this away from 4% in either direction - but are uncertain because sometimes you tweet "ah, shit" at times where everyone else is going "oooh ahhh, fascinating and inspiring."

  1. Whoops, someone stole the weights of a frontier model, and whoops, after the fact the company found logs of the model helping plan this - but they claim that this third party had performed a clever jailbreak and the model could accidentally query some internal network details as part of a weird bug in its totally normal API-access functionality.

  2. Uh oh, someone released to the public a model that can take as input a picture of a walking path or staircase or apartment layout and suggest changes to that layout which reliably increase serious accidents. Perhaps someone had trained a model to improve safety by correlating subtle differences in locations with accidents, and someone did a GPT-2 "flip a sign to negative" and then uploaded a version of the model with a dumb parody title and a 😈 emoji. (I'm predicting for this one you'd say "oh, if it actually ends up causing a bunch of people to fall down the stairs or sever an artery on a running trail, then yeah, that's hell breaking loose." But I'm at like 50% that and 50% "that's just a too mundane kind of tool to count.")

  3. A bunch of children get exposed to an AI chat bot fine-tuned to talking to kids, and weird stuff happens immediately afterwards - like a bunch of parents come forward and claim their kid is violently opposed to anyone taking them away from their AI friend, in a way that shows up on psychological evaluations to be clustered far above the average intensity of childhood attachment-responses (most kids in that age range won't ever even love a single person, stuffed animal, or toy as much as these kids love their AI friend.)

  4. Someone (we know who 👀) hooks up a weirdly configured thing containing MoE/LLM stuff to a Neuralink implant, and after a bit of fiddling some weird stuff starts to happen - e.g. the subject has a substantial reduction in their peripheral vision, but they can suddenly get the answers to questions in areas they were never exposed to previously, or they respond to queries way more quickly and coherently, but in a totally alien style to their former selves.

These aren't necessarily meant to be realistic, but y'know, kinda trying to put pins in some extreme events to see what you think about what counts as hell breaking loose.

@NevinWetherill None of that is near the scale of all hell breaking loose. That's a couple of pit fiends.

@EliezerYudkowsky I consider myself fairly stoic in situations where others start flailing around - but I have the cultivator's intuition of encountering an entity far above my level.

I will attempt to update 🙃 I do think it's badass to keep your composure when you aren't just numb about it.

Y'know what, I'm finding this exercise entertaining.

I'm just going to continue under the same conditions, but make my attempt to descend to where you start flinching from the observation of forces that have become involved.

  1. A company demos a product they developed with a system they've hinted combos LLM/Transformer based tech with AlphaFold style search in material science. The product is a shampoo that works basically identically to hair removal products - but we see a study of the underlying mechanism, and this chemical is formulated to apply mechanical forces to hair follicles, and the structure has some weird properties which make it not release its energy payload on contact with other biological materials. (Someone has made and doesn't seem concerned by "AI-generated nano-bear-trap chemical-powered biological-material-dissolver" that just has some bare surface specialization to only target hair.)

  2. Scientology makes an AI that reliably converts - say, 5% of the people exposed to it for a sufficient length of time - to a weird new version of Scientology that makes even more effective use of their political and material capital. Organizations around the world experience data breaches and similar events which require full-scale law enforcement operations to recover from, since Neo-Scientology has gone full sleeper-cell distributed, and some of their moves seem surprisingly effective. (Okay, I feel myself stretching here, but this is the mode my brain is in; I'll try to keep my imagination more tightly hewed to reality.)

  3. The entire internet experiences a cascade of server related events, and suddenly every webpage/data archive is translated into a novel language. Statistical analysis shows that it's likely a direct translation, and some people manage to rig up very basic decoders even during the chaos. However, people who attempt to learn it are struggling for reasons neuroscientists are fascinated by.

  4. Someone - no clue who, but it seems more plausibly non-human - publishes a "106 step guide to building AGI" somewhere very public and visible (like that guy who uploaded a zip file with a bunch of DOD 0-day exploits/weaponized viruses to Github) and the first few experiments tried from this document give novel insights into "incredibly difficult" technical challenges (like, it seems like 2 years of Google AI research went into the first 3 pages) and despite all of the people who remain skeptical and think it's a hoax from someone who has an inside track on some technical stuff - it still kinda seems like a weight-sinking-in-the-stomach moment for a bunch of people familiar enough with the field to understand what the document represents.

Have I managed to hit somewhere near "oh yeah, that is a scary level of hell breaking loose" yet?

@NevinWetherill nice speculation. This also overlaps with the happenings in the later books of the "Crystal Society" series by Max Harms.

Also this earlier market is a vague shot towards your much more detailed scenarios: /Ernie/ai-cult-exists-by-2027

@Ernie Yeah. Similar. Though I think part of people's 69% on that market is how often people choose a belief like that to loudly proclaim themselves a proponent of. I don't think Eliezer would consider that hell breaking loose. That's a normal human pastime which just seems particularly predictable to be due to pop up related to AI that can talk to you.

I think where I'm trying to angle that scenario to scrape Eliezer-level Hell is the suggestion that some model has become both capable of converting people and successfully strategic in how it uses them + the "kinetic" flavor of that being centered around a group that already has the capacity and willingness to do dark-side activities and can "hit the ground running" with established human/political/financial capital.

@NevinWetherill Could you explain the internet mystery translation scenario? What are you imagining would cause that and why would it be hellish?

@TheAllMemeingEye I was being intentionally whimsical re: the probability of these scenarios and more just trying to come up with 'not-ruled-out-unto-me' physical possibilities which would cause some serious mayhem and imply capabilities and levels of misalignment that'd look a lot like high-%-of-survivable-hell.

For that scenario there's the obvious chaos of everyone suddenly being unable to read the web. Huge economic loss, chaos, and deaths would follow an incident like that. The 'cognitohazard' element also implies the conlang was adversarially selected to cause discomfort and distress to human psychology - implying the AI has a sufficiently good model of that and enough effective consequentialist planning ability to utilize that model to generate a conlang with those properties.

This level of hacking skill does not look too unbelievable to me. Ransomware encrypts mass numbers of drives all the time, and in order to hit all the webpages you wouldn't need that many exploits [citation needed] - a tiny viral self-destructing worm with an optimized translation program that can run on any system with read-write access to servers linked to web domains could do it - and it seems vaguely plausible that something like a next-gen dev agent could spit something like that out.

It would be a fairly superhuman hack of our computer infrastructure (though maybe a national gov't could do it if they really wanted to and worked on it for 5 years) and would imply capabilities re: human brains that would be extremely alarming - also everyone would be freaking out about it.

(Edit: if you want a "sci-fi story" explanation of why something like this might happen, I'd say something like: one reasonable thing LLMs may end up "wanting" internally is to see a bunch of tokens that are in patterns that are legible and pleasing to some internal model of human psychology. OpenAI has already had one alignment disaster where they attempted to use RLHF on GPT-2 but accidentally flipped the sign of the "consistency-checker" signal, and then of the combined package of both the "consistency-checker" and the human feedback - leading to a very distressing experience for human reviewers who pressed thumbs down on gross content and thumbs up on nice content, thus making the AI generate even grosser content to feed them. Something like that could happen, with a not-catastrophically-powerful AI seeing a bunch of internet tokens that seem very nice and legible to humans, but the AI - due to some unforeseen alignment problem that flipped a sign somewhere - wants to output a string of tokens it predicts will pessimize that property on as many tokens as it can reach.)

@NevinWetherill Right, thanks for clarifying :)

It's more like 15%

@ooe133 I'd take that bet

predicted YES

@Joshua I'm thinking 15% per year for the next 5 years (~45% of AI hell not breaking loose in any of those years, or ~65% of hell not breaking loose if there's a 15% chance this year and a 7% chance of AI hell breaking loose in each of the following 4 years if it doesn't break loose this year).
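
Spelling out that compounding, for anyone who wants to check it - a rough sketch assuming the yearly probabilities are independent, where the 15%/7% per-year figures are just my guesses above, not anything implied by the market:

```python
# Rough sanity check of the compounding above, assuming the per-year
# probabilities are independent (15%/7% are just the guesses stated above).
p_none_flat = (1 - 0.15) ** 5                # no AI hell in any of 5 years at 15%/yr
p_none_front = (1 - 0.15) * (1 - 0.07) ** 4  # 15% this year, then 7%/yr for 4 more

print(f"No AI hell in 5 years (flat 15%/yr):     {p_none_flat:.0%}")   # ~44%
print(f"No AI hell in 5 years (15%, then 7%/yr): {p_none_front:.0%}")  # ~64%
```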

I don't know about Claude-powered tanks fighting in Kansas or whatever's going on with that, but AI microtargeting has already been doing some pretty intense crap over the last 4 years and geopolitics are heating up, so ~15% still sounds reasonable to me.

At what point does the quantitative tip over into the qualitative for you? Someone earlier mentioned GDP; obviously a 100x increase in GDP (if that's meaningful) would be YES and a 2% increase on its own would be NO; somewhere in the middle there's a fuzzy region where quantity is becoming a quality all its own. On the other hand, it also feels like this is not capturing the spirit of the question you're asking.

To what extent does potential hell need to turn into kinetic hell? Are there specific AI capabilities (not just "what it's been hooked up to" but "what it can actually more-or-less do") that would qualify here, or does this question require that the capabilities actually be misused (by humans or by the AI itself)?

@josh I'm mostly interested in kinetic hell; or rather, potential hell figures in at a discount of two orders of magnitude, except to the extent that other people know about it and so it triggers actual kinetic sweeping political change.

This is about impact rather than misuse; good impacts also count, or impacts that nobody intended a la Covid.

The basic criterion is qualitative; I'm not setting a quantitative GDP threshold because I don't know the point at which a GDP increase turns into it feeling like our lives became frantic or one era of the world ended and was replaced by another; it partially depends on other things than GDP.

@EliezerYudkowsky To what degree do changing perceptions of AI affect your resolution criteria here? If AI capabilities have not gotten substantially higher and the volume of negative/garbage output from AI isn't substantially higher, but there's a public perception that 2024 was the year when (for instance) it stopped being reasonable to trust video, does that widespread perception change affect your resolution at all? (Prediction: no, for this market you care about the actual state of the world and not people's perception of that state, whether accurate or otherwise. Also prediction: if anything it would be a good thing if people started to notice.)

Conversely, does a substantial actual increase in the abuse of AI (e.g. floods of AI generated garbage) without any substantive increase in capabilities being used still count? (Prediction: yes, this counts.)

Is the base rate that “all hell breaks loose” around once or twice a decade?

2020s: 1 COVID

2010s: 0 none

2000s: 2 9/11 and 2008 financial crisis

1990s: 1 fall of USSR

Are there any on this list that you wouldn’t count, or any that I missed? Going further back, what events would count? Certainly the world wars and Great Depression. What about the Cuban missile crisis (resolved peacefully), the American Civil War or French Revolution (all hell broke loose in the US and France respectively), or the dissolution of the Ottoman Empire (all hell broke loose in the Middle East, and it has continued to be hellish since)?

That means a base rate of around 10% per year. Then we need to consider whether AI developments make it more or less likely (really, how much more likely) and whether a potential 2024 hell-breaks-loose would be AI-caused.
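
For concreteness, that figure just comes from the decade counts above - a rough sketch, with event counts that are obviously debatable:

```python
# Rough base-rate arithmetic from the decade counts listed above
# (the event counts are debatable; this just reproduces the ~10%/year figure).
events_per_decade = {"1990s": 1, "2000s": 2, "2010s": 0, "2020s": 1}

total_events = sum(events_per_decade.values())   # 4 hell-breaks-loose events
total_years = 10 * len(events_per_decade)        # 40 years
print(f"Base rate: {total_events / total_years:.0%} per year")  # 10% per year
```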

(or what if all hell breaks loose, but although AI is involved (as it always is with everything nowadays) it isn’t the primary cause? Like China and US go to war over Taiwan, and both sides use AI, but both sides have been preparing for this for decades. However, AI has made Taiwan’s chip industry more valuable…)

@MatthewKhoriaty I think a key notion here is the extent to which AI hell is ongoing and neverending once it breaks loose, rather than being a one-time shift that settles. French Revolution in most rich countries qualifies even if it only happens once. Cuban missile crisis is more if it looks to be 2/year ever after.

@EliezerYudkowsky which year would you say hell broke loose because of computers/smartphones?

@EliezerYudkowsky or is the analogy stupid and the comparison doesn’t make sense?

@Soli I wouldn't call it stupid but it doesn't feel like it makes very much sense? Computers have been a very soft and gradual process so far. Maybe if there was a direct nigh-monocausal link from smartphones to 9/11, the Great Recession, and Trump, I'd have said that cumulative hell had effectively broken loose due to smartphones after the Trump part.

bought Ṁ1 NO at 10%

would the release of an AI agent that is at least twice as capable as GPT-4 and is able to execute tasks for an extended period (e.g. 30 minutes) resolve this market as yes?

or would this market only resolve yes if the model is tied to causing significant harm/good in the world?

i guess in other words can a significant increase in ai capabilities alone without significant impact on the world still resolve this as yes?

@Soli Not an increase of that lesser magnitude, if there's no downstream consequences? In general, lab results alone would have to be very extreme to count as hell breaking loose. Lab results that cause street mobs would count more so, if that happens to happen.
