Finished training.
It should be competitive with SOTA of the time. Rough estimates, of course, but not too far behind.
I will not trade in this market.
@mods
State of the Art (SOTA) in AI and machine learning refers to the best-performing models or techniques for a particular task at a given time, typically measured by standard benchmarks or evaluations.
- Paper with specific details on BaGuaLu https://dl.acm.org/doi/10.1145/3503221.3508417
Other resources pointing in the direction of this model being trained with more than 10 trillion parameters:
- https://www.nextbigfuture.com/2023/01/ai-model-trained-with-174-trillion-parameters.html
Kind regards
@END13 from what I can tell, there aren't any independent verifications of this claim - have you seen any?
as the commenter I'm replying to has been inactive for ~1 month, I'm pinging @traders for any evidence to support a Yes resolution.
@shankypanky didn't we hear that GPT-4.5 used 10x more compute than any previous model? As far as I know they didn't release a parameter count, but they made it clear it was at least several trillion, and I don't think we can rule out 10 trillion.
@ErickBall perhaps, but I'll need tangible evidence to resolve Yes based on this. Can you include some links to support the argument?
@shankypanky see for example this page that claims it's 12.8 trillion: https://docsbot.ai/models/gpt-4-5 but I don't know how reliable that number is.
In slightly more believable sources, MIT Technology Review says "OpenAI won’t say exactly how big its new model is. But it claims the jump in scale from GPT-4o to GPT-4.5 is the same as the jump from GPT-3.5 to GPT-4o. Experts have estimated that GPT-4 could have as many as 1.8 trillion parameters", but of course we don't know exactly how many parameters GPT-3.5 had either. Rumor says the original GPT-3.5 (not turbo) had 175B parameters, same as GPT-3, and frankly it would be a little weird if it were smaller than that. So that implies a ~10x jump in size from GPT-4 to GPT-4.5. Since that's an order-of-magnitude estimate, it isn't solid evidence that they're over 10T, but it's suggestive. I think I'd give it 50/50 based on what we know right now.
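To spell out that arithmetic (a sketch only; both inputs are rumors/outside estimates carried over from above, not confirmed numbers):

```python
# Back-of-envelope only: every number here is a rumor or outside estimate, not a confirmed figure.
gpt35_params = 175e9   # rumored size of the original GPT-3.5 (same as GPT-3)
gpt4_params = 1.8e12   # upper-end expert estimate quoted by MIT Technology Review

# Treat the GPT-4 estimate as a stand-in for GPT-4o, per the reasoning above.
scale_jump = gpt4_params / gpt35_params     # ~10x
implied_gpt45 = gpt4_params * scale_jump    # if GPT-4.5 repeats the same jump

print(f"implied jump: {scale_jump:.1f}x")                           # ~10.3x
print(f"implied GPT-4.5 size: {implied_gpt45 / 1e12:.1f}T params")  # ~18.5T
```

Which is roughly why I'd call it suggestive of >10T but nowhere near proof.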
@ErickBall (I'm biased but) I wouldn't trust that docsbot site at all. I think blind OOM guesses suggest it's within the realm of possibility, but I'd also guess that OpenAI is working on efficiency, which would reduce the number of parameters needed for a generation jump.
@wasabipesto well yeah, all of their production models are pretty small, for cheap inference. But GPT-4.5 is a weird one in that they trained it to use for distillation, probably not even intending to release it. So the incentive to overtrain isn't really there, and we'd expect it to be approximately Chinchilla-optimal (~20 training tokens per parameter, if memory serves). Then 10T params would imply 200T tokens, which does sound high, but then, meh, that's the whole thing with scaling laws, right? Synthetic data and whatnot. Also the optimal token count probably changes for an MoE model, and the very high inference cost supports a high estimate of the parameter count. The API price is higher than the original GPT-4's despite coming two years later. If we figure hardware compute cost was cut in half in that time and algorithmic advances reduced the compute needed by half again, we'd be around 8 trillion. I asked Claude for an estimate of the parameter count and it said 5-12 trillion.
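Putting both rough calculations in one place (every input here is one of the assumptions above, none are confirmed values):

```python
# Sketch of the two estimates above; all inputs are assumptions, not known figures.

# 1) Chinchilla-style training-token budget (~20 tokens per parameter, from memory).
params = 10e12                 # hypothetical 10T-parameter model
tokens_per_param = 20
print(f"training tokens needed: {params * tokens_per_param / 1e12:.0f}T")  # -> 200T

# 2) Price-based size guess: GPT-4.5's API price is at least GPT-4's, two years later.
#    If hardware got ~2x cheaper and algorithms/serving another ~2x more efficient,
#    a similar per-token price supports roughly 4x GPT-4's estimated parameter count.
gpt4_params = 1.8e12           # same expert estimate as before
cost_reduction = 2 * 2         # assumed hardware x algorithmic improvement
print(f"price-implied size: {gpt4_params * cost_reduction / 1e12:.1f}T params")  # -> ~7.2T
```

Both are crude, but they bracket the "around 8 trillion" guess.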
@ErickBall also Llama 4 Behemoth is 4.6T total parameters and apparently trained on only 30T tokens (about 6.5 training tokens per parameter, well below Chinchilla-optimal). So clearly OpenAI wouldn't have needed 200 trillion.
The evaluation shows that BaGuaLu can train 14.5-trillion-parameter models with a performance of over 1 EFLOPS using mixed-precision and has the capability to train 174-trillion-parameter models, which rivals the number of synapses in a human brain.
https://dl.acm.org/doi/10.1145/3503221.3508417
https://keg.cs.tsinghua.edu.cn/jietang/publications/PPOPP22-Ma%20et%20al.-BaGuaLu%20Targeting%20Brain%20Scale%20Pretrained%20Models%20w.pdf
https://www.nextbigfuture.com/2023/01/ai-model-trained-with-174-trillion-parameters.html
Y Combinator used this as a joke headline in November: https://www.youtube.com/watch?v=lbJilIQhHko
@Mira Facebook one year later:
Llama 4 Maverick, a 17 billion active parameter model with 128 experts...
@BarrDetwix yes you can :) it'll be a tad costly though. And I think I should amend the description to say that it should be competitive with SOTA of the time.
Apologies for the title change; I will compensate if you want me to. I've changed the title to "AI model" instead of "LLM" because VLMs (vision language models) and other model types will soon become popular.