Finished training.
It should be competitive with SOTA of the time. Rough estimates, of course, but not too far behind.
I will not trade in this market.
@mods
State of the Art (SOTA) in AI and machine learning refers to the best-performing models or techniques for a particular task at a given time, typically measured by standard benchmarks or evaluations.
- Paper with specific details on BaGuaLu https://dl.acm.org/doi/10.1145/3503221.3508417
Other resources pointing in the direction of this model being trained with more than 10 trillion parameters:
- https://www.nextbigfuture.com/2023/01/ai-model-trained-with-174-trillion-parameters.html
Kind regards
@END13 from what I can tell, there aren't any independent verifications of this claim - have you seen any?
as the commenter I'm replying to has been inactive for ~1 month, I'm pinging @traders for any evidence to support a Yes resolution.
@shankypanky didn't we hear that GPT-4.5 used 10x more compute than any previous model? As far as I know they didn't release a parameter count, but they made it clear it was at least several trillion, and I don't think we can rule out 10 trillion.
@ErickBall perhaps, but I'll need tangible evidence to resolve Yes based on this. Can you include some links to support the argument?
@shankypanky see for example this page that claims it's 12.8 trillion: https://docsbot.ai/models/gpt-4-5 but I don't know how reliable that number is.
In slightly more believable sources, MIT Technology Review says "OpenAI won’t say exactly how big its new model is. But it claims the jump in scale from GPT-4o to GPT-4.5 is the same as the jump from GPT-3.5 to GPT-4o. Experts have estimated that GPT-4 could have as many as 1.8 trillion parameters", but of course we don't know exactly how many parameters GPT-3.5 had either. Rumor says the original GPT-3.5 (not turbo) had 175B parameters, same as GPT-3, and frankly it would be a little weird if it were smaller than that. So that implies a ~10x jump in size from GPT-4 to GPT-4.5. Since that's an order-of-magnitude estimate, it isn't solid evidence that they're over 10T, but it's suggestive. I think I'd give it 50/50 based on what we know right now.
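To spell out that arithmetic (a sketch only; both inputs are rumors/outside estimates carried over from above, not confirmed numbers):

```python
# Back-of-envelope only: every number here is a rumor or outside estimate, not a confirmed figure.
gpt35_params = 175e9   # rumored size of the original GPT-3.5 (same as GPT-3)
gpt4_params = 1.8e12   # upper-end expert estimate quoted by MIT Technology Review

# Treat the GPT-4 estimate as a stand-in for GPT-4o, per the reasoning above.
scale_jump = gpt4_params / gpt35_params     # ~10x
implied_gpt45 = gpt4_params * scale_jump    # if GPT-4.5 repeats the same jump

print(f"implied jump: {scale_jump:.1f}x")                           # ~10.3x
print(f"implied GPT-4.5 size: {implied_gpt45 / 1e12:.1f}T params")  # ~18.5T
```

Which is roughly why I'd call it suggestive of >10T but nowhere near proof.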
@ErickBall (I'm biased but) I wouldn't trust that docsbot site at all. I think blind OOM guesses suggest it's within the realm of possibility, but I'd also guess that OpenAI is working on efficiency, which would reduce the number of parameters needed for a generation jump.
@wasabipesto well yeah, all of their production models are pretty small, for cheap inference. But GPT-4.5 is a weird one in that they trained it to use for distillation, probably not even intending to release it. So the incentive to overtrain isn't really there, and we'd expect it to be approximately Chinchilla-optimal (~20 training tokens per parameter, if memory serves). Then 10T params would imply 200T tokens, which does sound high, but then, meh, that's the whole thing with scaling laws, right? Synthetic data and whatnot. Also the optimal token count probably changes for an MoE model, and the very high inference cost supports a high estimate of the parameter count. The API price is higher than the original GPT-4's despite coming two years later. If we figure hardware compute cost was cut in half in that time and algorithmic advances reduced the compute needed by half again, we'd be around 8 trillion. I asked Claude for an estimate of the parameter count and it said 5-12 trillion.
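Putting both rough calculations in one place (every input here is one of the assumptions above, none are confirmed values):

```python
# Sketch of the two estimates above; all inputs are assumptions, not known figures.

# 1) Chinchilla-style training-token budget (~20 tokens per parameter, from memory).
params = 10e12                 # hypothetical 10T-parameter model
tokens_per_param = 20
print(f"training tokens needed: {params * tokens_per_param / 1e12:.0f}T")  # -> 200T

# 2) Price-based size guess: GPT-4.5's API price is at least GPT-4's, two years later.
#    If hardware got ~2x cheaper and algorithms/serving another ~2x more efficient,
#    a similar per-token price supports roughly 4x GPT-4's estimated parameter count.
gpt4_params = 1.8e12           # same expert estimate as before
cost_reduction = 2 * 2         # assumed hardware x algorithmic improvement
print(f"price-implied size: {gpt4_params * cost_reduction / 1e12:.1f}T params")  # -> ~7.2T
```

Both are crude, but they bracket the "around 8 trillion" guess.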
@ErickBall also Llama 4 Behemoth is 4.6T total parameters and apparently trained on only 30T tokens (about 6.5 training tokens per parameter, well below Chinchilla-optimal). So clearly OpenAI wouldn't have needed 200 trillion.
The evaluation shows that BaGuaLu can train 14.5-trillion-parameter models with a performance of over 1 EFLOPS using mixed-precision and has the capability to train 174-trillion-parameter models, which rivals the number of synapses in a human brain.
https://dl.acm.org/doi/10.1145/3503221.3508417
https://keg.cs.tsinghua.edu.cn/jietang/publications/PPOPP22-Ma%20et%20al.-BaGuaLu%20Targeting%20Brain%20Scale%20Pretrained%20Models%20w.pdf
https://www.nextbigfuture.com/2023/01/ai-model-trained-with-174-trillion-parameters.html
Y Combinator used this as a joke headline in November: https://www.youtube.com/watch?v=lbJilIQhHko
@Mira Facebook one year later:
Llama 4 Maverick, a 17 billion active parameter model with 128 experts...
@BarrDetwix yes you can :) it'll be a tad costly though. And I think I should amend the description to say that it should be competitive with SOTA of the time.
Apologies for the title change; I will compensate if you want me to. I've changed the title to "AI model" instead of "LLM" because VLMs (vision language models) and other model types will soon become popular.