Will DeepSeek’s next premier model be released as open source?
Resolves when the model is released.
Update 2025-05-05 (PST) (AI summary of creator comment): For the purposes of this market, open source is defined as open weight. The release of the training dataset is not required.
Update 2025-08-22 (PST) (AI summary of creator comment): The premier model is defined as the release that:
- Is the most core model to DeepSeek's business at the time of release; and
- Has a major version number bump or a new name indicating more than an incremental step.
@CampbellHutcheson Does v3.1 (thinking) count? It's plausible that there won't be any more standalone reasoning models.
@Bayesian I think I might wait a couple of months to resolve this, given that some folks expect DeepSeek to release a next-generation model at some point this month or next.
I think the question is whether v3.1 (thinking) counts as the "next premier model".
Yeah, I agree that's the question. I guess if in two months they come up with R2 and it's not open source, I don't see how that would indicate that v3.1 (thinking) was not a "premier model" for the purposes of this market. So could you clarify what you aim to learn by waiting for a next-generation model, with respect to how this market will resolve?
@Bayesian I think a model is the "next premier model" if it is: (1) the most core model to their business at the time of release and (2) this is represented by a major version number bump or new name representing that they view it as more than an incremental step.
@CampbellHutcheson I see, that makes sense. Then per (2), v3.1 (thinking) is not a new premier model. Thank you!
@CampbellHutcheson There's an ongoing debate on whether it is correct to call open-weight models "open-source" if they don't include (or even disclose what is in) the dataset—the source, so to say—of the data stored in the weights. Since you cannot reproduce the model with what is made publicly available, "open-source" is a misnomer, so scientific (and adjacent) literature avoids using the term for models with private training datasets.
Given that prediction markets are usually strict on terminology—as they should be—I believe a clarification was warranted with this question, so thanks for that!
@moozooh Yeah, I don't think much of the debate; it's basically just a question of whether something is pure enough to be considered open source. All the open models that anyone cares about in terms of actual usage (Llama, DeepSeek, etc.) are open weight but don't provide their training data.
I do understand that this is a popular debate in certain circles, though.