At the time of market creation, the community prediction on Metaculus for the "Date Weakly General AI is Publicly Known" is Oct 18, 2025: https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/
The market will resolve YES if, in retrospect by the end of 2025, this has indeed been achieved (and is agreed upon by industry experts); the market will resolve NO if not.
@block blast While it’s true that many LLMs may contain SAT Math questions in their training datasets, the focus of their training is on understanding patterns and generating coherent responses rather than strictly adhering to any particular educational standard. Excluding certain types of questions for the sake of compliance with Metaculus criteria may not be a priority for developers, especially if those questions contribute to the model's overall performance and versatility in math-related tasks.
@ahalekelly The author clarified below that they are going to rely on expert consensus, not Metaculus's strict criteria.
If you're 100% sure that we should have "weak AGI" by 2025, then you should be able to give a clear answer for how many watts that will take, and you should have no problem betting on this market. If you can't answer this question, then you are basically just guessing and have no certainty that it will occur by 2025, or at all.
@MartinRandall Sorry, I'm unclear what you mean. "X% confidence" sounds like "X confidence level," a statistical term. Did you mean confidence level? Whereas "100% sure" is vernacular, which could mean "I as an individual am filling an order at 100% on YES at this time," or "I am very sure of this particular belief, and hold no numerical measurement of this belief in any dimension," e.g. I just believe it.
@PatrickDelaney It'd be very strange to quantify 100% on a forecasting site and not mean a quantity.
@xxx As someone who works in NLP and lives with a roboticist, this doesn't work for many reasons. Chief among them is that the network doesn't know what does and doesn't work in a robot, so it will have no understanding of why something failed, and therefore gains from reflection will be minimal. There's also a broader issue of using language models as controllers for continuous high-dimensional tasks, where even very slight imprecision leads to wildly incorrect answers. This is in contrast to something like standard language tasks, where there are many potential correct answers with a lot of fuzziness around each one.
@NoaNabeshima yes, i will resolve this market when it closes on Jan 1, 2026. looking back, if i feel like the resolution criteria are met, i will resolve the market to YES, otherwise i will resolve the market to NO.
@VictorLi Suppose that no one actually tries the Loebner silver Turing test and there is some disagreement between industry experts about whether it would be passed if tried, but you think it would be passed. Could this resolve YES?
Suppose some industry experts think it would be passed and you think it would be passed, but others aren't sure. Could this resolve YES?
@VictorLi Could this resolve YES if industry experts think it would be passed but it hasn't been attempted?
@NoaNabeshima if i believe it qualifies as "weak AGI" and a majority of industry experts concur, then i will resolve YES, otherwise it will resolve NO.
granted, "the majority of industry experts" is a vague measure, but i think i will abide by common sense on whether or not there is consensus. fwiw i basically expect it to be weak AGI, but i'm betting NO because i doubt the industry experts will recognise it as such.
@VictorLi So if it definitely doesn't solve Montezuma's Revenge but expert consensus is that it's "weak AGI", does this resolve NO or YES?
This requires passing a pretty intense Turing test, and the more I play around with GPT-4, the more I think people will be able to very easily poke holes in these things for a long time to come. I give a 20-30% chance that AI systems in existence at the end of 2025 will be able to pass a Turing test with that level of adversariality, and that's being very generous and giving a very wide tail to unprecedentedly fast exponential improvement.