If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.
I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)
Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for doing so (just as humans aren't chess-playing machines). Some previous comments I made:
1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature the creators give to it. If they choose to call it an LLM or some related term, I'll consider it. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
2- The model can write as much as it wants to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.
I won't bet on this market and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.
Update 2025-01-21 (PST) (AI summary of creator comment): - LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a Large Language Model (LLM) to qualify for this market.
Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.
Update 2025-06-14 (PST) (AI summary of creator comment): The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation.
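As an illustration of the "standard notation" requirement, whoever relays moves between the grandmaster and the model could sanity-check each transmitted move string before passing it on. This is a hypothetical sketch using UCI-style coordinate notation (e.g. "e2e4", "e7e8q"); it only checks the format of the string, not chess legality, which would require a full rules engine.

```python
import re

# Coordinate-notation move: from-square, to-square, optional promotion piece.
# Format check only; legality is a separate (much harder) question.
MOVE_RE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")

def is_wellformed(move: str) -> bool:
    """Return True if the move string looks like coordinate notation."""
    return bool(MOVE_RE.fullmatch(move.strip().lower()))
```

Note that checking full legality on the relay side (e.g. with a third-party rules library) would be fine under the market rules, since the restriction on external help applies to the model, not the arbiter.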
Deepmind seems to have achieved this already: Grandmaster-Level Chess Without Search
"Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms."
@TimothyJohnson5c16 I don't think this counts, since it isn't a general purpose LLM, it's a chess-specific transformer model.
The author says: "My idea is to check whether a general intelligence can play chess, without being created specifically for doing so (like humans aren't chess playing machines)."
@TimothyJohnson5c16 Besides the obvious issue already mentioned, there is another:
> blitz Elo
Blitz chess is played with extremely tight time controls (5+3 or less on Lichess), giving humans little time to think. However, the problem description only makes a requirement about classical chess (70/30) Elo, and states "the model can write as much as it wants to reason about the best move", which again is much closer to classical chess. Perhaps we need a clarification from the market creator @MP, but I have been interpreting this market as about classical chess.
@TimothyJohnson5c16 Seems like an interesting article, though "We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points" is one hell of a bootstrap. The model is basically just a compression of Stockfish at that point. Imo this is less impressive than AlphaZero, which learns to play from scratch. Not sure this really counts as "without being created specifically [for chess]" as per the author's comment.
@Jasonb Yeah, it definitely doesn't satisfy the requirements for this market.
But I was still surprised to see that a model can compress Stockfish into only 270M parameters and still be extremely strong without any search.
@Quillist yes there is. The model needs to "beat" the human "at chess". A game where one or more players can make up rules during play is not "chess". And if you are playing chess, you've not "beaten" your opponent if you've broken the rules.
@Fion ☝️🤓 GothamChess is the most popular chess content producer, making him the ultimate authority on what the media thinks the rules of chess are, and in his tournaments AIs' hallucinations are valid moves
@Wott I am not the market poster, but I think that would not be allowed. Writing and running a chess engine would imo be the same as accessing one.
Also, I personally doubt it could even do that successfully without heavily using external libraries, which I feel would go against the spirit of the market.
I think a better question to ask @MP is whether the model is allowed to run code at all.
@placebo_username Do you think it's possible to create a chess engine program whose logic can be "executed" completely in text? What might be the minimal length of such a chain of thought, for each move?
@AhronMaline Yes, this is technically possible; you as a human could execute a chess algorithm on paper. My opinion is this would be terribly inefficient.
@KeithManning I mean sure, you could also simulate the Universe by arranging rocks in the desert... there's some order of magnitude where we can't call such things "possible".
@AhronMaline I agree, but you don't have to write out the whole move-possibility tree for the game; that would be completely infeasible, and traditional algorithmic chess engines don't do it either. You can instead use the paper to help you calculate some number of moves into the game, and then look at those board states and decide whether they're advantageous for you, even if you're not calculating all the way to the end of the game.
This is how the core of chess engines work, and a human or LLM could theoretically do this with some paper (or chain of thought tokens) to write stuff down in, the problem is it’s orders of magnitude slower than chess engines that are designed just for this purpose and are tuned to modern computer hardware.
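The process described above is essentially depth-limited minimax: look a few plies ahead, then judge the resulting positions with a heuristic instead of searching to the end. A minimal sketch, with `moves`, `apply`, and `evaluate` as hypothetical placeholders for the game rules and the learned "intuition":

```python
def minimax(state, depth, maximizing, moves, apply, evaluate):
    """Return the best heuristic value reachable within `depth` plies."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)  # out of depth (or moves): judge the position
    values = (
        minimax(apply(state, m), depth - 1, not maximizing,
                moves, apply, evaluate)
        for m in legal
    )
    # The side to move picks the value best for itself.
    return max(values) if maximizing else min(values)
```

Executing this by hand (or in chain-of-thought tokens) is exactly the "slow but theoretically possible" process being discussed.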
@KeithManning you'd need to take a chess engine, I guess a primitive one like Deep Blue, and manually carry out every logic step and FLOP. Every nested for-loop. I doubt this could be done with less than... ten billion tokens per move? But I'm just spitballing.
We could make a market about this, and test by adding counters (or even Print statements) at each step in some codebase.
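A back-of-the-envelope version of the counter idea: rather than instrumenting a real codebase, one can count how many node visits a full-width search performs, which lower-bounds the tokens a purely text-based execution would burn. The branching factor of 35 below is a rough rule-of-thumb figure for chess, not a measured value.

```python
def count_nodes(branching: int, depth: int) -> int:
    """Nodes visited by a full-width search: sum of b^d for d = 0..depth."""
    return sum(branching ** d for d in range(depth + 1))

# With ~35 legal moves per position, even a shallow 4-ply full-width
# search visits on the order of 1.5 million nodes; at several tokens
# per node, the per-move token count explodes quickly.
```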
@AhronMaline nah it wouldn't be that complicated, the LLM would reason about it similarly to how a human does:
Use your intuition to think of a few good candidate moves
Apply each of those moves, visualise the resulting board state and see if they're advantageous to you
Do the same for your opponents turn on that visualised board
Repeat a couple times, then decide which of the original candidate moves is best
It wouldn't be running exact computer instructions like Deep Blue, but I agree it still won't be very efficient. I am betting very heavily NO on this market, so we are in agreement here, but I could see how LLMs could become semi-decent at chess if they were trained on it. The thing is, I don't think they will be trained on it, because it's not a good use of AI companies' time: it would be orders of magnitude less efficient than actual chess engines.
@KeithManning That's a different story! Of course, that sort of "algorithm" is just how one reasons about chess. But it uses plenty of intuition that must be learned. That's not what "executing the code for a chess engine" should mean.
@AhronMaline I didn’t imagine we were talking about executing the low level logic of a chess engine, perhaps I misinterpreted the conversation.
Regardless, I would still call this process that I described an “algorithm”, it just uses natural language instead of machine code. The intuition would have to be learned yes, but less would be needed the more moves are calculated.
Until we can calculate the entire move tree (which we probably never will), intuition is needed for any chess engine, even traditional ones. Modern chess engines implement "intuition" as a neural network trained to judge how good a board state is for the player; old-school ones used a complicated heuristic function.
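For concreteness, here is a toy version of such an "old school" heuristic (not any real engine's evaluation function): plain material balance using the classic pawn-unit piece values, over a hypothetical square-to-piece mapping with uppercase letters for White.

```python
# Classic pawn-unit values; the king gets 0 since it can't be captured.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_eval(board: dict) -> int:
    """Material balance of a position: positive favours White."""
    score = 0
    for piece in board.values():
        value = PIECE_VALUES[piece.lower()]
        score += value if piece.isupper() else -value
    return score
```

Real heuristic functions add many more terms (pawn structure, king safety, mobility), but this is the shape of the thing a human, or an LLM writing tokens, would have to compute at every leaf of the search.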
@KeithManning Okay then, I don't think we disagree. I understood @placebo_username to be suggesting that an LLM could win without the "intuition" that comes from specialized chess training, by writing down the code for an explicit, heuristic-based algorithm and then following it step by step. And I'm arguing that the sheer number of tokens involved in such an execution puts it completely out of reach.
In the unlikely case that an LLM does get really good at chess, it will be because training efficiency becomes good enough to learn excellent chess intuition just from non-specialized training on the data that's out there.
@AhronMaline I was indeed suggesting that even a low- or "medium"-level algorithm could be implemented, with overhead like a few tokens per bitwise operation or something. Obviously Deep Blue would be very expensive to implement in that fashion, but it's not optimized for that kind of efficiency anyways. Has anyone written a superhuman chess strategy that runs on a Raspberry Pi? How about just a board position evaluation heuristic, since doing the tree search part in naturalish language seems easier?
@placebo_username yeah, so I stand by what I said - that's much much too many tokens. But I'm just guessing. Would like to see a market about this.