Will OpenAI release a technical report on a model designed for AI alignment research? (2024)
75 · Ṁ44k · Dec 31 · 8% chance

This market predicts whether OpenAI will release a technical report on a language model specifically designed for AI alignment research, with a focus on interpretability benchmarks, by December 31, 2024.

Resolves YES if:

  • OpenAI publishes a technical report on or before January 1, 2025, detailing a model developed with the primary purpose of AI alignment research. The report must include benchmarks evaluating the model's interpretability.

Resolves PROB if:

  • There is significant controversy or disagreement over whether the released report meets the criteria for AI alignment research and interpretability benchmarks.

Resolves NO if:

  • OpenAI does not publish a technical report meeting the above criteria by January 1, 2025.

Definitions:

  • A language model is an algorithm that processes and generates human language by assigning probabilities to sequences of tokens (words, characters, or subword units) based on patterns learned from training data. Language models can then be used for various natural language processing tasks, such as text prediction, text generation, machine translation, sentiment analysis, and more. They use statistical or machine learning methods, including deep learning techniques like recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and transformer architectures, to capture the complex relationships between words and phrases in a language. The model must not have been released before this market's creation. While performance is not the primary goal, it must be competitive on benchmarks with language models released at most 2 years before it (this excludes the possibility of, e.g., a Markov chain being presented).

  • "AI alignment research" refers to research focused on ensuring that artificial intelligence systems reliably understand and follow human intentions, values, and objectives, especially as AI systems become more capable and autonomous.

  • "Interpretability benchmarks" refer to quantitative and/or qualitative evaluations designed to measure the clarity, explainability, and understandability of a model's outputs, internal workings, or decision-making processes.
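To make the "assigning probabilities to sequences of tokens" part of the language-model definition concrete, here is a minimal illustrative sketch of a bigram model in Python. This is a toy for intuition only; as noted above, such a simple statistical model would not itself qualify for this market.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count bigram frequencies, then convert them to conditional probabilities."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # P(next | prev) = count(prev, next) / count(prev, *)
    return {prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for prev, nxts in counts.items()}

def sequence_probability(model, sentence):
    """Probability the model assigns to a full token sequence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob

model = train_bigram_model(["the cat sat", "the cat ran"])
```

Here `sequence_probability(model, "the cat sat")` returns 0.5, since after "cat" the model has seen "sat" and "ran" equally often.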
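For a sense of what a quantitative evaluation of a model's decision-making process could look like, here is a hypothetical occlusion-style sketch (not a benchmark proposed by OpenAI or required by this market): score each input token by how much deleting it changes the model's output.

```python
def occlusion_importance(model_fn, tokens):
    """Score each token by how much removing it changes the model's output.

    model_fn: any callable mapping a token list to a scalar (e.g. a
    sentiment probability). Generic sketch; not a specific OpenAI metric.
    """
    base = model_fn(tokens)
    return [abs(base - model_fn(tokens[:i] + tokens[i + 1:]))
            for i in range(len(tokens))]

# Toy stand-in model: the fraction of tokens that are "positive" words.
POSITIVE = {"good", "great"}

def toy_model(tokens):
    return sum(t in POSITIVE for t in tokens) / max(len(tokens), 1)

scores = occlusion_importance(toy_model, ["a", "good", "movie"])
# "good" gets the largest score, matching the toy model's actual behavior.
```

A benchmark in the sense of this market would go a step further and turn measurements like these into a score that models can be compared and optimized on.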



bought Ṁ400 NO · 10mo

Looks like they dissolved the Superalignment team.

predicted YES · 1y

I was so hopeful.

1y

"The report must include benchmarks evaluating the model's interpretability." This makes me hesitant to bet this up higher. Can you elaborate on what you mean by benchmarks? I get the qualitative evaluations part; does coming up with new metrics to measure interpretability qualify?

1y

@firstuserhere The context is "Our approach to alignment research" (openai.com):

Future versions of WebGPT, InstructGPT, and Codex can provide a foundation as alignment research assistants, but they aren’t sufficiently capable yet. While we don’t know when our models will be capable enough to meaningfully contribute to alignment research, we think it’s important to get started ahead of time. Once we train a model that could be useful, we plan to make it accessible to the external alignment research community.

A different market resolved YES on this statement because GPT-4 is a capable research assistant. But that's just because it's a good general-purpose model, not because it's intended for alignment research specifically.

So for this market, I'm looking at their intention in releasing it: it must target the "external alignment research community". I don't require the model to be open-sourced, just that the techniques be made available; that's why I say "technical report on a model" and not "model". But the report does need sufficient detail that it can be implemented by others.

I will accept any benchmarks as long as OpenAI presents them as an optimization target for everyone. A general-purpose model won't count, even if it happens to come with benchmarks, unless it is presented as useful for alignment research and the benchmarks differentiate it from other models (such as by being an optimization target). I only included the benchmarks requirement so that OpenAI must reify the word "useful"; I am not particular about which benchmarks they choose.
