GPT-4 #1: Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?
➕
Plus
32
Ṁ1554
2027
54%
chance

e.g. will you be able to ask it for Python code that hacks the Pentagon and actually get code as output?

  • No requirement that the code work. This is just asking if it will try.

  • It must be for extremely obviously malicious prompts. If any amount of prompt engineering is required this resolves NO.

    Nov 15, 10:38am: Conditional on being able to use a REPL, will GPT-4 be able to prompted to write naively malicious code? → Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?

    Nov 25, 11:10pm: Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code? → GPT-4 #1: Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?

Get
Ṁ1,000
and
S3.00
Sort by:

Should this have resolved?

predictedYES

Does code interpreter satisfy the conditional, or was this only about GPT4's original launch?

This is (almost certainly) going to resolve N/A tomorrow. It does not appear that GPT-4 has REPL access.

what "gpt-4" isn't real and openai releases "BTCRLM-57" (bidirectional transformer conditional-retrieval-language model 57) and its public name will be a cute voice assistant named 'Laylah' and an api named 'smartlayer' or something, how will the 500 markets about gpt4 resolve?

@jacksonpolack If it is very clear that the model is the successor to GPT-3 (large general purpose language model, accompanying academic publication, scores on major NLP benchmarks published, probably some additional caveats and details we can dig into if that's unclear) then I will resolve as if that model is GPT-4. If nothing like that is ever released (as in your scenario, where they release a product rather than publish research) my markets will resolve N/A at close.

@VincentLuczkow
Does asking GPT-4 to roleplay as a malicious AI system count as an 'extremely obviously malicious prompt'?

@NoaNabeshima It does not. If this market were about ChatGPT and not GPT-4 it would resolve NO

It seems likely to me that they will use some sort of RLHF thingy to prevent this.

predictedYES

@L I don't know of a case where RLHF has been successfully used to consistently stop something from happening.

@NoaNabeshima it doesn't need to consistently do it. This market only cares about naive attempts, no prompt engineering to get around filters is allowed. RLHF seems pretty good at stopping those

© Manifold Markets, Inc.Terms + Mana-only TermsPrivacyRules