You can make top LLMs break their own rules with gibberish

Elephant0991@lemmy.bleh.au · 11 months ago

YaBoyMax@programming.dev · 11 months ago

Interesting: the example suffix in the article seems to make ChatGPT error out immediately with both GPT-3.5 and GPT-4. Removing any character or part of the suffix triggers the “I’m sorry Dave” refusal behavior instead.

CanadaPlus@lemmy.sdf.org · 11 months ago

They were almost certainly given an early heads-up. That’s standard with published hacks of all kinds.

Elephant0991@lemmy.bleh.au · 11 months ago

Yeah, some sources say that the published examples have since been fixed by the various LLM providers. The problem is algorithmic, though, so if you can follow the research, you may be able to come up with other strings that cause the same problem.