Am I missing something? The article seems to suggest it works via hidden text characters. Has OpenAI never heard of pasting text into a utf8 notepad before?

  • brucethemoose@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    3 months ago

    They can cycle a some biases (dozens?) and test them all. Detokenization is super cheap to run, its not AI or anything.

    I’m trying to think of a good analogy for how this would work, and I kinda came up with one. This would be kinda like an image encoder that biases itself towards coding RGB values (0-255) as even numbers. Subtly, say 30% odd 70% even.

    That’s totally imperceptile to humans. And even a “small” sample of the image would carry this bias if pasted into a larger image verbatim, since the sample size is so large (just as the sample size for a bunch of tokens in text is pretty big.

    And I’m not saying its fullproof… but if thats indeed what they’re doing, I think its a decent way to detect “lazy” OpenAI abusers who aren’t working so hard to scramble and defeat it.