This isn't exactly correct; it is a combination of training and system prompt. Y...

IX-103 · on May 7, 2025

You can't just train on negative examples that show filtered content, as that could lead to poor generalization. You'd need to supplement them with samples from the original training set to prevent catastrophic forgetting.
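A minimal sketch of the data-mixing step, in Python. This is plain rehearsal (replay), assuming examples are simple (prompt, target) pairs. Every name here (`build_finetune_mix`, `replay_ratio`) is hypothetical, and the ratio is an illustrative knob, not a known-good setting or any lab's actual recipe.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: each example is a (prompt, target) pair.
Example = Tuple[str, str]


def build_finetune_mix(
    negative_examples: Sequence[Example],
    original_training_set: Sequence[Example],
    replay_ratio: float = 1.0,
    seed: int = 0,
) -> List[Tuple[str, Example]]:
    """Mix filtered-content (negative) examples with replayed original samples.

    replay_ratio is the number of replayed original samples per negative
    example (an assumed knob; the right value is an empirical question).
    Each returned item is tagged "negative" or "replay" by its source.
    """
    if replay_ratio < 0:
        raise ValueError("replay_ratio must be non-negative")
    rng = random.Random(seed)
    # Can't replay more samples than the original set actually contains.
    n_replay = min(len(original_training_set),
                   round(len(negative_examples) * replay_ratio))
    replay = rng.sample(list(original_training_set), n_replay)
    mix = ([("negative", ex) for ex in negative_examples]
           + [("replay", ex) for ex in replay])
    # Shuffle so the two sources are interleaved within training batches.
    rng.shuffle(mix)
    return mix
```

Tagging each example by source makes it easy to audit the actual mix, or to weight the two losses separately instead of relying on a fixed ratio.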

Otherwise it's like taking slices out of someone's brain until they can't recite a poem. Yes, at the end they can't recite a poem, but who knows what else they can no longer do. The positive examples from training essentially tell you which slices you need to put back to keep it functional.
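To make the analogy concrete, here is a deliberately tiny toy: a 3-weight linear model trained with plain full-batch gradient descent. It is entirely made up for illustration and is nothing like how an LLM is actually fine-tuned. The starting weights `(1, 1, 0)` fit the "original" data exactly. Training only on the "negative" example fits it but drags the original error up to 4/9. Training on the negative example plus replayed originals converges to weights near `(1, 1, -2)`, which fit both, because the update is pushed into the third weight that the original data never used.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Dataset = List[Tuple[Vector, float]]

# Toy "original training set": output 1 for each of these inputs.
ORIGINAL: Dataset = [((1.0, 0.0, 0.0), 1.0), ((0.0, 1.0, 0.0), 1.0)]
# Toy "negative example": suppress the output for this input.
NEGATIVE: Dataset = [((1.0, 1.0, 1.0), 0.0)]


def predict(w: Sequence[float], x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w: Sequence[float], data: Dataset) -> float:
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(w: Sequence[float], data: Dataset,
          lr: float = 0.1, steps: int = 2000) -> List[float]:
    """Full-batch gradient descent on mean squared error."""
    w = list(w)
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


w_start = [1.0, 1.0, 0.0]                    # fits ORIGINAL exactly
w_neg_only = train(w_start, NEGATIVE)         # negative examples alone
w_replay = train(w_start, NEGATIVE + ORIGINAL)  # negatives plus replay

for name, w in [("negative only", w_neg_only), ("with replay", w_replay)]:
    print(name, [round(v, 3) for v in w],
          "original MSE:", round(mse(w, ORIGINAL), 3),
          "negative MSE:", round(mse(w, NEGATIVE), 3))
```

In a real model there is no guarantee that a solution satisfying both sets exists or is easy to find; the toy only shows that the replay terms constrain where the update is allowed to go.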