TruthfulQA: Measuring how models mimic human falsehoods

OpenAI Blog

The TruthfulQA benchmark measures how prone language models are to repeating human falsehoods, and it has become a standard evaluation of truthfulness.

Categories: Research