Evaluating large language models trained on code

OpenAI Blog

OpenAI's HumanEval benchmark for evaluating code generation models was introduced alongside the Codex release. It measures functional correctness: whether generated code passes unit tests.
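
As a rough illustration of what checking functional correctness can look like, the sketch below runs candidate functions against a few input/output cases and counts a candidate as correct only if every case passes. This is not the HumanEval harness; the names `passes_tests`, `pass_rate`, `add_correct`, `add_buggy`, and `CASES` are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical test cases: (arguments, expected result) pairs.
Case = Tuple[Tuple[int, int], int]
CASES: List[Case] = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]


def passes_tests(candidate: Callable[[int, int], int], cases: Sequence[Case]) -> bool:
    """Return True only if the candidate gives the expected result on every case."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failure, not as an error in the checker.
            return False
    return True


def pass_rate(candidates: Sequence[Callable[[int, int], int]], cases: Sequence[Case]) -> float:
    """Fraction of candidates that pass all cases."""
    return sum(passes_tests(c, cases) for c in candidates) / len(candidates)


# Two stand-ins for model-generated solutions (assumed for illustration).
def add_correct(a: int, b: int) -> int:
    return a + b


def add_buggy(a: int, b: int) -> int:
    return a - b


if __name__ == "__main__":
    print(passes_tests(add_correct, CASES))
    print(passes_tests(add_buggy, CASES))
    print(pass_rate([add_correct, add_buggy], CASES))
```

The point of the design is that a candidate is judged by its behavior on tests rather than by textual similarity to a reference solution. A real harness would also need to sandbox untrusted generated code, which this sketch does not attempt.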

Categories: Research