Estimating worst-case frontier risks of open-weight LLMs

OpenAI Blog

Research paper introducing malicious fine-tuning (MFT), a methodology for estimating the worst-case frontier risks of releasing open-weight LLMs, with a focus on the biology and cybersecurity domains.

Categories: Research

Excerpt

In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.