Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs

HN · LLMs

Research demonstrates that finetuning can reactivate an LLM's recall of copyrighted book content that alignment had previously suppressed, revealing a persistent weakness in alignment-based safety techniques.

Categories: Research

HN · 106 points · 62 comments

Discussions