Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment

HN · ArXiv

An ArXiv paper proposes that AI alignment discourse may itself induce misalignment by shaping training dynamics, a meta-level concern for the field.

Categories: Research

HN · 76 points · 29 comments

Discussions