Do transformers need three projections? Systematic study of QKV variants

HN · ArXiv

A systematic study tests whether transformer attention needs separate query, key, and value projections, probing a core architectural assumption.

Categories: Research

HN · 81 points · 11 comments

Discussions