Self-Harness: Harnesses That Improve Themselves
The paper proposes AI evaluation harnesses that iteratively improve themselves, an idea relevant to agent testing and automated benchmark design.
HN · 83 points · 6 comments
Read at source: https://arxiv.org/abs/2606.09498