Anthropic unveils BioMysteryBench to test Claude's bioinformatics skills against human experts, and says Mythos solved ~30% of 23 questions that stumped experts (Anthropic)
Anthropic has released BioMysteryBench, a benchmark that tests Claude's bioinformatics skills against human experts. The company says Mythos solved about 30% of the 23 questions that stumped those experts.
Excerpt
<a href="https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench"><img align="RIGHT" border="0" hspace="4" src="http://www.techmeme.com/260430/i4.jpg" vspace="4" /></a>
<p><a href="https://www.anthropic.com/">Anthropic</a>:<br />
<span style="font-size: 1.3em;"><b><a href="https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench">Anthropic unveils BioMysteryBench to test Claude's bioinformatics skills against human experts, and says Mythos solved ~30% of 23 questions that stumped experts</a></b></span> — In this post, Brianna, a researcher on the discovery team, shares results from a recent bioinformatics benchmarking effort.</p>
Read at source: http://www.techmeme.com/260430/p4#a260430p4