CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

HN · ArXiv

CODA rewrites transformer blocks as GEMM-epilogue programs, an approach that may improve inference efficiency by optimizing how the matrix multiplications are scheduled.
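For readers unfamiliar with the term, here is a minimal NumPy sketch of what a GEMM epilogue generally means: the elementwise work that follows a matrix multiplication (here bias, ReLU, and a residual add) is applied to each output tile before it is stored, rather than in separate passes over the full result. This illustrates the general idea only and is not CODA's implementation; the names `gemm_with_epilogue` and `bias_relu_residual`, the tile size, and the choice of operations are all assumptions made for the example.

```python
import numpy as np


def gemm_with_epilogue(a, b, epilogue, tile=32):
    """Tiled matmul that applies `epilogue` to each output tile before storing it.

    Hypothetical helper for illustration; not taken from the CODA paper.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.empty((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            rows, cols = slice(i, i + tile), slice(j, j + tile)
            acc = a[rows, :] @ b[:, cols]           # accumulator for one output tile
            out[rows, cols] = epilogue(acc, rows, cols)  # fused post-processing
    return out


def bias_relu_residual(bias, residual):
    """Build an epilogue computing relu(acc + bias) + residual on one tile."""
    def epilogue(acc, rows, cols):
        return np.maximum(acc + bias[cols], 0.0) + residual[rows, cols]
    return epilogue


# Compare the fused form against the unfused, multi-pass computation.
rng = np.random.default_rng(0)
x = rng.standard_normal((48, 64))
w = rng.standard_normal((64, 80))
bias = rng.standard_normal(80)
res = rng.standard_normal((48, 80))

fused = gemm_with_epilogue(x, w, bias_relu_residual(bias, res))
reference = np.maximum(x @ w + bias, 0.0) + res
print(np.allclose(fused, reference))
```

On a GPU the benefit of this structure would come from keeping the accumulator tile in registers or shared memory while the epilogue runs; the NumPy version only shows that the fused and unfused forms compute the same values.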

Categories: Research

HN · 76 points · 7 comments

Discussions