Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

HN · LLMs

Tiny-vLLM, an LLM inference engine written in C++ and CUDA, was posted as a Show HN, adding another lightweight open-source runtime option.

Categories: OSS & Tools

Excerpt

HN · 106 points · 10 comments
