Open-source repositories tagged with #cuda-kernels, ranked by health score.
Fast speculative-inference server for LLMs on consumer hardware.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.