TOPIC

#llm-evaluation-framework

Open-source repositories tagged with #llm-evaluation-framework, ranked by health score.

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for AI. Compare the performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command-line and CI/CD integration. Used by OpenAI and Anthropic.

★ 23.3k