TOPIC

#llm-evaluation

Open source repositories tagged with #llm-evaluation, ranked by health score.

promptfoo
promptfoo/promptfoo
TypeScript
Health score: 89

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

21.5k
comet-ml
comet-ml/opik
Python
Health score: 89

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

19.4k
juanjuandog
juanjuandog/FinSight-AI
Java
Health score: 89

AI equity research agent with resilient workflows, Redis Lua single-flight, pgvector RAG, versioned reports, evidence tracing, and RAG evaluation.

371