A powerful, production-ready Streamlit web application for comprehensive LLM response evaluation and benchmarking. Features multi-dimensional scoring across 7 key criteria, interactive analytics ...
Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models ...