Steal My Idea: Evaluating LLM Systems with Production Data at Scale

A framework for fixing the gaps in your LLM-testing strategy

Feb 11, 2025

Building things with LLMs that need to work in the real world? This post is for you. Source: Author provided.

In my last post [1], I described how my team and I have been testing our WIP conversational assistant, despite having no baseline or benchmarks, and despite the LLM testing landscape being relatively immature. But there’s still a gap when it come…

Beyond the Buzzwords, with Katherine Munro
