Beyond the Buzzwords, with Katherine Munro

Steal My Idea: Evaluating LLM Systems with Production Data at Scale

A framework for fixing the gaps in your LLM-testing strategy

Katherine Munro 👩‍💻
Feb 11, 2025

Building things with LLMs that need to work in the real world? This post is for you. Source: Author provided.

In my last post [1], I described how my team and I have been testing our WIP conversational assistant, despite having no baseline or benchmarks, and despite the LLM testing landscape being relatively immature. But there’s still a gap when it come…
