Beyond the Buzzwords, with Katherine Munro

LLMs as Judges: Practical Problems and How to Avoid Them

Concrete advice for teams building LLM-powered evaluations

Katherine Munro 👩‍💻
Sep 03, 2025

A collection of sticky notes and emojis related to evaluation (graphs, ticks and crosses, etc)
All images: Author provided.

My last post was all about conceptual problems with using Large Language Models to judge other LLMs. In it, I presented the “gotchas” that teams should watch out for when building LLM-powered products. Of course, the point is not to say that all LLM evaluations are bad, or that human judges are always better. There are defini…
