Beyond the Buzzwords, with Katherine Munro

Using One LLM to Judge Another? Here Are Five Reasons You Shouldn’t

Concrete ‘gotchas’ for people building things with Gen AI.

Katherine Munro 👩‍💻
Apr 25, 2025
[Image: a collection of sticky notes bearing ‘watch out!’, ‘gotcha’, and similar messages, which are discussed in this article.]

As teams across the world scramble to build things with Gen AI and Large Language Models, countless startups are racing to build LLM evaluation tools to serve them. Many of these tools use an LLM to judge a system’s output, whether that is the final result or just the LLM component’s responses. This makes sense for some stages of the product development process…

© 2025 Katherine Munro 👩‍💻