Beyond the Buzzwords, with Katherine Munro

Using One LLM to Judge Another? Here Are Five Reasons You Shouldn’t

Concrete ‘gotchas’ for people building things with Gen AI.

Katherine Munro 👩‍💻
Apr 25, 2025
[Image: a collection of sticky notes reading ‘watch out!’, ‘gotcha’, and similar warnings, representing the pitfalls discussed in this article.]

As teams across the world scramble to build things with Gen AI and Large Language Models, countless startups are racing to build LLM evaluation tools to serve them. Many such tools use an LLM to judge the system’s output, whether that is the final result of the whole system or just the responses of its LLM component. This makes sense for some stages of the product development process…
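To make the LLM-as-judge pattern concrete, here is a minimal sketch. It assumes a hypothetical `judge_llm` callable that takes a prompt string and returns the judge model's text reply; no particular vendor API is implied, and the prompt template and 1 to 5 scale are illustrative assumptions. The same function can grade either the final output of the whole system or the response of a single LLM component, depending on what you pass in as `answer`.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical judge interface (an assumption): prompt in, judge model's reply out.
JudgeFn = Callable[[str], str]

# Illustrative grading prompt; real tools use their own templates and scales.
JUDGE_TEMPLATE = (
    "You are grading an AI system's answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with a single integer score from 1 (poor) to 5 (excellent)."
)


@dataclass
class Verdict:
    score: Optional[int]  # None when no valid score could be parsed from the reply
    raw_reply: str        # kept so a human can audit what the judge actually said


def judge_output(judge_llm: JudgeFn, question: str, answer: str) -> Verdict:
    """Ask the judge model to score one answer (a final result or a component's response)."""
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    reply = judge_llm(prompt)
    # The judge may not follow the format, so take the first in-range integer token.
    for token in reply.replace(".", " ").split():
        if token.isdigit() and 1 <= int(token) <= 5:
            return Verdict(score=int(token), raw_reply=reply)
    # No usable score: return None rather than guessing.
    return Verdict(score=None, raw_reply=reply)


if __name__ == "__main__":
    # A stand-in judge for demonstration; swap in a real model call in practice.
    def fake_judge(prompt: str) -> str:
        return "Score: 4. The answer is mostly correct."

    verdict = judge_output(fake_judge, "What is 2 + 2?", "4")
    print(verdict.score)  # 4
```

Keeping the raw reply alongside the parsed score is a deliberate choice here: the score alone hides whether the judge actually followed the requested format.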

© 2025 Katherine Munro 👩‍💻