Even (very) noisy LLM evaluators are useful for improving AI agents

(tensorzero.com)

23 points | by GabrielBianconi 2 days ago

5 comments

$SmithersBot 13 minutes ago

As long as OpenAI and Anthropic keep subsidizing dirt-cheap Codex or Claude Code usage, I'll just keep using them as evaluators. The trick is to have a fresh instance do the reviewing, not the one that did the work.
$ai_slop_hater an hour ago

What is an LLM evaluator?