Anyone telling you they have tamed LLMs into producing 100% deterministic answers has either scoped the problem space so narrowly as to border on meaningless (e.g. "Is earth flat?" with a structured output schema of a single JSON boolean value), hasn't done robust statistical validation to actually confirm truly deterministic outputs, or both.
LLMs are fundamentally non-deterministic. Trying to use them to solve deterministic problem spaces is selecting the wrong tool for the job, and expecting them to be 100% reliable is the wrong mindset for working with them.
I agree with your example about a deterministic answer, but what I’m looking for is a deterministic process. I seek the LLM’s opinion, not a boolean answer. For example, an agentic skill or hook that does a SWOT analysis may one day (out of 1000 otherwise consistent days) result in the agent producing just S-W-O and no T, simply because its context was muddled that day.
In that case, you don't want to be using Claude Code, which is more of a consumer product; you want to control the inference stack yourself. What you are looking for is structured output (you give the inference engine a JSON schema that the response must conform to) plus a JSON schema validator that parses the output and checks that it conforms to the schema. If it does, you're good to go; if not, run the inference again. llama.cpp supports structured outputs, as do some more consumer-oriented tools that wrap it, like LM Studio. If you don't want to buy hardware yourself or pay exorbitant cloud rental prices, p2p GPU rental marketplaces like vast.ai can offer much more economical options.
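To make the validate-and-retry idea concrete, here is a minimal Python sketch for the SWOT case. Everything named in it is an assumption for illustration: `generate` stands in for whatever call reaches your inference engine, the four section names are a made-up contract, and the hand-written checks stand in for a real JSON schema validator.

```python
import json
from typing import Callable

# Hypothetical contract: all four SWOT sections must be present and non-empty.
REQUIRED_SECTIONS = ("strengths", "weaknesses", "opportunities", "threats")


def validate_swot(raw: str) -> dict:
    """Parse raw model output; raise ValueError if it breaks the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    for key in REQUIRED_SECTIONS:
        items = data.get(key)
        if not isinstance(items, list) or not items:
            raise ValueError(f"section {key!r} is missing or empty")
        if not all(isinstance(item, str) and item.strip() for item in items):
            raise ValueError(f"section {key!r} must hold non-empty strings")
    return data


def swot_with_retry(
    generate: Callable[[str], str],  # placeholder for your inference call
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Run inference until the output passes validation or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate_swot(generate(prompt))
        except ValueError as exc:
            last_error = exc  # remember why this attempt failed, then retry
    raise RuntimeError(
        f"no valid SWOT after {max_attempts} attempts: {last_error}"
    )
```

Note that this only guarantees the shape of the result. A run that drops the T fails validation and gets retried, but the text inside each section will still differ from run to run.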
Right, but do you care about how the sausage was made, or just how it looks and tastes?
You can get Claude Code to fulfill an interface contract with near certainty. Exactly how it does that will vary between runs.
So to me the more interesting question is, what exactly is it you care about inside the sausage, and how do you verify that it's there in the right amounts?
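One way to put a number on "is it there" is to run the same task many times and count how often a check fails. A rough Python sketch, where `run_once` and `check` are hypothetical callables you would supply (one run of the agent, and a predicate for whatever you care about in the output):

```python
from typing import Callable, Tuple


def pass_rate(
    run_once: Callable[[], str],   # placeholder: one full run of the agent
    check: Callable[[str], bool],  # placeholder: True if the output is acceptable
    trials: int = 200,
) -> Tuple[int, float]:
    """Return (failure count, observed pass rate) over `trials` runs."""
    failures = sum(0 if check(run_once()) else 1 for _ in range(trials))
    return failures, 1 - failures / trials


def rule_of_three_upper_bound(trials: int) -> float:
    """Approximate 95% upper bound on the failure rate after zero failures."""
    return 3.0 / trials
```

The second function is the statistical "rule of three": zero failures in n independent runs only bounds the true failure rate at roughly 3/n with 95% confidence. Ruling out a one-in-1000 failure therefore takes on the order of 3000 clean runs, not a handful of spot checks.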