Stochasticity is the property that distinguishes AI answer engines from classic search engines. A Google query returns the same ten blue links if you run it five times in a row. A ChatGPT query for the same buyer question can return five different lists of recommended products. The variability is intrinsic to both the model and the retrieval layer above it.
Where the variability comes from
The variability has three sources: temperature in the language model itself, which adds randomness even at low settings; retrieval fan-out, where the answer engine searches the web with slightly different queries each time; and recency, where the index returns different documents over the course of a day. Together they produce the "less than 1-in-100 chance of getting the same recommendation list twice" result that the CMU LLM Whisperer study reported.
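A toy simulation makes the 1-in-100 figure intuitive. The sketch below is an assumption-laden stand-in, not the study's method: it models each answer as a weighted, ordered top-5 draw from a hypothetical product pool and counts how often two answers match exactly. Even with a few heavily favored products, identical ordered lists are rare.

```python
import random
from itertools import combinations

random.seed(42)

# Hypothetical pool of 20 products with unequal "evidence" weights,
# standing in for whatever preference distribution the model has learned.
products = [f"product_{i}" for i in range(20)]
weights = [1.0 / (i + 1) for i in range(20)]  # a few favorites, a long tail

def draw_answer(k=5):
    """One stochastic 'answer': a ranked top-k list sampled without replacement."""
    pool, w = list(products), list(weights)
    picks = []
    for _ in range(k):
        choice = random.choices(pool, weights=w, k=1)[0]
        idx = pool.index(choice)
        pool.pop(idx)
        w.pop(idx)
        picks.append(choice)
    return tuple(picks)

answers = [draw_answer() for _ in range(200)]
pairs = list(combinations(answers, 2))
identical = sum(a == b for a, b in pairs)
print(f"identical-list rate across pairs: {identical / len(pairs):.4f}")
```

The exact rate depends entirely on the assumed weights; the point is only that mild per-draw randomness compounds across five ranked slots into near-zero repeat probability.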
What this means for measurement
You cannot measure AI visibility by asking once. You measure it by asking the same question many times, treating the percentage of answers that name you as the signal, and treating any single answer as one draw from a distribution. This is the same statistical move that A/B testing makes for product analytics: variance is not noise to be ignored, it is the thing the methodology has to handle.
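The measurement loop can be sketched in a few lines. Everything here is illustrative: `brand_mentioned` is a hypothetical stand-in for querying a real answer engine (swap in an actual API call), the true mention rate of 0.35 is invented, and the interval is a plain 95% Wald approximation on a binomial proportion.

```python
import math
import random

random.seed(0)

def brand_mentioned(question: str, true_rate: float = 0.35) -> bool:
    # Placeholder for one real query to an answer engine; returns True
    # when the brand appears in that sampled answer.
    return random.random() < true_rate

def visibility(question: str, n: int = 200):
    """Ask the same question n times; report mention rate with a 95% Wald CI."""
    hits = sum(brand_mentioned(question) for _ in range(n))
    p = hits / n
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

p, lo, hi = visibility("best CRM for small teams")
print(f"mention rate: {p:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The interval is the operative output: a single yes/no answer tells you almost nothing, while 200 draws pin the rate to within a few percentage points.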
What this means for strategy
Strategies that depend on hand-crafting one answer to one prompt do not work. Strategies that improve the underlying evidence the model leans on (your pages, third-party comparisons, the citations in your category) work, because they raise the floor of every answer the model produces.