Jeff Dean called it out explicitly: Google runs its pro-scale models as teachers, distilling what they know into the flash-scale versions. Fast, cheap, and for most real workloads, indistinguishable from the expensive one.
The signal from the last 30 days confirms this is no longer a technical curiosity. It is a procurement decision. Companies running local LLM setups report distilled 7B to 13B models handle their internal workflows fine. The debate has shifted from “can a small model do this?” to “which small model does this best?”
Most marketing copy, data wrangling, and agent loops do not need prestige-model pricing. Flash-class models handle them without complaint.
Stop treating model tier as a badge of quality. Start treating it as a configuration decision. Run the experiment: take your current prompt stack and run it against Gemini 3.5 Flash. Measure against actual acceptance criteria, not gut feeling about what is “serious.”
The companies winning at AI right now are not the ones spending the most on compute. They are the ones who have stopped spending money to feel important and started spending it to get results.