Streaming structured output from LLMs with backpressure
Parsing half-formed JSON as it streams in is mostly a discipline of never trusting a closing brace you have not seen yet. The model is fast; the client is slow; the channel between them is where the design lives.
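Here is a minimal sketch of what not trusting an unseen brace can look like in Python. The class name `JsonObjectScanner` and its `feed` method are hypothetical, invented for this illustration. It assumes the stream is a run of top-level JSON objects and does not try to recover from malformed input. It tracks only nesting depth and whether it is inside a string, and hands a buffer to `json.loads` only once the matching closing brace has actually arrived.

```python
import json


class JsonObjectScanner:
    """Hypothetical helper: buffers chunks and parses a top-level object
    only after its closing brace has been seen."""

    def __init__(self):
        self.buf = []           # characters of the object in progress
        self.depth = 0          # current {} / [] nesting depth
        self.in_string = False  # inside a JSON string literal?
        self.escaped = False    # previous character was a backslash?

    def feed(self, chunk):
        """Consume one chunk; return the list of objects it completed."""
        done = []
        for ch in chunk:
            if self.depth == 0 and ch != "{":
                continue        # skip whitespace or separators between objects
            self.buf.append(ch)
            if self.in_string:
                # Braces inside strings must not affect depth.
                if self.escaped:
                    self.escaped = False
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch in "{[":
                self.depth += 1
            elif ch in "}]":
                self.depth -= 1
                if self.depth == 0:
                    # The object is complete only now; parse it for real.
                    done.append(json.loads("".join(self.buf)))
                    self.buf.clear()
        return done
```

Feeding it chunks that split in the middle of a string, or that contain a `}` inside a string value, returns an empty list until the real closing brace shows up. The scanner never guesses at a partial object; it only decides when a full parse is safe.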
A small, boring state machine — one that can pause the producer when the consumer falls behind — has earned its keep more than any clever parser I have written.
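One way to sketch the pausing part, under stated assumptions: a bounded `asyncio.Queue` stands in for the channel, so the producer suspends on `put` whenever the consumer is behind. This is a swapped-in illustration, not the state machine described above. The names `producer`, `consumer`, and `run`, the `None` end-of-stream sentinel, and the `log` list are all hypothetical.

```python
import asyncio


async def producer(queue, chunks, log):
    """Stand-in for the model: pushes chunks as fast as the queue allows."""
    for chunk in chunks:
        await queue.put(chunk)       # suspends here while the queue is full
        log.append(("put", chunk))
    await queue.put(None)            # assumed sentinel: end of stream


async def consumer(queue, log, delay=0.01):
    """Stand-in for the slow client: takes one chunk, then dawdles."""
    received = []
    while True:
        chunk = await queue.get()    # frees a slot, which wakes the producer
        if chunk is None:
            return received
        await asyncio.sleep(delay)   # simulate slow downstream work
        received.append(chunk)
        log.append(("got", chunk))


async def run(chunks, maxsize=2):
    """Wire producer and consumer together through a bounded queue."""
    queue = asyncio.Queue(maxsize=maxsize)
    log = []
    _, received = await asyncio.gather(
        producer(queue, chunks, log),
        consumer(queue, log),
    )
    return received, log
```

Calling `asyncio.run(run(list("abcdefgh")))` delivers every chunk in order, and the log shows the producer never running more than `maxsize` queued chunks, plus the one the consumer is holding, ahead of the client. The design choice is that the bound lives in the channel itself, so neither side needs to know how fast the other is.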