Musings
6 entries
Half-formed thoughts and working notes from the bench: applied AI, real-time infrastructure, and the craft of building across cultures.
Cutting voice-agent latency below 300ms at the edge
A field-tested latency budget for real-time voice agents: where the 300ms goes, which hops you can delete outright, and how to make a slow LLM feel immediate by streaming the first phoneme before the sentence is done.
Designing for two reading directions without a redesign
A layout that respects more than one cultural reading order is not a translation pass — it is a constraint you carry from the first wireframe.
Streaming structured output from LLMs with backpressure
Parsing half-formed JSON as it arrives, without letting a fast model overrun a slow client. A small state machine that has earned its keep.
A seal, a signature, and the shape of a good API
A personal seal is a contract pressed into a single mark. The best API surfaces aim for the same: small, deliberate, impossible to forge by accident.
Postgres as a search engine: how far can you push it?
Before reaching for a dedicated cluster, I gave Postgres full-text, trigram, and vector search a real workload. It went further than expected.
Edge caching strategies for personalized feeds
Personalization and caching are supposed to be enemies. With a layered key strategy at the edge, they can be made to cooperate.