I Hid the AI’s “Thinking” in Plain Sight: Dual-Channel Streaming for an AI Search Chatbot That Works Mid-Call (Series Part 9)

Source: DEV Community
I watched a recruiter share their screen on a client call and realized the worst possible thing was happening: the assistant's raw "thinking" was spilling onto the screen like debug logs. The content wasn't wrong; it was just the kind of internal narration you never want a client to read while you're trying to sound decisive.

This is Part 9 of my series "How to Architect an Enterprise AI System (And Why the Engineer Still Matters)". In Part 8, I talked about routing search across Azure AI Search, pgvector, and the CRM as a live fallback. This post is about what happened next: once the answers got good, the delivery became the product.

The core decision: progressive disclosure via dual-channel streaming (thinking + results) with an interruptible UX. I stream the model's THINKING tokens on one channel, stream QUERY_RESULT events on another, and build candidate cards from structured events, not from text.

The key insight (and why the naive approach fails)

The naive approach to streaming a c
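To make the dual-channel idea concrete, here is a minimal sketch of the demultiplexing step. It assumes a hypothetical wire format where each raw stream line is tagged `THINKING` or `QUERY_RESULT`; the event names match the post, but the line format, class names, and fields are illustrative, not the actual implementation.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Union

# Hypothetical event types: narration vs. structured results.
@dataclass
class ThinkingToken:
    text: str  # rendered as collapsible "thinking", never as the answer

@dataclass
class QueryResult:
    payload: dict  # structured data the UI turns into candidate cards

def demux(stream: Iterable[str]) -> Iterator[Union[ThinkingToken, QueryResult]]:
    """Split one mixed stream into two logical channels.

    Assumed line format: 'THINKING <text>' or 'QUERY_RESULT <json>'.
    Cards are built only from QUERY_RESULT payloads, so stray model
    narration can never leak into the results pane.
    """
    for line in stream:
        kind, _, body = line.partition(" ")
        if kind == "THINKING":
            yield ThinkingToken(body)
        elif kind == "QUERY_RESULT":
            yield QueryResult(json.loads(body))
        # Unknown event kinds are dropped rather than rendered as text.

raw = [
    "THINKING Looking for senior backend candidates...",
    'QUERY_RESULT {"name": "A. Doe", "score": 0.92}',
]
events = list(demux(raw))
```

The design point is that the split happens on typed events, not on parsing the model's text, so the "thinking" channel can be hidden, collapsed, or interrupted without ever touching the results channel.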