Improve the Interaction: Stream AI Responses

Source: DEV Community
I had a Spring Boot API talking to AI providers, and at first it did the most obvious thing: send the prompt, wait for the model to finish, and then return the full response as JSON. It worked. But it also felt wrong. When you are dealing with AI-generated text, waiting several seconds for a complete response is a pretty bad experience. The model is already producing tokens progressively, but the API was hiding that and making the client wait for everything.

So I decided to fix that and add proper streaming support. This post is about that change. Not a giant rewrite. Just a practical refactor to make AI responses feel alive instead of delayed.

The original problem

The first version of the endpoint was synchronous. The flow was basically:

1. Receive the prompt
2. Call the AI provider
3. Wait for the entire answer
4. Return one JSON response

That is simple, but it creates an awkward UX. Even when the model is generating steadily, the user sees nothing until the very end. For normal CRUD APIs, that