We Built a Service That Catches LLM Drift Before Your Users Do

You shipped your LLM-powered feature. It worked perfectly in testing. Users loved the beta. Three weeks later, your support inbox fills up. Outputs are wrong. The JSON your app parses doesn't look right. The classifier is giving different answers. Your LLM drifted. And you had no idea until users told you.

This Happens More Than You Think

In February 2025, developers on r/LLMDevs reported GPT-4o changing behaviour with zero advance notice: "We caught GPT-4o drifting this week... OpenAI changed GPT-4o in a way that significantly changed our prompt outputs. Zero advance notice."

It's not just OpenAI. Claude, Gemini, and even "dated" model versions (supposedly frozen) change behaviour unexpectedly. When you call gpt-4o-2024-08-06 today, you might not get the same responses you got when you built your feature. The problem: you can't tell unless you're actively testing.

What We Built

DriftWatch runs your test prompts against your LLM endpoint every hour and alerts you the moment behaviour changes.
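To make the technique concrete, here is a minimal sketch in Python of what an hourly drift probe can look like: replay a pinned prompt against a pinned model and compare the output to a recorded baseline. Everything in it is illustrative, not DriftWatch's actual code; the prompt, the baseline.json file, and the exact-match comparison are assumptions for the sake of the example.

```python
# Minimal drift probe: replay a pinned prompt against a pinned model and
# compare the output to a stored baseline. Illustrative sketch only; the
# prompt, baseline file, and exact-match check are hypothetical choices.
import hashlib
import json
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL = "gpt-4o-2024-08-06"       # a "dated" version, supposedly frozen
BASELINE = Path("baseline.json")  # hypothetical local baseline store

PROMPT = [
    {"role": "system", "content": "Classify the sentiment as exactly one "
                                  "word: positive, negative, or neutral."},
    {"role": "user", "content": "The new dashboard is confusing and slow."},
]

def run_probe() -> str:
    """Run the test prompt as deterministically as the API allows."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=PROMPT,
        temperature=0,
        seed=42,  # best-effort determinism; not guaranteed by the API
    )
    return (resp.choices[0].message.content or "").strip()

def check_drift() -> None:
    output = run_probe()
    digest = hashlib.sha256(output.encode()).hexdigest()
    if not BASELINE.exists():
        # First run: record the baseline instead of comparing.
        BASELINE.write_text(json.dumps({"output": output, "sha256": digest}))
        print("Baseline recorded:", output)
        return
    baseline = json.loads(BASELINE.read_text())
    if digest != baseline["sha256"]:
        # A real service would page you here; we just report it.
        print(f"DRIFT: expected {baseline['output']!r}, got {output!r}")
    else:
        print("No drift detected.")

if __name__ == "__main__":
    check_drift()  # schedule hourly, e.g. with cron: 0 * * * *
```

Exact-match comparison suits classifier-style prompts with one-word answers; freer-form outputs would need a fuzzier check (JSON schema validation, embedding similarity). Scheduled hourly, this is the core loop: probe, compare, alert.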