Type-Guided Constrained Decoding: How to Stop LLMs from Hallucinating Code

Source: DEV Community
From GBNF Grammars to Type-Directed Generation: Guarantees Instead of Hope

Who this is for. If you've ever had ChatGPT generate code that doesn't compile, this article explains how to eliminate that completely. All technical terms are explained in footnotes and in the glossary at the end.

In previous articles, we showed that reducing tokens saves money, energy, and compute. But there's a more serious problem: LLMs generate incorrect code. And every retry doubles the token spend.

The Scale of the Problem

Type errors account for 33.6% of all failures in LLM-generated code (Mündler et al., PLDI 2025¹). These aren't typos; they're structural errors: wrong argument types, incompatible return values, accessing nonexistent fields.

When an LLM generates a sort function that doesn't compile, the cost doubles: either a human fixes it (time) or an agent retries (tokens). But what if the model physically cannot generate syntactically invalid code?

Three Levels of Constraints

Level 1: Grammar (Syntacti
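The core idea behind grammar-level constraints can be sketched in a few lines: at every decoding step, mask out any token the grammar forbids, so an invalid token can never be sampled no matter how much the model prefers it. The sketch below uses a toy DFA standing in for a real GBNF grammar; all state and token names are illustrative, not part of any actual library.

```python
# Minimal sketch of grammar-constrained decoding (illustrative, not a
# real llama.cpp/GBNF implementation). A toy DFA accepts only
# well-formed call expressions like f(x) or f(x,y).

TRANSITIONS = {
    ("start", "ident"): "after_name",
    ("after_name", "("): "in_args",
    ("in_args", "ident"): "after_arg",
    ("after_arg", ","): "in_args",
    ("after_arg", ")"): "done",
}

def allowed_tokens(state):
    """Tokens the grammar permits from the current state."""
    return {tok for (s, tok) in TRANSITIONS if s == state}

def constrained_decode(scores_per_step):
    """Greedy decoding; illegal tokens are masked and can never win."""
    state, out = "start", []
    for scores in scores_per_step:  # scores: dict token -> model logit
        legal = allowed_tokens(state)
        best = max(legal, key=lambda t: scores.get(t, float("-inf")))
        out.append(best)
        state = TRANSITIONS[(state, best)]
        if state == "done":
            break
    return out

# Even when the model strongly prefers an illegal token (a ")" right
# after the function name), the mask forces a valid continuation.
steps = [
    {"ident": 0.9, ")": 0.1},
    {")": 2.0, "(": 0.5},   # model "wants" an invalid ")"
    {"ident": 1.0},
    {")": 1.0, ",": 0.2},
]
print(constrained_decode(steps))  # → ['ident', '(', 'ident', ')']
```

Real grammar-constrained decoders apply the same mask over the full vocabulary logits before sampling; the guarantee is the same: every emitted sequence parses.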