How a Compiler Catches AI Mistakes Before They Run
A spell checker doesn't make you a better writer. It just catches errors you would have caught yourself eventually — only faster, and without the embarrassment. That's a useful tool, but it's not transformative.
Now imagine a spell checker that also understood grammar, logic, and the meaning of what you were trying to say. One that caught not just "teh" but "their/there/they're" and "the conclusion doesn't follow from the premises." That's a different category of tool.
Synoema's compiler is trying to be the second kind of tool, specifically for LLM-generated code. Not just a syntax validator, but a three-layer verification system that catches different classes of error at different points in the pipeline. This article explains how each layer works and what it catches.
The Problem with Catching Errors Later
The conventional workflow for LLM-generated code is: generate, then validate. The model produces a program; a human (or an automated test) checks whether it works. If it doesn't, the human corrects it or prompts the model again.
This works. But it has a fundamental inefficiency: errors discovered at runtime are the most expensive kind. You've already spent the generation budget. You've already committed to a program structure. Correcting a runtime error often means regenerating large sections of the program rather than making a small fix.
There's also the problem of error observability. Runtime errors in dynamically typed languages like Python often surface as vague exceptions — AttributeError: 'NoneType' object has no attribute 'items' — that don't clearly indicate what went wrong in the generated code or how to fix it. For an LLM self-correction loop, a clear, structured error message is much more valuable than a Python traceback.
Synoema's approach is to catch errors as early as possible — ideally before the code is even fully generated — and to report them in a format that both humans and LLMs can act on.
Layer 1: The Grammar Constraint
The first layer operates at generation time, not after generation. A formal grammar — written in GBNF format, compatible with llama.cpp and vLLM — specifies exactly which token sequences form valid Synoema syntax.
With grammar-constrained decoding enabled, the model cannot generate a syntactically invalid program. At each step, the grammar acts as a filter: only tokens that continue a valid syntactic prefix are allowed. Tokens that would produce a syntax error are assigned zero probability before the model even considers them.
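To make the filtering step concrete, here is a toy Python sketch of grammar-constrained decoding. The grammar, vocabulary, and `grammar_allows` prefix check are all invented for illustration; a real implementation like XGrammar compiles the grammar into precomputed token masks rather than testing strings one at a time.

```python
import re

def grammar_allows(prefix: str) -> bool:
    # Stand-in "grammar": a valid program is exactly 'f x = <digits>'.
    # A prefix is allowed if it is already valid or can still become valid.
    return bool(re.fullmatch(r"f x = \d*", prefix)) or "f x = ".startswith(prefix)

def mask_logits(prefix: str, vocab: list[str], logits: list[float]) -> list[float]:
    # At each decoding step, any token that would leave the grammar
    # gets probability zero (here: -inf logit) before sampling.
    return [
        logit if grammar_allows(prefix + token) else float("-inf")
        for token, logit in zip(vocab, logits)
    ]

vocab = ["f", " x", " =", " 4", "!", "?"]
masked = mask_logits("f x =", vocab, [1.0] * len(vocab))
print(masked)  # only ' 4' survives; every other token is masked out
```

The point of the sketch is the invariant, not the implementation: after masking, sampling any surviving token keeps the partial program inside the grammar, so a syntactically invalid output is unreachable by construction.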
Synoema's grammar is 162 lines and 48 rules. It covers:
- Top-level definitions (functions, type definitions, traits, implementations)
- Expressions (literals, applications, binary operations, let-bindings, list comprehensions)
- Patterns (literal patterns, constructor patterns, cons patterns, wildcard)
- The pipe operator and ternary expressions
- Contract annotations
The implementation uses XGrammar (Dong et al., 2024), which preprocesses context-free grammars to make constrained decoding up to 100x faster than naive token-by-token filtering; in practice, the per-token overhead is near zero.
What this layer catches: all syntactic errors. The 24% of GitHub Copilot suggestions that contain compilation errors (Nguyen & Nadi, 2022) — eliminated by construction. The model simply cannot generate them.
What it doesn't catch: semantic errors. A syntactically valid program can still compute the wrong thing.
Layer 2: Hindley-Milner Type Inference
After generation, the type checker runs. Synoema uses Hindley-Milner (HM) type inference — the same system as Haskell and OCaml — which is provably sound: a program that passes type checking will not produce a type error at runtime.
Type inference is automatic. The model never writes type annotations; the type checker infers them. Every function has an inferred type. Every application is verified. Every pattern match is checked for exhaustiveness.
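The core mechanism behind HM inference is unification: every expression gets a type or a type variable, and the checker solves the resulting equations. Below is a deliberately minimal Python sketch of that step, applied to a ternary whose branches return String and Int. This is an illustrative toy, not Synoema's actual checker (which implements full HM with generalization and exhaustiveness checking).

```python
class TypeVar:
    """A fresh, unknown type to be solved for."""
    _n = 0
    def __init__(self):
        TypeVar._n += 1
        self.name = f"t{TypeVar._n}"

def resolve(t, subst):
    # Follow substitution chains until we hit a concrete type or a free var.
    while isinstance(t, TypeVar) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    # Make two types equal, extending the substitution, or fail.
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, TypeVar):
        subst[a] = b
        return subst
    if isinstance(b, TypeVar):
        subst[b] = a
        return subst
    raise TypeError(f"type_mismatch: {a} vs {b}")

# Both branches of a ternary must unify with the same result type:
subst = {}
branch = TypeVar()              # result type of the ternary
unify(branch, "String", subst)  # then-branch: "large"
try:
    unify(branch, "Int", subst) # else-branch: total -> mismatch
except TypeError as e:
    print(e)  # type_mismatch: String vs Int
```

The failure in the last call is precisely the error the next example triggers: two branches of one expression constrained to different concrete types.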
Consider this example:
-- Model generates this:
process items =
    total = sum items
    ? total > 100 -> "large" : total
The type checker finds a problem: the ternary expression returns "large" (a String) in one branch and total (an Int) in the other. The branches must have the same type. This is caught before the program runs:
{
  "error": "type_mismatch",
  "message": "branches of ternary must have the same type",
  "got": ["String", "Int"],
  "location": {"line": 3, "col": 5},
  "fixability": 0.9,
  "llm_hint": "Both branches of '? -> :' must return the same type. Change the else branch to return a String, e.g., show total",
  "did_you_mean": "? total > 100 -> \"large\" : show total"
}
Notice the error format. This is not a human-readable traceback — it's structured JSON, designed for machine consumption. Every error includes:
- error: a stable error code that can be pattern-matched in automation
- fixability: a score from 0.0 to 1.0 indicating how mechanically fixable the error is (0.9 means: this is almost certainly fixable with a small change)
- llm_hint: a natural-language description of what went wrong and how to fix it, phrased to be useful as a follow-up prompt
- did_you_mean: in many cases, the corrected code snippet itself
This structure is what enables self-correction loops. The model generates code, the compiler returns a JSON error, the model receives the error as context and generates a correction. The llm_hint field gives it exactly the information it needs.
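A sketch of that loop in Python, with `generate` and `compile_check` as hypothetical stand-ins for the model call and the Synoema compiler. Here the stub "model" simply applies the did_you_mean suggestion; a real loop would call an LLM with the error JSON appended to the prompt.

```python
import json

def compile_check(code: str):
    # Stand-in compiler: the buggy ternary fails, the corrected one passes.
    if "show total" in code:
        return None  # no error: compilation succeeded
    return {
        "error": "type_mismatch",
        "fixability": 0.9,
        "llm_hint": "Both branches of '? -> :' must return the same type.",
        "did_you_mean": '? total > 100 -> "large" : show total',
    }

def generate(prompt: str, error=None) -> str:
    # Stand-in model: first attempt is buggy; given an error, apply the hint.
    if error and error.get("did_you_mean"):
        return "process items =\n    total = sum items\n    " + error["did_you_mean"]
    return 'process items =\n    total = sum items\n    ? total > 100 -> "large" : total'

prompt = "Write a function classifying totals."
code, error = generate(prompt), None
for _ in range(3):  # bound the number of correction passes
    error = compile_check(code)
    if error is None:
        break
    # Feed the structured error back as context and regenerate.
    code = generate(prompt + "\nYour code produced this error: " + json.dumps(error), error)

print("converged" if error is None else "gave up")  # converged
```

The bounded loop matters: self-correction is only useful if it terminates, so each pass either converges on a program that compiles or exhausts its budget and reports failure.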
What this layer catches: type errors, which account for 33.6% of LLM code failures (Tambon et al., 2025). Mundler et al. (PLDI 2025) report that type-constrained decoding reduces compilation errors by 74.8%, versus a 9.0% reduction for syntax-only constraints.
Layer 3: Verification Contracts
The third layer addresses something types cannot: the relationship between inputs and outputs. A function can be type-correct and still compute the wrong thing. Contracts capture the semantic invariants that types cannot express:
withdraw : Account -> Int -> Account
requires amount > 0
requires account.balance >= amount
ensures result.balance == account.balance - amount
withdraw account amount =
    Account { balance = account.balance - amount }
The requires clauses are checked before the function executes. The ensures clause is checked after. If either fails, execution stops with a structured error — not a generic exception, but a contract violation with the specific condition that failed.
Contracts are optional and additive: a function without contracts still has its types checked. For high-stakes code (financial logic, API invariants, safety properties), though, contracts provide a third safety net that catches errors that pass type checking.
Contracts also serve a documentation function. The command:
$ synoema doc --contracts api.sno
generates a compact table of every function's type, preconditions, and postconditions. For a grant reviewer or a system auditor, this is a machine-readable specification of what the code promises to do — extracted directly from the code, so it can never diverge from the implementation.
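As a rough illustration of what that extraction involves, here is a toy Python parser that pulls signatures and contract clauses out of Synoema-like source text. It is a sketch of the idea only, not the real `synoema doc` implementation, and it assumes one signature line per function followed by its clauses:

```python
import re

source = """\
withdraw : Account -> Int -> Account
requires amount > 0
requires account.balance >= amount
ensures result.balance == account.balance - amount
withdraw account amount =
    Account { balance = account.balance - amount }
"""

rows = []
current = None
for line in source.splitlines():
    sig = re.match(r"^(\w+) : (.+)$", line)
    if sig:
        # A 'name : Type' line starts a new row in the contract table.
        current = {"name": sig.group(1), "type": sig.group(2),
                   "requires": [], "ensures": []}
        rows.append(current)
    elif current and line.strip().startswith("requires "):
        current["requires"].append(line.strip()[len("requires "):])
    elif current and line.strip().startswith("ensures "):
        current["ensures"].append(line.strip()[len("ensures "):])

for r in rows:
    print(r["name"], "|", r["type"], "|", r["requires"], "|", r["ensures"])
```

Because the table is derived mechanically from the source, regenerating it after any edit keeps the specification and the implementation in lockstep, which is the property the paragraph above relies on.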
Error Messages Designed for LLMs
The diagnostic layer (synoema-diagnostic crate) deserves special attention. Every error in Synoema has a corresponding llm_hint — 100% coverage, enforced in the compiler's test suite.
Writing an llm_hint for an error is not the same as writing a human-readable message. Human messages often include context that a human already has ("did you mean to use '==' instead of '='?"). LLM hints need to be self-contained prompts: they include the error type, the specific location, the expected and actual types, and a concrete suggestion.
Compare:
Typical compiler error (human-facing):
error[E0308]: mismatched types
 --> src/main.rs:5:13
  |
5 |     let x: i32 = "hello";
  |            ---   ^^^^^^^ expected `i32`, found `&str`
Synoema error (LLM-facing):
{
  "error": "type_mismatch",
  "expected": "Int",
  "got": "String",
  "location": {"line": 5, "col": 13},
  "fixability": 0.95,
  "llm_hint": "Variable 'x' is declared with type Int but assigned a String value. Either change the value to an integer literal, or remove the type annotation to allow inference.",
  "did_you_mean": "x = 42"
}
The difference is the self-contained explanation in llm_hint and the actionable suggestion in did_you_mean. When this error is injected back into the model's context ("Your code produced this error: [JSON]"), the model has everything it needs to generate a correction without additional human guidance.
The Verification Pipeline
Putting it together, the full pipeline for LLM-generated Synoema code is:
1. Generation with GBNF constraint → syntactically valid output guaranteed
2. Type checking → type errors caught, structured JSON error returned
3. Contract checking → preconditions/postconditions verified at runtime
4. Test execution → synoema verify file.sno runs with a timeout and returns structured pass/fail
Steps 2–4 can be run in a self-correction loop: if step 2 or 3 fails, the error JSON is fed back to the model as context, and the model regenerates. Each iteration has a defined error structure to work from — not a vague traceback, but a specific, actionable description of what to fix.
This is the right architecture for machine-generated code: not "generate and hope," but "generate, verify, correct, and re-verify." The compiler is an active participant in the generation process, not a passive post-hoc checker.
The question we're still working on: how many iterations does self-correction require, and when does it converge? Our Phase D data shows that multipass self-correction helps significantly for 7B+ models (Qwen3-8B improves from 7% to 48% run rate with one correction pass). More data is needed to characterize the convergence behavior across model sizes and task categories.
Related Articles
Why AI Writes Broken Code — and How Type Systems Can Fix It
The research behind type errors in LLM-generated code and what constrained decoding achieves.
Introducing Synoema: A Language Machines Can Verify
A full tour of the language, including the contract system and LLM toolchain.
What We Learned Teaching AI a New Language
Experimental results across 10+ models, including multipass self-correction data.