Introducing Synoema: A Language Machines Can Verify
Synoema looks, at first glance, like a stripped-down functional language. No classes, no mutable variables, no return statements. Functions are equations. Pattern matching is the primary dispatch mechanism. The type system is invisible — you never write a type annotation unless you want one for documentation.
But the design choices run deeper than aesthetics. Every syntax decision was measured against how LLMs tokenize text. Every language feature was evaluated for how much it helps or hurts machine generation. This article is a tour of the language: what it looks like, how it works, and why it's built the way it is.
The Basics: Functions as Equations
In Synoema, there are no statements — only expressions. A function definition is an equation:
square x = x * x
greet name = "Hello, " ++ name ++ "!"
The left side is the function name and its parameters. The right side is the body expression. The result of the function is the value of the body expression — there's no return keyword.
Multiple equations define pattern matching:
fac 0 = 1
fac n = n * fac (n - 1)
This is not an if-else chain. Each equation is tried in order, and the first matching pattern wins. The pattern 0 matches exactly the integer 0. The pattern n matches anything and binds it to the name n. This is how pattern matching replaces most conditional logic in Synoema.
For inline conditionals, there's the ternary operator:
abs x = ? x < 0 -> x * (-1) : x
The syntax ? cond -> then : else spells out the condition first and then the two branches, using exactly 3 operator tokens (?, ->, :). Python's then if cond else other uses a comparable number of tokens but places the condition between the branches, an order that is harder for a left-to-right generator to predict.
List Processing
Lists are built-in. The empty list is []. The cons operator : prepends an element:
nums = 1 : 2 : 3 : [] -- same as [1 2 3]
head (x:_) = x
tail (_:xs) = xs
List comprehensions work as expected:
evens = [x | x <- [1..100], x % 2 == 0]
squares = [x * x | x <- [1..10]]
And the pipe operator chains transformations:
result = [1..10]
|> filter (\x -> x % 2 == 0)
|> map (\x -> x * x)
|> sum
-- 220
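The same pipeline can be written as staged transformations in Python, which makes it easy to check the arithmetic:

```python
# Python version of the |> pipeline above, one stage per line.
nums = range(1, 11)                      # [1..10]
evens = [x for x in nums if x % 2 == 0]  # |> filter: [2, 4, 6, 8, 10]
squares = [x * x for x in evens]         # |> map:    [4, 16, 36, 64, 100]
result = sum(squares)                    # |> sum:    220
```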
QuickSort in four lines, with the two partitions built by list comprehensions:
qsort [] = []
qsort (p:xs) = qsort lo ++ [p] ++ qsort hi
lo = [x | x <- xs, x <= p]
hi = [x | x <- xs, x > p]
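For comparison, a near-direct transcription of the same algorithm into Python, with lo and hi as the comprehension-built partitions:

```python
# QuickSort, transcribed from the Synoema version above.
def qsort(xs):
    if not xs:                             # qsort [] = []
        return []
    p, rest = xs[0], xs[1:]                # qsort (p:xs)
    lo = [x for x in rest if x <= p]
    hi = [x for x in rest if x > p]
    return qsort(lo) + [p] + qsort(hi)
```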
Algebraic Data Types
Synoema supports algebraic data types (ADTs) — sum types (unions) and product types (records). These are the basis for type-safe domain modeling:
-- A simple enum
Color = Red | Green | Blue derive (Show, Eq, Ord)
-- A parameterized type (like Maybe/Option)
Option a = None | Some a derive (Show)
-- A recursive type (binary tree)
Tree a = Leaf | Node a (Tree a) (Tree a) derive (Show)
The derive clause automatically generates implementations of Show (string conversion), Eq (equality), and Ord (ordering). This is familiar from Haskell but matters here for a specific reason: the model never has to manually write these boilerplate implementations, reducing a large source of type errors.
Pattern matching on ADTs:
depth Leaf = 0
depth (Node _ left right) = 1 + max (depth left) (depth right)
insert x Leaf = Node x Leaf Leaf
insert x (Node y left right) =
? x < y -> Node y (insert x left) right
: ? x > y -> Node y left (insert x right)
: Node y left right
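The tree functions above have a straightforward Python counterpart, using one class per constructor. This is a sketch of the semantics, not of how Synoema represents ADTs internally:

```python
from dataclasses import dataclass

class Leaf:                     # the empty-tree constructor
    pass

@dataclass
class Node:                     # Node value left right
    value: int
    left: object
    right: object

def depth(t):
    if isinstance(t, Leaf):
        return 0
    return 1 + max(depth(t.left), depth(t.right))

def insert(x, t):
    if isinstance(t, Leaf):
        return Node(x, Leaf(), Leaf())
    if x < t.value:
        return Node(t.value, insert(x, t.left), t.right)
    if x > t.value:
        return Node(t.value, t.left, insert(x, t.right))
    return t                    # duplicate: unchanged
```

Note how each Synoema equation becomes one isinstance branch; pattern matching collapses that dispatch boilerplate.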
Type Classes
Type classes define interfaces that types can implement:
trait Area a
area : a -> Float
Shape = Circle Float | Rect Float Float
impl Area Shape
area (Circle r) = 3.14159 * r * r
area (Rect w h) = w * h
total_area shapes = sum (map area shapes)
The type of total_area is inferred as [Shape] -> Float. If you pass a list containing a value that doesn't implement Area, the type checker catches it before runtime.
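Readers more at home in Python can think of a trait as a structural interface. A rough equivalent using typing.Protocol (an analogy only; Synoema's type classes are resolved statically):

```python
from typing import Protocol

class Area(Protocol):            # plays the role of the trait
    def area(self) -> float: ...

class Circle:
    def __init__(self, r: float):
        self.r = r
    def area(self) -> float:
        return 3.14159 * self.r * self.r

class Rect:
    def __init__(self, w: float, h: float):
        self.w, self.h = w, h
    def area(self) -> float:
        return self.w * self.h

def total_area(shapes: list[Area]) -> float:
    return sum(s.area() for s in shapes)
```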
Verification Contracts
Beyond types, Synoema supports optional verification contracts: preconditions (requires) and postconditions (ensures) that are checked at runtime:
divide : Int -> Int -> Int
requires divisor /= 0
ensures result * divisor == dividend
divide dividend divisor = dividend / divisor
Contracts serve two purposes: they document the function's invariants in a machine-readable format, and they provide a third layer of verification beyond the type system. A function that type-checks but violates its contract is caught at the first call that triggers the violation. With the integer divide above, for instance, if / performs integer division then divide 7 2 returns 3, and 3 * 2 /= 7, so the ensures clause fires at runtime: the contract holds only for exact divisions, which is precisely the kind of unstated assumption contracts surface.
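The runtime behavior of requires/ensures can be approximated in Python with a decorator. This is a sketch of the mechanism, not Synoema's implementation:

```python
def contract(requires=None, ensures=None):
    """Run-time pre/postcondition checks, roughly mirroring
    Synoema's requires/ensures clauses."""
    def wrap(fn):
        def inner(*args):
            if requires is not None and not requires(*args):
                raise ValueError("precondition violated")
            result = fn(*args)
            if ensures is not None and not ensures(result, *args):
                raise ValueError("postcondition violated")
            return result
        return inner
    return wrap

@contract(
    requires=lambda dividend, divisor: divisor != 0,
    ensures=lambda result, dividend, divisor: result * divisor == dividend,
)
def divide(dividend, divisor):
    return dividend // divisor   # integer division, as in the Int signature
```

Here divide(10, 2) returns 5, divide(1, 0) trips the precondition, and divide(7, 2) trips the postcondition, because 3 * 2 != 7.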
Contracts are also extractable for documentation:
$ synoema doc --contracts myfile.sno
Function Type Requires Ensures
-------- ---- -------- -------
divide Int->Int->Int divisor /= 0 result * divisor == dividend
For LLM-generated code in regulated domains — financial calculations, medical algorithms — extractable contracts provide a machine-readable specification that can be checked against business requirements without executing the code.
Built-In I/O and Networking
Synoema is not purely functional — it has built-in I/O, file operations, TCP networking, and subprocess pipes. This is what allows it to serve as a practical scripting and service language, not just an academic exercise:
-- HTTP server in 8 lines
handle fd =
req = fd_readline fd
fd_write fd "HTTP/1.0 200 OK\r\nContent-Length: 6\r\n\r\nHello!"
fd_close fd
main =
listener = tcp_listen 8080
loop l = handle (tcp_accept l) ; loop l
loop listener
This is real, executable code — the Synoema website itself runs on a server written in Synoema. File reading, environment variables, JSON parsing, subprocess execution: all built-in, no external dependencies required.
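The shape of that server translates fairly directly to raw sockets in Python. A rough sketch; the helper names (make_listener, serve) are my own, not part of any Synoema API:

```python
import socket
import threading

def handle(conn):
    conn.recv(1024)  # read the request (analogous to fd_readline)
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 6\r\n\r\nHello!")
    conn.close()

def make_listener(port):
    l = socket.socket()
    l.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    l.bind(("127.0.0.1", port))
    l.listen()
    return l

def serve(listener):
    while True:              # analogous to the tail-recursive loop l
        conn, _ = listener.accept()
        handle(conn)
```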
The Compiler Architecture
Synoema is implemented in approximately 41,000 lines of Rust across 9 workspace crates, with only two external dependencies: Cranelift (for JIT compilation) and pretty_assertions (for test output). No tokio, no serde, no async runtime.
| Crate | Purpose |
|---|---|
| synoema-lexer | Tokenization with offside rule (indentation → INDENT/DEDENT) |
| synoema-parser | Pratt parser, 15 expression kinds, AST construction |
| synoema-types | Hindley-Milner inference, row polymorphism, contract checking |
| synoema-core | Core IR (System F), constant folding, dead code elimination |
| synoema-eval | Tree-walking interpreter, all builtins, full I/O |
| synoema-codegen | Cranelift JIT compiler, tagged pointer ABI, arena memory |
| synoema-diagnostic | Structured errors with LLM hints, fixability scores |
| synoema-lsp | LSP server (hover types, go-to-def, diagnostics, completion) |
| synoema-repl | CLI: run, jit, eval, test, doc, watch, init, install |
The JIT Backend
The interpreter (synoema run) is a tree-walking evaluator — it's fast enough for development, with full I/O and networking support. For production speed, synoema jit compiles to native code via Cranelift, the same code-generation backend that powers the Wasmtime WebAssembly runtime.
The JIT uses a tagged pointer ABI: all values are represented as 64-bit integers with type information encoded in the low bits. Lists use bit 0, strings use bit 1, small integers use the remaining bits directly. This eliminates boxing overhead for common values and allows the JIT to generate tight native code for arithmetic-heavy programs.
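The tagging idea can be illustrated with a toy model in Python. The exact bit layout below is my reading of the description, not the real ABI:

```python
# Toy model of a tagged 64-bit value scheme: type info lives in the
# low bits, so common values never touch the heap.
LIST_TAG = 0b01    # bit 0 set  -> list pointer   (assumed layout)
STR_TAG  = 0b10    # bit 1 set  -> string pointer (assumed layout)

def tag_int(n):
    return n << 2            # small ints: low tag bits 00, value above

def untag_int(v):
    assert v & 0b11 == 0, "not a small integer"
    return v >> 2

def add_ints(a, b):
    # Arithmetic stays unboxed: untag, add, retag. No allocation.
    return tag_int(untag_int(a) + untag_int(b))
```

Because the tag check is a single mask-and-compare, the JIT can emit a couple of machine instructions per arithmetic operation instead of a heap load.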
Memory is managed with an 8MB bump allocator (arena) with a region stack. In tail-recursive loops, the arena is automatically reset — meaning long-running servers and recursive computations don't leak memory. There is no garbage collector.
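A bump allocator with a region stack is simple enough to sketch in a few lines. This toy model shows the idea; the real arena's size, alignment, and reset policy are only as described above:

```python
class Arena:
    """Toy bump allocator with a region stack."""
    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0             # the bump pointer
        self.regions = []        # saved watermarks

    def alloc(self, n):
        if self.top + n > len(self.buf):
            raise MemoryError("arena exhausted")
        off = self.top
        self.top += n            # allocation is just a pointer bump
        return off

    def push_region(self):
        self.regions.append(self.top)

    def pop_region(self):
        # Resetting to the watermark frees everything allocated since
        # push_region, e.g. at the head of a tail-recursive loop.
        self.top = self.regions.pop()
```

Freeing a whole region is one pointer assignment, which is why tail-recursive servers can run indefinitely without a garbage collector.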
The performance results: median 3x speedup over CPython on 12 benchmark tasks, up to 28x on fibonacci (where tail-call optimization eliminates the recursive call overhead entirely).
The LLM Toolchain
Three tools are designed specifically for integration with LLM workflows:
GBNF grammar (lang/tools/constrained/synoema.gbnf): 162 lines, 48 rules. Used with llama.cpp and vLLM for constrained decoding — the model can only generate tokens that form valid Synoema syntax.
MCP server (npx synoema-mcp): Integrates with Claude Desktop, Cursor, and Zed. Tools: eval, typecheck, run, constrain (token masking for incremental generation). One-line setup in any MCP-compatible IDE.
Structured errors (synoema --errors json): Every error includes a fixability score (0.0–1.0), a did_you_mean suggestion, and an llm_hint field — a natural language description of what went wrong and how to fix it. Designed for machine consumption in self-correction loops.
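A self-correction loop consumes these errors as plain JSON. The payload below is hypothetical (the field names come from the text above, but the exact schema is an assumption):

```python
import json

# Hypothetical error payload in the shape the article describes.
raw = """{
  "message": "unbound variable `lenght`",
  "fixability": 0.9,
  "did_you_mean": "length",
  "llm_hint": "The name `lenght` is not in scope; `length` is a close match."
}"""

err = json.loads(raw)

# A repair loop might apply high-fixability suggestions directly and
# only re-prompt the model for the hard cases.
auto_fixable = err["fixability"] >= 0.5 and err["did_you_mean"] is not None
```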
The compact reference (docs/llm/synoema-compact.md, ~900 tokens) fits in the context window of even the smallest deployed models. The full reference (docs/llm/synoema.md, ~1800 tokens) is used for baseline prompting.
Together, these tools form a complete LLM-native development environment: the model generates code with grammar constraints, the compiler type-checks and validates, structured errors enable self-correction, and the MCP server provides real-time interaction.
Related Articles
How a Compiler Catches AI Mistakes Before They Run
The three-layer verification pipeline in detail, with LLM-friendly error messages.
Why AI Writes Broken Code — and How Type Systems Can Fix It
The statistics behind LLM code failures and how type-guided generation addresses them.
From Zero to 41%: Building an AI That Writes Working Code
The full journey: corpus generation, fine-tuning, and honest results.