About Synoema

The language of shared understanding

What is Synoema?

Synoema [sy-NO-e-ma] — from Greek synoema (συνόημα), "shared understanding". A BPE-aligned programming language purpose-built for LLM code generation.

The core insight: LLMs generate code token by token. Every token costs money, time, and context window. If the programming language itself is designed to align with how LLMs tokenize text, you get cheaper generation, faster inference, and more correct output — by construction, not by accident.

Synoema is not a general-purpose language competing with Python or Rust. It occupies a new niche: the language where AI writes code. It's optimized for the machine that generates it, not the human who reads it (though it's readable too).

Design Philosophy

Token Economy First

Every syntax decision is measured in BPE tokens. ? cond -> then : else costs 3 tokens; if cond then x else y costs 6. We chose 3. Every operator is verified against cl100k_base, Llama 3, and Mistral tokenizers.
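The per-construct accounting above can be sketched as a small comparison harness. This is an illustrative sketch, not part of Synoema's tooling: the regex split below is only a crude stand-in for a tokenizer, and the 3-vs-6 figures quoted in the text come from a real BPE tokenizer (tiktoken's cl100k_base), which merges characters far more aggressively than any simple split.

```python
import re

def rough_tokens(snippet: str) -> int:
    """Crude token estimate: words, arrows, and single punctuation marks."""
    return len(re.findall(r"\w+|->|[^\w\s]", snippet))

for snippet in ["? cond -> then : else", "if cond then x else y"]:
    print(f"{snippet!r}: ~{rough_tokens(snippet)} tokens (rough estimate)")

# For the actual measurements, swap in a real BPE tokenizer:
#   enc = tiktoken.get_encoding("cl100k_base")
#   count = len(enc.encode(snippet))
```

The same harness, pointed at a real tokenizer, is how each operator can be verified against cl100k_base and friends.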

Correctness by Construction

Hindley-Milner type inference catches errors without annotations. GBNF grammar constrains LLM output to be 100% syntactically valid. Verification contracts (requires/ensures) guard runtime behavior.

Minimal Dependencies

The entire compiler has two dependencies: Cranelift (JIT) and pretty_assertions (tests). No tokio, no serde, no async. ~41K LOC of Rust across 9 crates. You can read the entire codebase in a day.

Immutable & Strict

All bindings are immutable. Evaluation is eager, left-to-right. No lazy evaluation surprises. Predictable for both humans and LLMs.

Equations, Not Statements

No def, no return, no semicolons between statements. Functions are equations: f x = body. The last expression is the result. Pattern matching via multiple equations.

Dual Backend

Interpreter for development (full I/O, networking, concurrency). Cranelift JIT for production speed (3x over Python). Same language, same semantics, choose your tradeoff.

Core Strengths

1. Token Efficiency — Task-Dependent

Measured with real tiktoken (cl100k_base) on 16 tasks. Results split by task category:

Task type                  | vs Python      | Examples
Functional & pattern-heavy | −22% to −52%   | quicksort −38%, json_build −52%, pattern_match −40%
General algorithms         | −8% to +14%    | collatz −8%, fizzbuzz −6%, tree_traverse −1%
Imperative & index-heavy   | +33% to +105%  | binary_search +33%, matrix_mult +105%
Average (16 tasks)         | ~0%            | Synoema 92.5 tokens, Python 92.9 tokens

Where Synoema wins: recursive algorithms, pattern matching, ADTs, JSON building — up to 52% savings.
Where Python wins: string operations, imperative-style code, matrix math — Python can be 87–105% more compact.

2. Native Speed — 3x Median Over Python

JIT-compiled via Cranelift to native x86-64. Benchmarked against CPython 3.12, median of 5 runs:

Benchmark         | Python | Synoema JIT | Speedup
fibonacci         | 144ms  | 5.1ms       | 28.2x
factorial         | 24ms   | 5.7ms       | 4.2x
gcd               | 17ms   | 4.7ms       | 3.5x
collatz           | 18ms   | 5.6ms       | 3.1x
quicksort         | 17ms   | 6.2ms       | 2.7x
matrix_mult       | 16ms   | 7.7ms       | 2.1x
Median (12 tasks) | —      | —           | 3.0x

Fibonacci shows 28x thanks to tail-call optimization. Typical sustainable speedup: 2–4x.
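The measurement methodology is simple enough to sketch. This is an illustrative harness in Python, not the project's benchmark script: time the same kernel several times and take the median, as the table above does with 5 runs.

```python
import statistics
import time

def fib(n: int) -> int:
    # Naive recursive Fibonacci, the usual benchmark kernel.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def bench(f, arg, runs: int = 5) -> float:
    """Median wall-clock time in milliseconds over `runs` executions."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        f(arg)
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)

# Example: bench(fib, 25) returns the median time in ms for fib(25).
```

Taking the median rather than the mean keeps a single slow run (cold cache, scheduler noise) from skewing the reported number.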

3. Guaranteed Syntactic Correctness

The GBNF grammar (162 lines, 48 rules) enables constrained decoding: the LLM can only generate tokens that form valid Synoema syntax. Works with llama.cpp, vLLM, SGLang, TensorRT-LLM via XGrammar (100x speedup over naive approaches).

Result: 100% of LLM-generated programs parse successfully. Compare this with unconstrained generation, where 24% of GitHub Copilot suggestions contain compilation errors (Nguyen & Nadi, 2022).
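The mechanism behind constrained decoding can be shown with a toy. This sketch is not the GBNF machinery itself: it uses a three-token made-up grammar and plain dictionaries where real engines (llama.cpp, XGrammar) track grammar state against the full BPE vocabulary and mask logits, but the principle of "only grammar-legal tokens survive sampling" is the same.

```python
# state -> {allowed token -> next state}; a toy stand-in for a GBNF grammar.
GRAMMAR = {
    "start": {"f": "args"},
    "args":  {"x": "eq"},
    "eq":    {"=": "done"},
    "done":  {},
}

def allowed_tokens(state: str) -> set[str]:
    """Tokens the grammar permits in this state."""
    return set(GRAMMAR[state])

def constrained_pick(state: str, scored: dict[str, float]) -> str:
    """Pick the highest-scoring token among grammar-legal continuations."""
    legal = {t: s for t, s in scored.items() if t in allowed_tokens(state)}
    if not legal:
        raise ValueError("no grammatically valid continuation")
    return max(legal, key=legal.get)

# Even if the model prefers '=' at the start, the grammar forces 'f':
pick = constrained_pick("start", {"=": 0.9, "f": 0.1, "x": 0.5})
```

Because illegal tokens are masked before sampling rather than rejected after, every completed sequence parses by construction.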

4. Type-Guided Generation

Hindley-Milner type inference acts as a semantic constraint on LLM output. Research shows type-constrained decoding reduces compilation errors by 74.8% vs only 9.0% for syntax-only constraints (Mundler et al., PLDI 2025). Synoema's type system is not just for correctness — it's a generation quality multiplier.
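The idea of types as a generation filter can be illustrated with a deliberately tiny checker. This is a toy in Python, nothing like full Hindley-Milner inference: it only rejects arithmetic that mixes ints and strings, but it shows the shape of "candidate survives only if it typechecks".

```python
import ast

def toy_typecheck(src: str) -> bool:
    """Toy filter: accept only additions whose operands share one type."""
    try:
        tree = ast.parse(src, mode="eval")
    except SyntaxError:
        return False

    def ty(node):
        if isinstance(node, ast.Constant):
            return type(node.value).__name__
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            left, right = ty(node.left), ty(node.right)
            return left if left == right else None
        return None

    return ty(tree.body) is not None

candidates = ["1 + 2", "'a' + 'b'", "1 + 'b'"]
well_typed = [c for c in candidates if toy_typecheck(c)]
```

A real type-constrained decoder applies this filtering per token during generation rather than per whole candidate, which is where the 74.8% error reduction comes from.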

Scientific Foundation

Synoema's design is grounded in 23 peer-reviewed publications. Key findings that shaped the language:

Finding                                                   | Source                   | Impact on Design
LLM inference consumes >90% of total AI energy            | TokenPowerBench, 2024    | Token efficiency = direct cost/energy reduction
Attention cost is O(n²): halving tokens = 4x less compute | Vaswani et al., 2017     | −33% tokens on functional tasks → ~55% less attention compute
Token efficiency varies 2.6x across languages             | Alderson, 2026           | Language design can significantly impact token count
Type errors = 33.6% of LLM code failures                  | Tambon et al., 2025      | Hindley-Milner inference eliminates the dominant error class
Type constraints reduce errors by 74.8%                   | Mundler et al., PLDI 2025 | Type system as generation constraint, not just verification
Bridge tokens distort LLM distributions                   | Domino, ICML 2024        | All operators = 1 BPE token → no bridge-token distortion
XGrammar: 100x speedup for grammar-constrained decoding   | Dong et al., 2024        | GBNF grammar designed for efficient constrained decoding
LLM quality degrades with sequence length                 | Multiple sources         | Fewer tokens = less context rot = better output quality

Synoema vs Other Languages

Feature              | Synoema          | Python        | Haskell    | Rust       | TypeScript
Token efficiency     | −33% functional  | Baseline      | Similar    | Verbose    | Verbose
Type inference       | Full (HM)        | None          | Full (HM)  | Partial    | Partial
Pattern matching     | Full ADTs        | Limited       | Full       | Full       | None
Constrained decoding | GBNF             | No            | No         | No         | No
Native compilation   | Cranelift (JIT)  | No (CPython)  | GHC (AOT)  | LLVM (AOT) | V8 (JIT)
LLM toolchain        | MCP + GBNF       | None          | None       | None       | None
Learning curve       | Medium           | Easy          | Hard       | Hard       | Easy
Ecosystem            | Small            | Huge          | Medium     | Large      | Huge
Evaluation           | Strict           | Strict        | Lazy       | Strict     | Strict
Immutability         | Default          | No            | Default    | Default    | No

Key insight: Synoema doesn't try to replace Python or Rust for human-written code. It's designed for the specific scenario where an LLM generates code — and in that scenario, token efficiency, type safety, and constrained decoding matter more than ecosystem size.

Maximum Impact Areas

Synoema gives a nonlinear advantage where machines generate code and correctness is critical. The key insight: Synoema is not competing with Python for human developers; it is the language where AI writes code that other machines verify and execute.

Impact Matrix

                       | Correctness: nice-to-have                         | Correctness: critical
Machine generates code | Edge AI microtools, on-device code gen, IoT rules | Verified microservices, financial logic, executable specs, agent orchestration
Human writes code      | Python/JS are better choices                      | Executable specifications, formal contracts

Maximum effect = machine generates, correctness critical. This is where all three verification layers (GBNF syntax + HM types + contracts) work simultaneously.

Six High-Impact Domains

1. LLM-Generated Microservices

User describes business logic in natural language. LLM generates a Synoema service. GBNF guarantees syntax. Types guarantee correctness. Contracts guard business invariants. The service runs immediately via TCP/HTTP builtins. Human never reads the code — code is an artifact between two machines.

2. Formally Verified AI Code

Today: LLM generates Python, human reviews, hopes it works. With Synoema: GBNF ensures syntax, HM types catch type errors (33.6% of all LLM failures), contracts enforce requires/ensures. Three layers of verification, zero human review needed for correctness. Critical for financial calculations, medical algorithms, regulatory compliance.

3. Edge AI / Small Models

Device: Raspberry Pi, phone, IoT. Model: 4B-7B parameters. Context: 2K-4K tokens. Synoema's compact reference (900 tokens) fits the full spec into context. GBNF eliminates syntax errors. JIT = 3x faster on constrained hardware. Fine-tuned 7B reaches 90.5% correctness.

4. LLM Self-Improvement Loops

LLM generates code → type checker finds errors → structured JSON error with llm_hint → LLM fixes using hint → repeat. Synoema's --errors json with fixability and did_you_mean is designed for machine consumption, not humans. No other language has error messages optimized for LLMs.

5. Executable Specifications

"Discount 10% for orders over 1000, max 500" becomes: discount : Int -> Int with requires total > 0, ensures result <= 500. The specification IS the code. Contracts are checked at runtime. synoema doc --contracts generates the spec table. Specification never diverges from implementation because they're the same thing.
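The same requires/ensures pattern can be mimicked in Python to make the semantics concrete. This decorator is an illustrative sketch, not Synoema's contract implementation; the discount body is one plausible reading of the spec above.

```python
def contract(requires=None, ensures=None):
    """Runtime-checked pre/postconditions, in the spirit of
    Synoema's requires/ensures clauses."""
    def deco(f):
        def wrapped(*args):
            if requires is not None and not requires(*args):
                raise AssertionError("requires violated")
            result = f(*args)
            if ensures is not None and not ensures(result, *args):
                raise AssertionError("ensures violated")
            return result
        return wrapped
    return deco

@contract(requires=lambda total: total > 0,
          ensures=lambda result, total: result <= 500)
def discount(total: int) -> int:
    """10% off orders over 1000, capped at 500."""
    return min(total // 10, 500) if total > 1000 else 0
```

Here discount(10000) hits the 500 cap, and discount(-5) fails the requires clause at call time, so a violated invariant surfaces immediately instead of propagating.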

6. AI Agent Orchestration

Multiple AI agents exchange Synoema programs instead of JSON or natural language. Programs are formally typed, verified by contracts, and executable. MCP server enables eval/typecheck/run in real-time. Synoema as lingua franca between AI agents — a shared language both machines understand with formal guarantees.

When NOT to Use Synoema

Critical thinking requires honesty. Synoema loses wherever a human writes code by hand, or where ecosystem size matters more than correctness:

Domain                     | Why not Synoema                        | Better choice
Web frontends              | No DOM, no browser API                 | TypeScript/JavaScript
Data science               | No numpy/pandas ecosystem              | Python
Systems programming        | No ownership, no unsafe                | Rust
Enterprise backend         | Small ecosystem, no ORM                | Java/Go
Mobile apps                | No SDK                                 | Swift/Kotlin
String-heavy tasks         | Python is 87% more token-efficient     | Python
Existing codebase          | Migration cost not justified           | Whatever's there
Human writes code manually | Python is simpler and more familiar    | Python

The pattern: Synoema wins where machines generate code and correctness is critical. Synoema loses where humans write code or ecosystem size matters.

Architecture

~41,000 lines of Rust across 9 workspace crates (as of 2026-04-16):

Crate              | Purpose                                                                       | LOC
synoema-lexer      | Tokenization, offside rule (indentation → INDENT/DEDENT)                      | 1,984
synoema-parser     | Pratt parser, 22 expression kinds, AST                                        | 3,453
synoema-types      | Hindley-Milner inference, row polymorphism, contracts                         | 6,068
synoema-core       | Core IR (System F), constant folding, dead code elimination, e-graph          | 4,922
synoema-eval       | Tree-walking interpreter, all builtins, I/O, networking                       | 7,325
synoema-codegen    | Cranelift JIT compiler, tagged pointer ABI, arena memory, 130 FFI             | 9,275
synoema-diagnostic | Structured errors with LLM hints, fixability scores                           | 1,147
synoema-lsp        | LSP server (hover, go-to-def, diagnostics, completion)                        | 633
synoema-repl       | CLI: run, jit, eval, test, doc, watch, new, install, check, verify, setup, migrate | 6,181

Key architectural decisions:

Ecosystem

MCP Server

npx synoema-mcp — instant integration with Claude Desktop, Cursor, Zed. Tools: eval, typecheck, run, constrain (token masking), feedback_loop (generate→check→retry loop), doc_query (structured doc extraction).

VS Code Extension

Syntax highlighting, run/JIT keybindings (Cmd+Shift+R/J), eval selection. LSP server for hover types, go-to-def, diagnostics.

GBNF Grammar

162 lines, 48 rules. Works with llama.cpp, vLLM, SGLang, TensorRT-LLM. Ensures 100% syntactically valid LLM output.

Fine-Tuning Corpus

10,324 verified examples (99.9% pass rate) in ChatML format. Covers algorithms, data structures, pattern matching, error handling, I/O, and more. Used to fine-tune 7B models to 90.5% run rate.
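A corpus record in ChatML-style message format looks roughly like the following. This is a hypothetical record: the system prompt, task, and field layout are illustrative, not copied from corpus_v6.

```python
import json

# One illustrative JSONL line of a ChatML-format fine-tuning corpus.
record = {
    "messages": [
        {"role": "system", "content": "You write Synoema programs."},
        {"role": "user", "content": "Define the identity function."},
        {"role": "assistant", "content": "id x = x"},
    ]
}
line = json.dumps(record)
```

Because every assistant turn is compiler-validated before inclusion, the 99.9% pass rate is a property of the data, not of post-hoc filtering at training time.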

Benchmark Suite

30 tasks across 5 languages. Automated token counting (tiktoken) and runtime measurement. Reproducible via Python scripts.

1,485 Tests

Unit tests, stress tests (fib(35), 100K tokens, deep nesting), corpus validation, adversarial edge cases. 0 failures, 0 warnings.

Roadmap

Phase            | Status | Description
Working Language | Done   | Lexer, parser, type system, interpreter, REPL
Working Compiler | Done   | Core IR, Cranelift JIT, tagged pointer ABI, arena memory
LLM-Native       | Done   | GBNF grammar, MCP server, constrained decoding, LLM error feedback
Production       | 90%    | Region inference, contract docs, benchmark suite, small model templates
Community        | Next   | Package manager, expanded corpus, documentation

Current version: 0.1.0-alpha.3 (alpha — syntax and APIs may change)

Phase D: LLM Generation Benchmark Results

Measured April 2026 across 7 models, 9 standard tasks, via Ollama and OpenRouter. All five hypotheses resolved.

Model            | Size | Config    | Syntax% | Run%
llama-3.2-1b     | 1B   | baseline  | 0%      | 0%
llama-3.2-3b     | 3B   | baseline  | 44%     | 0%
qwen2.5-coder:3b | 3B   | baseline  | 87%     | 60%
llama-3.1-8b     | 8B   | baseline  | 59%     | 30%
qwen2.5-coder-7b | 7B   | baseline  | 56%     | 41%
qwen3-8b         | 8B   | multipass | 59%     | 48%
qwen2.5-coder:7b | 7B   | baseline  | 36%     | 12%

Key findings:

Source: docs/research/hypothesis-test-results.md

Powered by Synoema

This website — synoema.tech — is served by an HTTP server written entirely in Synoema. The server is 171 lines of code handling routing, static file serving, playground evaluation, and SEO — with no external frameworks or dependencies. Pure Synoema.

This is deliberate dogfooding: the language proves its own capabilities by running its own production website. The server demonstrates real-world functionality that goes beyond toy examples:

When we say Synoema is ready for real workloads, we mean it literally: you're looking at the proof.
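For a sense of scale, the same routing-plus-static-serving task can be sketched with nothing but a standard library, here in Python for illustration (this is not the Synoema server's code):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

ROUTES = {"/": b"<h1>About Synoema</h1>", "/health": b"ok"}

def route(path: str) -> tuple[int, bytes]:
    """Pure routing step, testable without opening a socket."""
    body = ROUTES.get(path)
    return (200, body) if body is not None else (404, b"not found")

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        self.send_response(status)
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8080), Handler).serve_forever()
```

Keeping routing as a pure function is the same design move that makes a no-framework server small and auditable in any language.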

v6 Fine-Tuning: Intermediate Results

After Phase D established baselines, we fine-tuned Qwen2.5-Coder models on corpus_v6 (10,324 compiler-validated examples, ChatML format, 99.9% parse+run correct). Results as of April 2026 — 3B and 1.5B evals in progress.

Model                       | Corpus    | Syntax% | Run%  | Constructs% | vs Baseline
qwen2.5-coder-7b (baseline) | —         | 56%     | 41%   | —           | —
synoema-coder-7b v6         | v6 ChatML | 100%    | 90.5% | 44.6%       | +49.5pp run
synoema-coder-3b v6         | v6 ChatML | benchmark eval in progress | | |
synoema-coder-1.5b v6       | v6 ChatML | eval in progress           | | |

What improved: Run rate jumped from 41% to 90.5% (+49.5pp) — H6 confirmed. Syntax is now 100%.

What regressed: Constructs pass rate dropped from 52.7% (v5) to 44.6% — an 8.1pp regression. The model generates runnable code but sometimes avoids the specific constructs (|>, test, and_then) the task asked for. Under investigation.

Production status: v6 7B does not yet meet production criteria (constructs regression exceeds the −5pp tolerance). It is a research artifact with a known weakness.

Full analysis with failure breakdown →

Get Involved

Synoema is an open research project. Contributions welcome:

Get Started · Language Reference · Try Online