MCP • RAG • IoT

The Synoema LLM Toolchain — complete reference

Three interconnected layers that make Synoema an LLM-native platform: the MCP server gives any AI agent 20+ tools for code evaluation and retrieval; RAG provides a local vector index over 5 corpora so models find idiomatic examples without guessing; and the IoT platform closes the loop from LLM prompt to compiled artifact on Raspberry Pi, STM32, or nRF5340.

MCP 2024-11-05 • 20+ tools • RAG: 5 scopes • jina-code-v2 • IoT: 3 tiers • 30 rules • 200 B mean WASM


On this page

  1. MCP Server — eval, typecheck, run, dev intelligence, RAG tools, auto-inject, session
  2. MCP Installation & Connection — npx, binary, Claude Desktop, Cursor
  3. RAG — Retrieval-Augmented Generation — architecture, 5 scopes, ReAct, auto-inject
  4. RAG Installation & Usage — sno rag install, status, update
  5. IoT Platform — 3 tiers, WASM pipeline, 6 verticals
  6. LLM → IoT Pipeline — cloud_compile.py, GBNF, cloud vs local model
  7. How They Fit Together — the full integrated picture

MCP Server

The Synoema MCP server implements the Model Context Protocol (MCP 2024-11-05) over stdio. It integrates the Synoema compiler, evaluator, type checker, and RAG retrieval layer into any MCP-compatible client — Claude Desktop, Cursor, Zed, or a custom agent.

Why MCP is required for LLM agents. The stateless CLI (sno run) recompiles from scratch on every call (50–180 ms overhead) and has no access to session state, dev intelligence, or retrieval. The MCP server maintains a per-connection LRU-500 AST cache, 7 dev intelligence tools, 5 RAG retrieval tools, and a 50-turn transcript window across the session lifetime.
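
The AST cache is the core of that speedup: a bounded, least-recently-used map from source to parsed AST, so repeated calls on the same program skip the parse entirely. A minimal sketch of the idea (the class name and keying scheme here are illustrative, not the server's actual implementation):

```python
from collections import OrderedDict

class AstCache:
    """Illustrative LRU cache: key = hash of the source text, value = parsed AST."""

    def __init__(self, capacity: int = 500):
        self.capacity = capacity
        self._entries: "OrderedDict[str, object]" = OrderedDict()

    def get(self, key: str):
        # A hit promotes the entry to most-recently-used and skips the parse.
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: str, ast: object) -> None:
        self._entries[key] = ast
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least-recently-used AST
```

With capacity 500, a typical edit-check-run loop stays entirely in cache after the first parse of each file.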

Core Language Tools

Tool       Input                                            Output
eval       Single Synoema expression, e.g. [1..10] |> sum   Value + inferred type, or structured error JSON
typecheck  Full Synoema program (with main)                 main : Type, or structured error with llm_hint
run        Full Synoema program (with main)                 stdout output + final value, or error

Error JSON shape — every error from eval, typecheck, and run follows a machine-readable schema that LLMs can parse and act on:

{
  "code": "unbound_variable",
  "severity": "error",
  "message": "Undefined variable: foo",
  "span": {"line": 4, "col": 8, "end_line": 4, "end_col": 11},
  "llm_hint": "Variable 'foo' is not defined. Did you mean 'bar'?",
  "fixability": "easy",
  "did_you_mean": "bar",
  "source_origin": "user"
}

source_origin distinguishes user code ("user"), imported modules ("import:<path>"), and prelude bugs ("prelude"). Every error carries an llm_hint — a sentence written for the model, not the human.
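
Because the shape is fixed, an agent can branch on it mechanically. A sketch of one possible client-side policy (the `plan_repair` function and its return values are hypothetical, not part of the server):

```python
import json

# The structured error above, exactly as a client would receive it.
raw = """
{
  "code": "unbound_variable",
  "severity": "error",
  "message": "Undefined variable: foo",
  "span": {"line": 4, "col": 8, "end_line": 4, "end_col": 11},
  "llm_hint": "Variable 'foo' is not defined. Did you mean 'bar'?",
  "fixability": "easy",
  "did_you_mean": "bar",
  "source_origin": "user"
}
"""

def plan_repair(err: dict) -> str:
    # Hypothetical agent policy: easy errors with a suggestion are patched
    # directly; non-user code is reported; everything else goes to retrieval.
    if err.get("fixability") == "easy" and err.get("did_you_mean"):
        return f"rename to {err['did_you_mean']}"
    if err.get("source_origin") != "user":
        return "report upstream"  # import/prelude bugs are not the user's code
    return "search_traces"
```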

Dev Intelligence Tools

Seven tools expose a live index of the Synoema compiler source, powered by syn AST parsing. Line numbers and API surfaces are always current — no stale docs.

Tool                  Input                  Output (budget)
project_overview      (none)                 Crate structure, LOC, test counts (≤300 tok)
crate_info            crate_name             Public API: functions, types, structs (≤500 tok)
file_summary          file path              Function list with signatures, no bodies (≤300 tok)
search_code           query, optional scope  Top-5 keyword matches with context (≤400 tok)
get_context_for_edit  file, line             Enclosing function + ±20 lines of context (≤500 tok)
doc_query             file path              Structured docs: description, contracts (requires/ensures), examples (≤500 tok)
recipe                task description       Step-by-step recipe with current line numbers (≤500 tok)

All budgets are ≤500 tokens for compatibility with small context models (8K–32K). Available recipes: add_operator, add_builtin, add_type, fix_from_error.

RAG Retrieval Tools

Five tools perform semantic retrieval over the installed RAG index. See the RAG section for index details.

Tool            Scope                                        Default k / max k
search_corpus   Fine-tune training corpus (.sno + ChatML)    5 / 20
search_docs     Docs (LANGUAGE.md, guides, API)              5 / 20
search_skills   Bundled skills + installed packages          3 / 10
search_traces   LLM failure traces with repair examples      5 / 20
search_unified  All 5 scopes (filterable via scopes param)   10 / 30

All retrieval tools degrade gracefully when the RAG index is absent — they return a structured error and the server continues serving all other tools without restart.
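
On the wire, each of these is an ordinary MCP tools/call after the initialize handshake. A sketch of the two messages a custom client would write, one JSON object per line, to the server's stdin (the exact argument keys "query" and "k" are assumptions based on the table above):

```python
import json

def mcp_request(req_id: int, method: str, params: dict) -> dict:
    # Minimal JSON-RPC 2.0 envelope, as MCP uses over stdio.
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

messages = [
    mcp_request(1, "initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0"},
    }),
    mcp_request(2, "tools/call", {
        "name": "search_corpus",
        "arguments": {"query": "fold a list into a sum", "k": 5},
    }),
]

# Each message goes to the server's stdin as a single line of JSON.
wire = "\n".join(json.dumps(m) for m in messages)
```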

Auto-Injection for Small Models

Models ≤7B often cannot reliably emit structured search_* tool-use actions, but still benefit from retrieval context. The MCP server can auto-inject a retrieval_context field into responses from typecheck, run, and feedback_loop — transparent to the model, no protocol changes needed.

Enable in ~/.sno/config.toml:

[rag.auto_inject]
enabled = true
scopes = ["traces", "corpus"]  # any of corpus, docs, skills, traces, sno
top_k = 3
max_chunk_chars = 800

# Per-tool override:
[rag.auto_inject.per_tool.typecheck]
top_k = 2
scopes = ["traces"]

When the tool response contains an error, the middleware appends:

{
  "error": "Type mismatch: expected Int, found String",
  "retrieval_context": {
    "query": "Type mismatch: expected Int, found String",
    "hits": [
      {"scope": "traces", "source": "trace/t42.json", "score": 0.82, "text": "..."},
      {"scope": "corpus", "source": "corpus/add.sno",  "score": 0.78, "text": "..."}
    ],
    "auto_injected": true,
    "top_k": 3
  }
}

The middleware silently skips injection if the RAG index is missing. Clients that ignore retrieval_context see no behavioral change.
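
A client that does want the context only has to read one optional field. A sketch of a consumer (the helper name and the 0.5 score cutoff are illustrative choices, not part of the protocol):

```python
def repair_hints(response: dict, min_score: float = 0.5) -> list:
    # Clients that ignore retrieval_context behave exactly as before;
    # clients that read it get ranked repair examples for free.
    ctx = response.get("retrieval_context")
    if not ctx:
        return []
    return [hit["text"] for hit in ctx.get("hits", [])
            if hit.get("score", 0.0) >= min_score]
```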

Session & State Tools

Tool             Output
get_context      Phase-appropriate documentation: full LLM ref when writing code, error context when debugging (≤1800 tok)
get_state        Current dev phase + last 5 state transitions (JSON)
session_info     Session ULID, cache hit rate, tool call count, connection age
session_history  Last N tool calls with inputs and outputs (transcript window, max 50 turns)

Package Discovery Tools

Tool              What it does
search_packages   Search registry + installed packages by keyword. Returns install command and import snippet.
suggest_packages  Extract unknown identifiers from code and suggest packages that provide them.

Self-Report Tools (LLM → author feedback)

Four tools let the LLM record gaps, contradictions, or ambiguities it encounters. These feed the research/llm-failures/ channel — not for end users, but for improving the language and docs.

Tool                    When to call
flag_doc_gap            Expected documentation on a topic but didn't find it
flag_doc_contradiction  A doc quote contradicts observed behavior
flag_ambiguity          Multiple valid interpretations; records which one was chosen
request_clarification   Task context is unclear; records the question

Telemetry is off by default. Enable local-only collection in ~/.sno/config.toml:

[telemetry]
llm_failures = "local-only"

MCP Installation & Connection

Install via npx (recommended)

# No installation required — downloads automatically
npx synoema-mcp
# Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "synoema": {
      "command": "npx",
      "args": ["synoema-mcp"]
    }
  }
}

Install via sno CLI (easiest after sno is installed)

sno mcp-install              # installs binary to ~/.sno/bin/
sno setup claude --binary    # writes Claude Desktop config
sno setup cursor --binary    # writes Cursor config

Verify connection

Open Claude Desktop and ask: "Use the eval tool to compute 2 + 3". If you see 5 : Int, MCP is connected. For Cursor, open the MCP panel and look for the synoema server in the tool list.

Connect to other clients

# Cursor — .cursor/mcp.json
{ "synoema": { "command": "synoema-mcp" } }

# Zed — settings.json
{
  "context_servers": {
    "synoema": { "command": { "path": "synoema-mcp", "args": [] } }
  }
}

# Manual test (stdio)
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{
  "protocolVersion":"2024-11-05","capabilities":{},
  "clientInfo":{"name":"test","version":"0"}}}' | synoema-mcp

Traffic logging

# Enable logging for one run
SYNOEMA_MCP_TRAFFIC=1 synoema-mcp

# Or in ~/.sno/config.toml
[logging]
enabled = true
level = "errors"   # all | errors | tools
dir = "~/.sno/mcp-traffic"

RAG — Retrieval-Augmented Generation

RAG gives LLMs a local knowledge base they can search before generating code. Synoema's RAG stack is Rust-native, offline-first, and opt-in. No Python, no external vector database. It ships as part of the sno CLI and the MCP server.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     OFFLINE (build phase)                        │
│                                                                  │
│   source tree          sno build-index          vector index     │
│   ┌───────────┐   ──────────────────────▶   ┌──────────────┐    │
│   │ corpus    │        5 chunkers            │ chunks.jsonl │    │
│   │ docs/     │        jina-code-v2          │ vectors.bin  │    │
│   │ skills/   │        int8 quantized        │ MANIFEST.json│    │
│   │ traces/   │                              └──────────────┘    │
│   │ .sno files│                                                  │
│   └───────────┘                                                  │
└──────────────────────────────────────────────────────────────────┘
                                   │
                           sno rag install
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                    RUNTIME (MCP server)                          │
│                                                                  │
│   ┌────────────────────────────────────────────────────────┐    │
│   │ synoema-mcp  ──  search_corpus / search_docs /         │    │
│   │                  search_skills / search_traces /        │    │
│   │                  search_unified                         │    │
│   └─────────────────────┬─────────────────────┬────────────┘    │
│                         │                     │                 │
│          ┌──────────────▼───┐      ┌──────────▼─────────────┐   │
│          │  sno fix         │      │  auto_inject           │   │
│          │  --with-rag      │      │  middleware            │   │
│          │  (ReAct loop)    │      │  (transparent for      │   │
│          │  explicit search │      │  small models)         │   │
│          └──────────────────┘      └────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘

Five Scopes

Scope   Source                                          Chunking strategy                                Typical use
corpus  Fine-tune training data (.sno + ChatML pairs)   One chunk per JSONL record                       Find idiomatic patterns for a function shape
docs    Language reference, guides, API docs            Split at H2 headings, capped at 2 KB             Answer "how does X work" questions
skills  Bundled skills + installed package SKILL.md     Whole SKILL.md per chunk                         Discover reusable patterns (concurrency, IoT, etc.)
traces  LLM failure traces with repair examples         One chunk per trace record                       Find how a past error was fixed
sno     .sno source files in the repo                   One chunk per top-level definition (parser AST)  Exact-match on existing definitions

Embedder

Default embedder: jina-code-v2 (568M parameters, 768-dim output, int8 quantized, ~160 MB on disk). It uses a custom tokenizer that preserves Synoema operators (|>, <>, :=, etc.) as single units, matching the language's BPE-aware surface.

Default builds ship a deterministic StubEmbedder so that cargo test --all stays hermetic on machines without the ONNX runtime. Production binaries include the real embedder behind --features=synoema-embed/inference.
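
The stub only has to be deterministic and dimension-correct, not semantically meaningful. A sketch of the pattern (hash-based; the real StubEmbedder's construction may differ):

```python
import hashlib

DIM = 768  # matches the jina-code-v2 output dimension

def stub_embed(text: str) -> list:
    # Deterministic pseudo-embedding: repeatedly hash the input and spread
    # the digest bytes across the vector. No model, no ONNX, fully hermetic.
    out = []
    block = 0
    while len(out) < DIM:
        digest = hashlib.sha256(f"{block}:{text}".encode()).digest()
        out.extend(b / 255.0 for b in digest)
        block += 1
    return out[:DIM]
```

Tests that exercise chunking, ranking, and index I/O run unchanged against the stub, because every component downstream of the embedder sees the same shapes either way.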

Index format

Brute-force cosine similarity over the full vector set. On the current corpus (~13k chunks) a single query takes ~50 ms on a 2023 M2 laptop. HNSW is deferred until the corpus exceeds ~20k chunks or query rate exceeds 50 qps — the dependency cost is not justified yet.
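
Brute force here means exactly what it says: score every chunk against the query and keep the top k. A minimal sketch of the search core (pure Python for clarity; the shipped implementation is Rust over the int8-quantized vectors):

```python
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query: list, index: list, k: int = 5) -> list:
    # O(n·d) per query: score all n chunks, sort, keep top k.
    # At ~13k chunks this stays well under interactive latency.
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This is the design choice the paragraph describes: at this corpus size, exhaustive scoring is fast enough that an approximate-nearest-neighbor structure would only add dependency and maintenance cost.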

Three retrieval modes

ReAct Agent

sno fix file.sno --with-rag — wires 4 retrieval actions into a Thought→Action→Observation loop. The model explicitly calls search_traces, search_corpus, etc. Every call is GBNF-gated and audited in ~/.sno/audit_retrieval.jsonl.

Auto-Inject

Transparent for small models (3B, 1.5B). When typecheck or run returns an error, the middleware appends a retrieval_context field with relevant hits. The model reads context without knowing RAG exists.

Raw MCP Tools

Power users and custom agents call search_corpus / search_unified directly via JSON-RPC. Useful for IDE integrations, pre-generation context loading, or custom ReAct frameworks.

ReAct retrieval trace example

Thought: I see a non-exhaustive case error. Let me search traces for similar fixes.
Action: search_traces
Arguments: {"query": "exhaustiveness case Nothing", "k": 3}
Observation: [
  {"source": "traces/t42.json", "score": 0.82, "text": "case opt of\n  Just x -> …\n  Nothing -> 0"}
]
Thought: Found the pattern. Adding the missing branch.
Action: apply_patch
Arguments: {"file": "broken.sno", "patch": "…"}

RAG Installation & Usage

Install

# Fastest: install Synoema with RAG in one command
curl -fsSL https://synoema.tech/install.sh | sh -s -- --with-rag

# PowerShell (Windows)
.\install.ps1 -WithRag

# Add RAG to an existing install
sno rag install

# Check what is installed
sno rag status
# → Installed: 2026-04-18, 12840 chunks across 5 scopes, ~179 MB

# Update when a new pack is published
sno rag update

# Remove everything
sno rag remove

sno rag install downloads the pack manifest, verifies SHA-256, extracts to ~/.sno/models/embed/<date>/, and flips an atomic symlink at ~/.sno/models/embed/current/ — in-flight tools never see a half-updated index.
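
The atomic-flip step is a standard POSIX pattern: build the new symlink under a temporary name, then rename it over the old one. A sketch of the idea (illustrative; sno implements this in Rust):

```python
import os

def flip_current(new_pack_dir: str, current_link: str) -> None:
    # Create the new symlink under a temp name, then rename it over
    # `current`. rename(2) is atomic on POSIX, so a reader resolving the
    # link sees either the old pack or the new one, never a mix.
    tmp = current_link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_pack_dir, tmp)
    os.replace(tmp, current_link)
```

Because only the final rename is visible to readers, an MCP server holding the index open mid-query is never pointed at a half-extracted directory.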

Troubleshooting

Symptom                                          Cause                                        Fix
sno rag status says not installed                Pack never downloaded                        sno rag install
rag_model_unavailable in MCP logs                ONNX feature missing or model not extracted  sno rag remove && sno rag install
Retrieval >500 ms/query                          Running StubEmbedder instead of real model   Rebuild with --features=synoema-embed/inference
[warning] MCP unavailable in sno fix --with-rag  MCP server not reachable                     Run sno mcp-install and check config

Run sno doctor for a single-command health report across all components including the RAG pack, embedder path, and index integrity.


IoT Platform

Synoema's IoT platform enables LLM-generated automation rules to run on embedded hardware — from bare Cortex-M MCUs to Raspberry Pi — through a 3-tier device model and a WASM-first compilation strategy.

Three Deployment Tiers

Tier 0: Bare MCU              Tier 1: RTOS / wasm3            Tier 2: Linux Edge
──────────────────            ────────────────────────        ──────────────────
Cortex-M0/M3, <64 KB         ESP32, STM32, ≥128 KB           RPi, aarch64 SBC

sno wasm → .wasm              sno wasm → .wasm                sno build --native
     │                             │                               │
C host + wasm3 embed          wasm3 on-device                 Cranelift ObjectModule
integer-only rules            floats + contracts              full language

aot_thumbv7m.rs (partial)    wasm_codegen.rs v2+v3           aot_aarch64.rs
→ blocked: Cranelift ARM32    + wasm_runtime.rs               → shipped

Target hardware:              Target hardware:                Target hardware:
RP2040, STM32F0/F1            ESP32, STM32F4/H7               Raspberry Pi (all)
nRF5340 (Zephyr)              nRF5340, Arduino                Jetson, BeagleBone
<64 KiB flash                 ≥128 KiB flash                  x86/aarch64 Linux

WASM v3 Feature Matrix

v2              Integer arithmetic • Strings, Lists, Closures (deferred: Nil/Cons deep patterns) • Perceus reference counting
v3 records/ADT  Records & ADT constructors (deferred: pattern nesting >1 level)
v3 floats       Floats + math builtins
v3 contracts    requires/ensures contracts
Deferred        Host imports (GPIO, I2C, SPI), pending wasm-host-imports

Contracts at compile time

requires and ensures clauses compile to WebAssembly unreachable traps — violations are caught at runtime before they propagate. This mirrors the JIT contract enforcement via Cranelift trapif:

-- Synoema rule with safety contracts
rule_overpressure : Int -> Bool
  requires pressure > 0
  ensures result == (pressure > 800)
rule_overpressure pressure = pressure > 800

;; WASM output (simplified)
;; requires check: trap if pressure <= 0
i64.const 0
call $__int_unbox
i64.gt_s
i64.eqz
if unreachable end

;; function body…
;; ensures check: trap if result != (pressure > 800)
…
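
The runtime semantics of those traps can be mirrored in plain terms: the precondition is checked before the body runs, the postcondition before the value escapes, and a violation aborts instead of returning garbage. A sketch of the same rule with trap-like behavior (illustrative, not generated code):

```python
class ContractTrap(RuntimeError):
    """Stands in for the WASM `unreachable` trap."""

def rule_overpressure(pressure: int) -> bool:
    # requires pressure > 0: checked before the body runs
    if not pressure > 0:
        raise ContractTrap("requires violated: pressure > 0")
    result = pressure > 800
    # ensures result == (pressure > 800): checked before the value escapes
    if result != (pressure > 800):
        raise ContractTrap("ensures violated")
    return result
```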

Six Vertical MVPs

Vertical           Wave  Platform     Rules  sno check  sno wasm  Mean B
Home automation    1     RPi aarch64  5      5/5        5/5       74.2
Industrial safety  1     STM32        5      5/5        4/5       87.4
Wearable health    1     nRF5340      5      5/5        5/5       84.6
Automotive         2     STM32        5      5/5        5/5       ~110
Agriculture        2     ESP32/RPi    5      5/5        5/5       ~98
Healthcare         2     nRF5340/RPi  5      5/5        5/5       ~114

Wave 2 aggregate: 30/30 check (100%) • 29/30 WASM (96.7%) • mean 200 B. Known failure: rule_flow_min_alarm — Nil list pattern not yet registered in WASM v3 ctor_tags; deferred to wasm-host-imports.

Honest deferrals. Native Thumb2 AOT (aot_thumbv7m.rs) is scaffolded but blocked on the Cranelift ARM32 backend. GPIO host imports for WASM (gpio_* builtins inside .wasm) are deferred to the wasm-host-imports change. Real wasm3-on-MCU CI and 5 Wave-3 verticals (automotive/medical/agriculture/logistics/smart-grid) are future work.

LLM → IoT Pipeline

The complete pipeline from natural language prompt to WASM artifact on hardware:

Natural language prompt
       │
       ▼ LLM (constrained by synoema-iot-rules.gbnf)
Synoema IoT rule (.sno)
       │
       ├── sno check  →  parse + typecheck  →  PASS / FAIL + structured errors
       │
       └── sno wasm   →  WASM v3 codegen   →  .wasm artifact
                           │
                           ├── Tier 0: C host + wasm3 embed  (MCU)
                           ├── Tier 1: on-device wasm3       (RTOS/ESP32)
                           └── sno build --native            (Linux edge ELF)
                               --target aarch64-linux

GBNF-constrained generation

The LLM generates IoT rules constrained by lang/tools/constrained/synoema-iot-rules.gbnf — a GBNF grammar that limits output to valid Synoema rule syntax. This eliminates hallucinated syntax and reduces parse failures on first generation.
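
To illustrate the shape of such a constraint, here is a hypothetical fragment in GBNF notation (this is not the shipped grammar, only a sketch of how a rule's signature, contracts, and definition can be forced into valid form):

# Hypothetical sketch, not the contents of synoema-iot-rules.gbnf
root     ::= typesig "\n" contract* defn "\n"
typesig  ::= ident ws ":" ws type (ws "->" ws type)*
contract ::= ws ("requires" | "ensures") ws boolexpr "\n"
defn     ::= ident (ws ident)* ws "=" ws expr
ident    ::= [a-z] [a-zA-Z0-9_]*

With the sampler restricted to productions of the grammar, the model physically cannot emit a token sequence that fails to parse, which is why first-generation parse failures drop.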

# Rule example generated within the GBNF constraint
rule_fan_control : Int -> Bool
  requires temp > -50
  ensures result == (temp > 30)
rule_fan_control temp = temp > 30

LLM backends

Backend                                         Flag                         Use case
Anthropic API (claude-opus-4-7)                 default                      Development, CI, highest quality
Local fine-tuned (Qwen2.5-Coder-3B via Ollama)  --model ollama:iot-rules-3b  Offline, air-gapped, low-latency
Mock (CI)                                       --mock                       Hermetic tests, no API key required

Reference pipeline: cloud_compile.py

# Cloud path (requires ANTHROPIC_API_KEY)
python3 lang/tools/llm/cloud_compile.py \
  --prompt "turn on fan when temperature exceeds 30°C" \
  --target rpi

# Mock mode — fully hermetic, no API key
python3 lang/tools/llm/cloud_compile.py \
  --mock \
  --prompt "turn on fan when temp > 30" \
  --target rpi

# Industrial rules with contracts
python3 lang/tools/llm/cloud_compile.py \
  --prompt "shut off pump when pressure > 800 PSI" \
  --vertical industrial \
  --target stm32

Wave-2 Training Corpus

1,177 unique (prompt, rule) pairs across 5 verticals × 7 rule kinds × 3 difficulty tiers:

Rule kinds:   threshold / hysteresis / counter / timer /
              interlock / safety / pattern-match
Verticals:    home / industrial / wearable / automotive / agriculture
Difficulty:   simple / medium / hard
Splits:       946 train / 104 val / 127 test  (hash-mod-10 deterministic)
Token stats:  mean 89.5 / median 87 / p95 139 (cl100k_base)
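
Hash-mod-10 splitting means each example's split is a pure function of its identity, so regenerating the corpus never leaks val/test examples into train. A sketch of the scheme (the exact hash function and bucket boundaries here are illustrative):

```python
import hashlib

def split_of(example_id: str) -> str:
    # Deterministic bucketing: the same id always lands in the same split.
    bucket = int(hashlib.sha256(example_id.encode()).hexdigest(), 16) % 10
    if bucket <= 7:
        return "train"   # buckets 0-7, ~80%
    return "val" if bucket == 8 else "test"   # ~10% each
```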

Training script: research/finetune/train_iot_rules_small.py (Qwen2.5-Coder-3B, QLoRA 4-bit, AMD RX 7900 GRE + unsloth + ROCm).


How MCP + RAG + IoT Fit Together

The three layers are independent but designed to compose:

Claude Desktop / Cursor / custom agent
           │
           │ MCP 2024-11-05 (stdio JSON-RPC)
           │
    ┌──────▼──────────────────────────────────────────────┐
    │                     synoema-mcp                     │
    │                                                     │
    │  eval / typecheck / run / feedback_loop             │
    │       │               │                             │
    │       │  auto_inject  │  search_* tools             │
    │       │  middleware   │  (when agent calls them)    │
    │       │       │       │                             │
    │       │       └───┬───┘                             │
    │       │     ┌─────▼───────────────────┐             │
    │       │     │  RAG index              │             │
    │       │     │  ~/.sno/models/embed/   │             │
    │       │     │  jina-code-v2           │             │
    │       │     │  5 scopes, ~13k chunks  │             │
    │       │     └─────────────────────────┘             │
    └───────┼─────────────────────────────────────────────┘
            │
            │  sno check / sno wasm / sno build --native
            │
    ┌───────▼────────────────────────────────────────────┐
    │          IoT Compilation Pipeline                  │
    │                                                    │
    │  .sno rule  →  WASM v3 codegen  →  .wasm artifact  │
    │                                        │           │
    │                             ┌──────────┴──┐        │
    │                        Tier 0        Tier 1/2      │
    │                        wasm3 MCU     Linux ELF     │
    └────────────────────────────────────────────────────┘

Typical LLM agent workflow

  1. Agent opens session — MCP server creates a ULID session, warms the AST cache.
  2. Agent calls get_context — receives phase-appropriate reference docs (≤1800 tok).
  3. Agent generates a rule — calls typecheck with the draft. If there is an error, the auto-inject middleware attaches relevant search_traces hits automatically.
  4. Agent calls run — executes the rule to verify output. If it fails again, the ReAct loop (sno fix --with-rag) activates and searches the corpus for idiomatic fixes.
  5. Agent compiles to WASM — calls sno wasm rule.sno via the run tool or the CLI. Gets a .wasm artifact ready for the target tier.
  6. Agent deploys — via cloud_compile.py for the full pipeline, or manually for custom hardware.

Key measured numbers

MCP

LRU-500 AST cache • 50-turn transcript • 20+ tools • 50–180 ms faster per call vs CLI

RAG

~13k chunks • 5 scopes • ~50 ms / query • jina-code-v2 int8 (~160 MB)

IoT

30/30 sno check • 29/30 WASM • 200 B mean artifact • 6 verticals • 3 tiers

Further reading