MCP • RAG • IoT

The Synoema LLM Toolchain — complete reference

Three interconnected layers that make Synoema an LLM-native platform: the MCP server gives any AI agent 20+ tools for code evaluation and retrieval; RAG provides a local vector index over 5 corpora so models find idiomatic examples without guessing; and the IoT platform closes the loop from LLM prompt to compiled artifact on Raspberry Pi, STM32, or nRF5340.

MCP 2024-11-05 • 20+ tools • RAG: 5 scopes • jina-code-v2 • IoT: 3 tiers • 30 rules • 200 B mean WASM


On this page

  1. MCP Server — eval, typecheck, run, dev intelligence, RAG tools, auto-inject, session
  2. MCP Installation & Connection — npx, binary, Claude Desktop, Cursor
  3. RAG — Retrieval-Augmented Generation — architecture, 5 scopes, ReAct, auto-inject
  4. RAG Installation & Usage — sno rag install, status, update
  5. IoT Platform — 3 tiers, WASM pipeline, 6 verticals
  6. LLM → IoT Pipeline — cloud_compile.py, GBNF, cloud vs local model
  7. How They Fit Together — the full integrated picture

MCP Server

The Synoema MCP server implements the Model Context Protocol (MCP 2024-11-05) over stdio. It integrates the Synoema compiler, evaluator, type checker, and RAG retrieval layer into any MCP-compatible client — Claude Desktop, Cursor, Zed, or a custom agent.

Why MCP is required for LLM agents. The stateless CLI (sno run) recompiles from scratch on every call (50–180 ms overhead) and has no access to session state, dev intelligence, or retrieval. The MCP server maintains a per-connection LRU-500 AST cache, 7 dev intelligence tools, 5 RAG retrieval tools, and a 50-turn transcript window across the session lifetime.
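
The AST cache is the core of that speedup: a bounded, least-recently-used map from source to parsed AST, so repeated calls on the same program skip the parse entirely. A minimal sketch of the idea (the class name and keying scheme here are illustrative, not the server's actual implementation):

```python
from collections import OrderedDict

class AstCache:
    """Illustrative LRU cache: key = hash of the source text, value = parsed AST."""

    def __init__(self, capacity: int = 500):
        self.capacity = capacity
        self._entries: "OrderedDict[str, object]" = OrderedDict()

    def get(self, key: str):
        # A hit promotes the entry to most-recently-used and skips the parse.
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: str, ast: object) -> None:
        self._entries[key] = ast
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least-recently-used AST
```

With capacity 500, a typical edit-check-run loop stays entirely in cache after the first parse of each file.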

Core Language Tools

Tool       Input                                            Output
eval       Single Synoema expression, e.g. [1..10] |> sum   Value + inferred type, or structured error JSON
typecheck  Full Synoema program (with main)                 main : Type, or structured error with llm_hint
run        Full Synoema program (with main)                 stdout output + final value, or error

Error JSON shape — every error from eval, typecheck, and run follows a machine-readable schema that LLMs can parse and act on:

{
  "code": "unbound_variable",
  "severity": "error",
  "message": "Undefined variable: foo",
  "span": {"line": 4, "col": 8, "end_line": 4, "end_col": 11},
  "llm_hint": "Variable 'foo' is not defined. Did you mean 'bar'?",
  "fixability": "easy",
  "did_you_mean": "bar",
  "source_origin": "user"
}

source_origin distinguishes user code ("user"), imported modules ("import:<path>"), and prelude bugs ("prelude"). Every error carries an llm_hint — a sentence written for the model, not the human.
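
Because the shape is fixed, an agent can branch on it mechanically. A sketch of one possible client-side policy (the `plan_repair` function and its return values are hypothetical, not part of the server):

```python
import json

# The structured error above, exactly as a client would receive it.
raw = """
{
  "code": "unbound_variable",
  "severity": "error",
  "message": "Undefined variable: foo",
  "span": {"line": 4, "col": 8, "end_line": 4, "end_col": 11},
  "llm_hint": "Variable 'foo' is not defined. Did you mean 'bar'?",
  "fixability": "easy",
  "did_you_mean": "bar",
  "source_origin": "user"
}
"""

def plan_repair(err: dict) -> str:
    # Hypothetical agent policy: easy errors with a suggestion are patched
    # directly; non-user code is reported; everything else goes to retrieval.
    if err.get("fixability") == "easy" and err.get("did_you_mean"):
        return f"rename to {err['did_you_mean']}"
    if err.get("source_origin") != "user":
        return "report upstream"  # import/prelude bugs are not the user's code
    return "search_traces"
```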

Dev Intelligence Tools

Seven tools expose a live index of the Synoema compiler source, powered by syn AST parsing. Line numbers and API surfaces are always current — no stale docs.

Tool                  Input                  Output (budget)
project_overview      (none)                 Crate structure, LOC, test counts (≤300 tok)
crate_info            crate_name             Public API: functions, types, structs (≤500 tok)
file_summary          file path              Function list with signatures, no bodies (≤300 tok)
search_code           query, optional scope  Top-5 keyword matches with context (≤400 tok)
get_context_for_edit  file, line             Enclosing function + ±20 lines of context (≤500 tok)
doc_query             file path              Structured docs: description, contracts (requires/ensures), examples (≤500 tok)
recipe                task description       Step-by-step recipe with current line numbers (≤500 tok)

All budgets are ≤500 tokens for compatibility with small context models (8K–32K). Available recipes: add_operator, add_builtin, add_type, fix_from_error.

RAG Retrieval Tools

Five tools perform semantic retrieval over the installed RAG index. See the RAG section for index details.

Tool            Scope                                        Default k / max k
search_corpus   Fine-tune training corpus (.sno + ChatML)    5 / 20
search_docs     Docs (LANGUAGE.md, guides, API)              5 / 20
search_skills   Bundled skills + installed packages          3 / 10
search_traces   LLM failure traces with repair examples      5 / 20
search_unified  All 5 scopes (filterable via scopes param)   10 / 30

All retrieval tools degrade gracefully when the RAG index is absent — they return a structured error and the server continues serving all other tools without restart.
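
On the wire, each of these is an ordinary MCP tools/call after the initialize handshake. A sketch of the two messages a custom client would write, one JSON object per line, to the server's stdin (the exact argument keys "query" and "k" are assumptions based on the table above):

```python
import json

def mcp_request(req_id: int, method: str, params: dict) -> dict:
    # Minimal JSON-RPC 2.0 envelope, as MCP uses over stdio.
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

messages = [
    mcp_request(1, "initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0"},
    }),
    mcp_request(2, "tools/call", {
        "name": "search_corpus",
        "arguments": {"query": "fold a list into a sum", "k": 5},
    }),
]

# Each message goes to the server's stdin as a single line of JSON.
wire = "\n".join(json.dumps(m) for m in messages)
```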

Auto-Injection for Small Models

Models ≤7B often cannot reliably emit structured search_* tool-use actions, but still benefit from retrieval context. The MCP server can auto-inject a retrieval_context field into responses from typecheck, run, and feedback_loop — transparent to the model, no protocol changes needed.

Enable in ~/.sno/config.toml:

[rag.auto_inject]
enabled = true
scopes = ["traces", "corpus"]  # any of corpus, docs, skills, traces, sno
top_k = 3
max_chunk_chars = 800

# Per-tool override:
[rag.auto_inject.per_tool.typecheck]
top_k = 2
scopes = ["traces"]

When the tool response contains an error, the middleware appends:

{
  "error": "Type mismatch: expected Int, found String",
  "retrieval_context": {
    "query": "Type mismatch: expected Int, found String",
    "hits": [
      {"scope": "traces", "source": "trace/t42.json", "score": 0.82, "text": "..."},
      {"scope": "corpus", "source": "corpus/add.sno",  "score": 0.78, "text": "..."}
    ],
    "auto_injected": true,
    "top_k": 3
  }
}

The middleware silently skips injection if the RAG index is missing. Clients that ignore retrieval_context see no behavioral change.
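
A client that does want the context only has to read one optional field. A sketch of a consumer (the helper name and the 0.5 score cutoff are illustrative choices, not part of the protocol):

```python
def repair_hints(response: dict, min_score: float = 0.5) -> list:
    # Clients that ignore retrieval_context behave exactly as before;
    # clients that read it get ranked repair examples for free.
    ctx = response.get("retrieval_context")
    if not ctx:
        return []
    return [hit["text"] for hit in ctx.get("hits", [])
            if hit.get("score", 0.0) >= min_score]
```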

Session & State Tools

Tool             Output
get_context      Phase-appropriate documentation: full LLM ref when writing code, error context when debugging (≤1800 tok)
get_state        Current dev phase + last 5 state transitions (JSON)
session_info     Session ULID, cache hit rate, tool call count, connection age
session_history  Last N tool calls with inputs and outputs (transcript window, max 50 turns)

Package Discovery Tools

Tool              What it does
search_packages   Search registry + installed packages by keyword. Returns install command and import snippet.
suggest_packages  Extract unknown identifiers from code and suggest packages that provide them.

Self-Report Tools (LLM → author feedback)

Four tools let the LLM record gaps, contradictions, or ambiguities it encounters. These feed the research/llm-failures/ channel — not for end users, but for improving the language and docs.

Tool                    When to call
flag_doc_gap            Expected documentation on a topic but didn't find it
flag_doc_contradiction  A doc quote contradicts observed behavior
flag_ambiguity          Multiple valid interpretations; records which one was chosen
request_clarification   Task context is unclear; records the question

Telemetry is off by default. Enable local-only collection in ~/.sno/config.toml:

[telemetry]
llm_failures = "local-only"

MCP Installation & Connection

Install via npx (recommended)

# No installation required — downloads automatically
npx synoema-mcp
# Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "synoema": {
      "command": "npx",
      "args": ["synoema-mcp"]
    }
  }
}

Install via sno CLI (easiest after sno is installed)

sno mcp-install              # installs binary to ~/.sno/bin/
sno setup claude --binary    # writes Claude Desktop config
sno setup cursor --binary    # writes Cursor config

Verify connection

Open Claude Desktop and ask: "Use the eval tool to compute 2 + 3". If you see 5 : Int, MCP is connected. For Cursor, open the MCP panel and look for the synoema server in the tool list.

Connect to other clients

# Cursor — .cursor/mcp.json
{ "synoema": { "command": "synoema-mcp" } }

# Zed — settings.json
{
  "context_servers": {
    "synoema": { "command": { "path": "synoema-mcp", "args": [] } }
  }
}

# Manual test (stdio)
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{
  "protocolVersion":"2024-11-05","capabilities":{},
  "clientInfo":{"name":"test","version":"0"}}}' | synoema-mcp

Traffic logging

# Enable logging for one run
SYNOEMA_MCP_TRAFFIC=1 synoema-mcp

# Or in ~/.sno/config.toml
[logging]
enabled = true
level = "errors"   # all | errors | tools
dir = "~/.sno/mcp-traffic"

RAG — Retrieval-Augmented Generation

RAG gives LLMs a local knowledge base they can search before generating code. Synoema's RAG stack is Rust-native, offline-first, and opt-in. No Python, no external vector database. It ships as part of the sno CLI and the MCP server.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     OFFLINE (build phase)                        │
│                                                                  │
│   source tree          sno build-index          vector index     │
│   ┌───────────┐   ──────────────────────▶   ┌──────────────┐    │
│   │ corpus    │        5 chunkers            │ chunks.jsonl │    │
│   │ docs/     │        jina-code-v2          │ vectors.bin  │    │
│   │ skills/   │        int8 quantized        │ MANIFEST.json│    │
│   │ traces/   │                              └──────────────┘    │
│   │ .sno files│                                                  │
│   └───────────┘                                                  │
└──────────────────────────────────────────────────────────────────┘
                                   │
                           sno rag install
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                    RUNTIME (MCP server)                          │
│                                                                  │
│   ┌────────────────────────────────────────────────────────┐    │
│   │ synoema-mcp  ──  search_corpus / search_docs /         │    │
│   │                  search_skills / search_traces /        │    │
│   │                  search_unified                         │    │
│   └─────────────────────┬─────────────────────┬────────────┘    │
│                         │                     │                 │
│          ┌──────────────▼───┐      ┌──────────▼─────────────┐   │
│          │  sno fix         │      │  auto_inject           │   │
│          │  --with-rag      │      │  middleware            │   │
│          │  (ReAct loop)    │      │  (transparent for      │   │
│          │  explicit search │      │  small models)         │   │
│          └──────────────────┘      └────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘

Five Scopes

Scope   Source                                          Chunking strategy                                Typical use
corpus  Fine-tune training data (.sno + ChatML pairs)   One chunk per JSONL record                       Find idiomatic patterns for a function shape
docs    Language reference, guides, API docs            Split at H2 headings, capped at 2 KB             Answer "how does X work" questions
skills  Bundled skills + installed package SKILL.md     Whole SKILL.md per chunk                         Discover reusable patterns (concurrency, IoT, etc.)
traces  LLM failure traces with repair examples         One chunk per trace record                       Find how a past error was fixed
sno     .sno source files in the repo                   One chunk per top-level definition (parser AST)  Exact-match on existing definitions

Embedder

Default embedder: jina-code-v2 (568M parameters, 768-dim output, int8 quantized, ~160 MB on disk). It uses a custom tokenizer that preserves Synoema operators (|>, <>, :=, etc.) as single units, matching the language's BPE-aware surface.

Default builds ship a deterministic StubEmbedder so that cargo test --all stays hermetic on machines without the ONNX runtime. Production binaries include the real embedder behind --features=synoema-embed/inference.
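
The stub only has to be deterministic and dimension-correct, not semantically meaningful. A sketch of the pattern (hash-based; the real StubEmbedder's construction may differ):

```python
import hashlib

DIM = 768  # matches the jina-code-v2 output dimension

def stub_embed(text: str) -> list:
    # Deterministic pseudo-embedding: repeatedly hash the input and spread
    # the digest bytes across the vector. No model, no ONNX, fully hermetic.
    out = []
    block = 0
    while len(out) < DIM:
        digest = hashlib.sha256(f"{block}:{text}".encode()).digest()
        out.extend(b / 255.0 for b in digest)
        block += 1
    return out[:DIM]
```

Tests that exercise chunking, ranking, and index I/O run unchanged against the stub, because every component downstream of the embedder sees the same shapes either way.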

Index format

Brute-force cosine similarity over the full vector set. On the current corpus (~13k chunks) a single query takes ~50 ms on a 2023 M2 laptop. HNSW is deferred until the corpus exceeds ~20k chunks or query rate exceeds 50 qps — the dependency cost is not justified yet.
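
Brute force here means exactly what it says: score every chunk against the query and keep the top k. A minimal sketch of the search core (pure Python for clarity; the shipped implementation is Rust over the int8-quantized vectors):

```python
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query: list, index: list, k: int = 5) -> list:
    # O(n·d) per query: score all n chunks, sort, keep top k.
    # At ~13k chunks this stays well under interactive latency.
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This is the design choice the paragraph describes: at this corpus size, exhaustive scoring is fast enough that an approximate-nearest-neighbor structure would only add dependency and maintenance cost.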

Three retrieval modes

ReAct Agent

sno fix file.sno --with-rag — wires 4 retrieval actions into a Thought→Action→Observation loop. The model explicitly calls search_traces, search_corpus, etc. Every call is GBNF-gated and audited in ~/.sno/audit_retrieval.jsonl.

Auto-Inject

Transparent for small models (3B, 1.5B). When typecheck or run returns an error, the middleware appends a retrieval_context field with relevant hits. The model reads context without knowing RAG exists.

Raw MCP Tools

Power users and custom agents call search_corpus / search_unified directly via JSON-RPC. Useful for IDE integrations, pre-generation context loading, or custom ReAct frameworks.

ReAct retrieval trace example

Thought: I see a non-exhaustive case error. Let me search traces for similar fixes.
Action: search_traces
Arguments: {"query": "exhaustiveness case Nothing", "k": 3}
Observation: [
  {"source": "traces/t42.json", "score": 0.82, "text": "case opt of\n  Just x -> …\n  Nothing -> 0"}
]
Thought: Found the pattern. Adding the missing branch.
Action: apply_patch
Arguments: {"file": "broken.sno", "patch": "…"}

RAG Installation & Usage

Install

# Fastest: install Synoema with RAG in one command
curl -fsSL https://synoema.tech/install.sh | sh -s -- --with-rag

# PowerShell (Windows)
.\install.ps1 -WithRag

# Add RAG to an existing install
sno rag install

# Check what is installed
sno rag status
# → Installed: 2026-04-18, 12840 chunks across 5 scopes, ~179 MB

# Update when a new pack is published
sno rag update

# Remove everything
sno rag remove

sno rag install downloads the pack manifest, verifies SHA-256, extracts to ~/.sno/models/embed/<date>/, and flips an atomic symlink at ~/.sno/models/embed/current/ — in-flight tools never see a half-updated index.
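
The atomic-flip step is a standard POSIX pattern: build the new symlink under a temporary name, then rename it over the old one. A sketch of the idea (illustrative; sno implements this in Rust):

```python
import os

def flip_current(new_pack_dir: str, current_link: str) -> None:
    # Create the new symlink under a temp name, then rename it over
    # `current`. rename(2) is atomic on POSIX, so a reader resolving the
    # link sees either the old pack or the new one, never a mix.
    tmp = current_link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_pack_dir, tmp)
    os.replace(tmp, current_link)
```

Because only the final rename is visible to readers, an MCP server holding the index open mid-query is never pointed at a half-extracted directory.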

Troubleshooting

Symptom                                          Cause                                        Fix
sno rag status says not installed                Pack never downloaded                        sno rag install
rag_model_unavailable in MCP logs                ONNX feature missing or model not extracted  sno rag remove && sno rag install
Retrieval >500 ms/query                          Running StubEmbedder instead of real model   Rebuild with --features=synoema-embed/inference
[warning] MCP unavailable in sno fix --with-rag  MCP server not reachable                     Run sno mcp-install and check config

Run sno doctor for a single-command health report across all components including the RAG pack, embedder path, and index integrity.


IoT Platform

Synoema's IoT platform enables LLM-generated automation rules to run on embedded hardware — from bare Cortex-M MCUs to Raspberry Pi — through a 3-tier device model and a WASM-first compilation strategy.

Three Deployment Tiers

Tier 0: Bare MCU              Tier 1: RTOS / wasm3            Tier 2: Linux Edge
──────────────────            ────────────────────────        ──────────────────
Cortex-M0/M3, <64 KB         ESP32, STM32, ≥128 KB           RPi, aarch64 SBC

sno wasm → .wasm              sno wasm → .wasm                sno build --native
     │                             │                               │
C host + wasm3 embed          wasm3 on-device                 Cranelift ObjectModule
integer-only rules            floats + contracts              full language

aot_thumbv7m.rs (partial)    wasm_codegen.rs v2+v3           aot_aarch64.rs
→ blocked: Cranelift ARM32    + wasm_runtime.rs               → shipped

Target hardware:              Target hardware:                Target hardware:
RP2040, STM32F0/F1            ESP32, STM32F4/H7               Raspberry Pi (all)
nRF5340 (Zephyr)              nRF5340, Arduino                Jetson, BeagleBone
<64 KiB flash                 ≥128 KiB flash                  x86/aarch64 Linux

WASM v3 Feature Matrix

v2              Integer arithmetic • Strings, Lists, Closures (deferred: Nil/Cons deep patterns) • Perceus reference counting
v3 records/ADT  Records & ADT constructors (deferred: pattern nesting >1 level)
v3 floats       Floats + math builtins
v3 contracts    requires/ensures contracts
Deferred        Host imports (GPIO, I2C, SPI), pending wasm-host-imports

Contracts at compile time

requires and ensures clauses compile to WebAssembly unreachable traps — violations are caught at runtime before they propagate. This mirrors the JIT contract enforcement via Cranelift trapif:

-- Synoema rule with safety contracts
rule_overpressure : Int -> Bool
  requires pressure > 0
  ensures result == (pressure > 800)
rule_overpressure pressure = pressure > 800

;; WASM output (simplified)
;; requires check: trap if pressure <= 0
i64.const 0
call $__int_unbox
i64.gt_s
i64.eqz
if unreachable end

;; function body…
;; ensures check: trap if result != (pressure > 800)
…
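
The runtime semantics of those traps can be mirrored in plain terms: the precondition is checked before the body runs, the postcondition before the value escapes, and a violation aborts instead of returning garbage. A sketch of the same rule with trap-like behavior (illustrative, not generated code):

```python
class ContractTrap(RuntimeError):
    """Stands in for the WASM `unreachable` trap."""

def rule_overpressure(pressure: int) -> bool:
    # requires pressure > 0: checked before the body runs
    if not pressure > 0:
        raise ContractTrap("requires violated: pressure > 0")
    result = pressure > 800
    # ensures result == (pressure > 800): checked before the value escapes
    if result != (pressure > 800):
        raise ContractTrap("ensures violated")
    return result
```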

Six Vertical MVPs

Vertical           Wave  Platform     Rules  sno check  sno wasm  Mean B
Home automation    1     RPi aarch64  5      5/5        5/5       74.2
Industrial safety  1     STM32        5      5/5        4/5       87.4
Wearable health    1     nRF5340      5      5/5        5/5       84.6
Automotive         2     STM32        5      5/5        5/5       ~110
Agriculture        2     ESP32/RPi    5      5/5        5/5       ~98
Healthcare         2     nRF5340/RPi  5      5/5        5/5       ~114

Wave 2 aggregate: 30/30 check (100%) • 29/30 WASM (96.7%) • mean 200 B. Known failure: rule_flow_min_alarm — Nil list pattern not yet registered in WASM v3 ctor_tags; deferred to wasm-host-imports.

Honest deferrals. Native Thumb2 AOT (aot_thumbv7m.rs) is scaffolded but blocked on the Cranelift ARM32 backend. GPIO host imports for WASM (gpio_* builtins inside .wasm) are deferred to the wasm-host-imports change. Real wasm3-on-MCU CI and 5 Wave-3 verticals (automotive/medical/agriculture/logistics/smart-grid) are future work.

LLM → IoT Pipeline

The complete pipeline from natural language prompt to WASM artifact on hardware:

Natural language prompt
       │
       ▼ LLM (constrained by synoema-iot-rules.gbnf)
Synoema IoT rule (.sno)
       │
       ├── sno check  →  parse + typecheck  →  PASS / FAIL + structured errors
       │
       └── sno wasm   →  WASM v3 codegen   →  .wasm artifact
                           │
                           ├── Tier 0: C host + wasm3 embed  (MCU)
                           ├── Tier 1: on-device wasm3       (RTOS/ESP32)
                           └── sno build --native            (Linux edge ELF)
                               --target aarch64-linux

GBNF-constrained generation

The LLM generates IoT rules constrained by lang/tools/constrained/synoema-iot-rules.gbnf — a GBNF grammar that limits output to valid Synoema rule syntax. This eliminates hallucinated syntax and reduces parse failures on first generation.
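
To illustrate the shape of such a constraint, here is a hypothetical fragment in GBNF notation (this is not the shipped grammar, only a sketch of how a rule's signature, contracts, and definition can be forced into valid form):

# Hypothetical sketch, not the contents of synoema-iot-rules.gbnf
root     ::= typesig "\n" contract* defn "\n"
typesig  ::= ident ws ":" ws type (ws "->" ws type)*
contract ::= ws ("requires" | "ensures") ws boolexpr "\n"
defn     ::= ident (ws ident)* ws "=" ws expr
ident    ::= [a-z] [a-zA-Z0-9_]*

With the sampler restricted to productions of the grammar, the model physically cannot emit a token sequence that fails to parse, which is why first-generation parse failures drop.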

# Rule example generated within the GBNF constraint
rule_fan_control : Int -> Bool
  requires temp > -50
  ensures result == (temp > 30)
rule_fan_control temp = temp > 30

LLM backends

Backend                                         Flag                         Use case
Anthropic API (claude-opus-4-7)                 default                      Development, CI, highest quality
Local fine-tuned (Qwen2.5-Coder-3B via Ollama)  --model ollama:iot-rules-3b  Offline, air-gapped, low-latency
Mock (CI)                                       --mock                       Hermetic tests, no API key required

Reference pipeline: cloud_compile.py

# Cloud path (requires ANTHROPIC_API_KEY)
python3 lang/tools/llm/cloud_compile.py \
  --prompt "turn on fan when temperature exceeds 30°C" \
  --target rpi

# Mock mode — fully hermetic, no API key
python3 lang/tools/llm/cloud_compile.py \
  --mock \
  --prompt "turn on fan when temp > 30" \
  --target rpi

# Industrial rules with contracts
python3 lang/tools/llm/cloud_compile.py \
  --prompt "shut off pump when pressure > 800 PSI" \
  --vertical industrial \
  --target stm32

Wave-2 Training Corpus

1,177 unique (prompt, rule) pairs across 5 verticals × 7 rule kinds × 3 difficulty tiers:

Rule kinds:   threshold / hysteresis / counter / timer /
              interlock / safety / pattern-match
Verticals:    home / industrial / wearable / automotive / agriculture
Difficulty:   simple / medium / hard
Splits:       946 train / 104 val / 127 test  (hash-mod-10 deterministic)
Token stats:  mean 89.5 / median 87 / p95 139 (cl100k_base)
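
Hash-mod-10 splitting means each example's split is a pure function of its identity, so regenerating the corpus never leaks val/test examples into train. A sketch of the scheme (the exact hash function and bucket boundaries here are illustrative):

```python
import hashlib

def split_of(example_id: str) -> str:
    # Deterministic bucketing: the same id always lands in the same split.
    bucket = int(hashlib.sha256(example_id.encode()).hexdigest(), 16) % 10
    if bucket <= 7:
        return "train"   # buckets 0-7, ~80%
    return "val" if bucket == 8 else "test"   # ~10% each
```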

Training script: research/finetune/train_iot_rules_small.py (Qwen2.5-Coder-3B, QLoRA 4-bit, AMD RX 7900 GRE + unsloth + ROCm).


How MCP + RAG + IoT Fit Together

The three layers are independent but designed to compose:

Claude Desktop / Cursor / custom agent
           │
           │ MCP 2024-11-05 (stdio JSON-RPC)
           │
    ┌──────▼──────────────────────────────────────────────┐
    │                     synoema-mcp                     │
    │                                                     │
    │  eval / typecheck / run / feedback_loop             │
    │       │               │                             │
    │       │  auto_inject  │  search_* tools             │
    │       │  middleware   │  (when agent calls them)    │
    │       │       │       │                             │
    │       │       └───┬───┘                             │
    │       │     ┌─────▼───────────────────┐             │
    │       │     │  RAG index              │             │
    │       │     │  ~/.sno/models/embed/   │             │
    │       │     │  jina-code-v2           │             │
    │       │     │  5 scopes, ~13k chunks  │             │
    │       │     └─────────────────────────┘             │
    └───────┼─────────────────────────────────────────────┘
            │
            │  sno check / sno wasm / sno build --native
            │
    ┌───────▼────────────────────────────────────────────┐
    │          IoT Compilation Pipeline                  │
    │                                                    │
    │  .sno rule  →  WASM v3 codegen  →  .wasm artifact  │
    │                                        │           │
    │                             ┌──────────┴──┐        │
    │                        Tier 0        Tier 1/2      │
    │                        wasm3 MCU     Linux ELF     │
    └────────────────────────────────────────────────────┘

Typical LLM agent workflow

  1. Agent opens session — MCP server creates a ULID session, warms the AST cache.
  2. Agent calls get_context — receives phase-appropriate reference docs (≤1800 tok).
  3. Agent generates a rule — calls typecheck with the draft. If there is an error, the auto-inject middleware attaches relevant search_traces hits automatically.
  4. Agent calls run — executes the rule to verify output. If it fails again, the ReAct loop (sno fix --with-rag) activates and searches the corpus for idiomatic fixes.
  5. Agent compiles to WASM — calls sno wasm rule.sno via the run tool or the CLI. Gets a .wasm artifact ready for the target tier.
  6. Agent deploys — via cloud_compile.py for the full pipeline, or manually for custom hardware.

Key measured numbers

MCP

LRU-500 AST cache • 50-turn transcript • 20+ tools • 50–180 ms faster per call vs CLI

RAG

~13k chunks • 5 scopes • ~50 ms / query • jina-code-v2 int8 (~160 MB)

IoT

30/30 sno check • 29/30 WASM • 200 B mean artifact • 6 verticals • 3 tiers

Further reading