K-104 maps every query to one of 104 semantic rooms — Hearts, Spades, Diamonds, Clubs — and routes it to the cheapest model that can answer well. Blended cost: $0.003 per 1K tokens. That's 48× cheaper than GPT-4 Opus.
That's like hiring a surgeon to fill out paperwork. The work gets done, but you pay surgeon rates for every form.
Every query → GPT-4 Opus or Sonnet. Simple questions cost the same as complex ones. Blended cost: $0.15–$0.60 per 1K tokens. You're leaving 95% of your budget on the floor.
Queries get classified by semantic geometry first. Simple greetings hit the template cache (microseconds, free). Complex reasoning escalates to Claude. Blended cost: $0.003. Same quality, fraction of the price.
Inspired by playing cards. Four semantic domains × 13 intensity levels × light/dark polarity = 104 rooms. It's not metaphor — transformer activations cluster along exactly these axes.
Emotion, relationship, connection. "How are you feeling?" "I need support." "Tell me about love."
Mind, analysis, truth-seeking. "Debug this code." "What's the flaw in this argument?" "Explain quantum tunneling."
Material, building, grounding. "Write this function." "Fix this bug." "How do I center a div?"
Action, will, energy. "Let's start." "Run this." "Make it happen." Drive and momentum.
OpenAI-compatible API. Drop-in replacement for your existing AI calls.
from openclaw import KlawRouter router = KlawRouter(api_keys={"ANTHROPIC_API_KEY": "sk-ant-..."}) # Simple query → hits template cache, costs nothing result = router.route("hello") print(result["cost"]) # $0.0000 print(result["tier_name"]) # "template" # Technical query → routes to cheapest capable model result = router.route("explain transformer attention") print(result["cost"]) # $0.0008 print(result["savings"]) # $0.0147 saved vs Opus print(result["k_address"]) # "+7S" # Or use the OpenAI-compatible endpoint # POST http://localhost:8104/v1/chat/completions # (drop-in for any OpenAI client)
Versus paying top-of-tier for every query.
| Feature | K-104 (KLAW) | Single-model approach |
|---|---|---|
| Blended cost per 1K tokens | $0.003 | $0.15 – $0.60 |
| Simple query latency | <1ms (template cache) | 800ms – 3s (API roundtrip) |
| Runs fully offline | ✓ local model tier | ✗ always calls API |
| OpenAI-compatible API | ✓ drop-in | N/A |
| Semantic query awareness | ✓ 104 rooms | ✗ blind routing |
| BYOK (bring your own keys) | ✓ always available | varies |
K-104 is the substrate. These are the things that run on it.
We didn't assume K-104 would work in transformers — we verified it empirically using activation probes on hermes3:8b.
Transformer activations cluster by K-suit with silhouette score 0.312 — statistically significant. The four-suit geometry exists in learned representations, not just as a routing heuristic.
Light/dark polarity (constructive vs destructive query intent) is even more clearly separated in activation space. 86.2% variance explained by the K-coordinate axes.
Hermes centroid calibration: 100% accuracy on held-out test set after K-vector centroid matching. Zero misroutes on 600+ evaluated exchanges.
K-104 extended to K-143 with 39 additional rooms mapped to whale communication primitives (Project CETI). Body-level, survival, and deep relational semantics. Ancient firmware.
KLAW API is in private beta. Enter your email and we'll send you an API key when your spot opens. Free tier available. No credit card.
No spam. Just an API key when you're in. — Kit & K-Systems