nmnhut.dev · interactive demo

Token Router / self-learning

Type a request and watch it get routed through three tiers: memory cache, semantic embeddings (ONNX), and small-LLM fallback. The router learns from every query.
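The three-tier cascade can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code; the class and method names (`TokenRouter`, `route`, `learn`) are hypothetical, and tiers 2 and 3 are stubbed out:

```python
# Sketch of a three-tier router: memory cache -> embeddings -> LLM fallback.
# All names are illustrative; the real demo runs an ONNX model and an LLM call.

class TokenRouter:
    def __init__(self, embed_threshold=0.8):
        self.memory = {}                     # exact query -> route cache (tier 1)
        self.examples = []                   # (embedding, route) pairs (tier 2)
        self.embed_threshold = embed_threshold

    def route(self, query):
        # Tier 1: memory cache, an exact-match lookup with near-zero latency.
        if query in self.memory:
            return self.memory[query], "memory"
        # Tier 2: semantic similarity against previously learned examples.
        best_route, best_score = self._nearest_example(query)
        if best_route is not None and best_score >= self.embed_threshold:
            return best_route, "embeddings"
        # Tier 3: small-LLM classifier fallback (stubbed here).
        return self._llm_classify(query), "llm"

    def learn(self, query, route):
        # Every resolved query is cached so it becomes a tier-1 hit next time.
        self.memory[query] = route

    def _nearest_example(self, query):
        return None, 0.0          # placeholder; real version compares embeddings

    def _llm_classify(self, query):
        return "default"          # placeholder for an OpenAI-compatible call


router = TokenRouter()
route, tier = router.route("summarize this document")   # first time: tier 3
router.learn("summarize this document", route)
print(router.route("summarize this document"))          # -> ('default', 'memory')
```

The key design point is that the cheap tiers run first, and the `learn` step converts expensive tier-3 answers into free tier-1 hits.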

Keywords: exact substring match
Memory: cached request → route lookup
Embeddings: ONNX cosine similarity
LLM Fallback: OpenAI-compatible classifier
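The embeddings tier scores a new query by cosine similarity against previously learned queries. A minimal version of that comparison, using toy 3-dimensional vectors in place of the demo's real ONNX embeddings:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; the demo gets real vectors from an ONNX embedding model.
learned = {
    "translate to French": [0.9, 0.1, 0.0],
    "write a poem":        [0.1, 0.9, 0.2],
}
query_vec = [0.85, 0.15, 0.05]

best = max(learned, key=lambda k: cosine_similarity(query_vec, learned[k]))
print(best)  # -> translate to French
```

If the best score clears the router's confidence threshold, the matched route is reused; otherwise the query falls through to the LLM tier.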

No confident match — pick the right route

Embedding scores were below threshold and the LLM classifier is unavailable. Your selection teaches the router for next time.
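When the user picks a route manually, the router can store both the exact string (for the memory tier) and its embedding (so similar future queries match in the embeddings tier). A hedged sketch with hypothetical names; `teach` is not the demo's actual API:

```python
def teach(router_memory, learned_examples, query, query_embedding, chosen_route):
    # Exact string goes into the cache: the same query is now a tier-1 hit.
    router_memory[query] = chosen_route
    # The embedding is kept so similar (not identical) queries match in tier 2.
    learned_examples.append((query_embedding, chosen_route))

memory, examples = {}, []
teach(memory, examples, "refund my order", [0.2, 0.7, 0.1], "support")
print(memory["refund my order"])  # -> support
```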


Any OpenAI-compatible endpoint works (OpenAI, Zing Play, LM Studio, etc.). Credentials from LLM Farm are auto-loaded if available.

API Endpoint
API Key
Model
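The LLM fallback is a plain chat-completion request to whatever OpenAI-compatible endpoint is configured above. A sketch of that call using the standard `/v1/chat/completions` request shape; the endpoint, key, model, and route labels are placeholders, not the demo's defaults:

```python
import json
import urllib.request

def build_classifier_body(query, routes, model):
    # OpenAI-compatible chat body asking the model to reply with a label only.
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Classify the request into one of: "
                        + ", ".join(routes) + ". Reply with the label only."},
            {"role": "user", "content": query},
        ],
        "temperature": 0,
    }

def classify_with_llm(query, routes, endpoint, api_key, model):
    # Works against any OpenAI-compatible server (OpenAI, LM Studio, etc.).
    body = build_classifier_body(query, routes, model)
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"].strip()
```

Because only the base URL differs between providers, the same code path serves a hosted API and a local LM Studio instance.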