nmnhut.dev · interactive demo

Token Router / self-learning

Type a request and watch it get routed through three tiers: memory cache, semantic embeddings (ONNX), and small-LLM fallback. The router learns from every query.
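The three-tier cascade can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code; the class and method names (`TokenRouter`, `route`, `learn`) are hypothetical, and tiers 2 and 3 are stubbed out:

```python
# Sketch of a three-tier router: memory cache -> embeddings -> LLM fallback.
# All names are illustrative; the real demo runs an ONNX model and an LLM call.

class TokenRouter:
    def __init__(self, embed_threshold=0.8):
        self.memory = {}                     # exact query -> route cache (tier 1)
        self.examples = []                   # (embedding, route) pairs (tier 2)
        self.embed_threshold = embed_threshold

    def route(self, query):
        # Tier 1: memory cache, an exact-match lookup with near-zero latency.
        if query in self.memory:
            return self.memory[query], "memory"
        # Tier 2: semantic similarity against previously learned examples.
        best_route, best_score = self._nearest_example(query)
        if best_route is not None and best_score >= self.embed_threshold:
            return best_route, "embeddings"
        # Tier 3: small-LLM classifier fallback (stubbed here).
        return self._llm_classify(query), "llm"

    def learn(self, query, route):
        # Every resolved query is cached so it becomes a tier-1 hit next time.
        self.memory[query] = route

    def _nearest_example(self, query):
        return None, 0.0          # placeholder; real version compares embeddings

    def _llm_classify(self, query):
        return "default"          # placeholder for an OpenAI-compatible call


router = TokenRouter()
route, tier = router.route("summarize this document")   # first time: tier 3
router.learn("summarize this document", route)
print(router.route("summarize this document"))          # -> ('default', 'memory')
```

The key design point is that the cheap tiers run first, and the `learn` step converts expensive tier-3 answers into free tier-1 hits.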

Keywords: exact substring match
Memory: cached request → route lookup
Embeddings: ONNX cosine similarity
LLM Fallback: OpenAI-compatible classifier
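The embeddings tier scores a new query by cosine similarity against previously learned queries. A minimal version of that comparison, using toy 3-dimensional vectors in place of the demo's real ONNX embeddings:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; the demo gets real vectors from an ONNX embedding model.
learned = {
    "translate to French": [0.9, 0.1, 0.0],
    "write a poem":        [0.1, 0.9, 0.2],
}
query_vec = [0.85, 0.15, 0.05]

best = max(learned, key=lambda k: cosine_similarity(query_vec, learned[k]))
print(best)  # -> translate to French
```

If the best score clears the router's confidence threshold, the matched route is reused; otherwise the query falls through to the LLM tier.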

No confident match — pick the right route

Embedding scores were below threshold and the LLM classifier is unavailable. Your selection teaches the router for next time.
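When the user picks a route manually, the router can store both the exact string (for the memory tier) and its embedding (so similar future queries match in the embeddings tier). A hedged sketch with hypothetical names; `teach` is not the demo's actual API:

```python
def teach(router_memory, learned_examples, query, query_embedding, chosen_route):
    # Exact string goes into the cache: the same query is now a tier-1 hit.
    router_memory[query] = chosen_route
    # The embedding is kept so similar (not identical) queries match in tier 2.
    learned_examples.append((query_embedding, chosen_route))

memory, examples = {}, []
teach(memory, examples, "refund my order", [0.2, 0.7, 0.1], "support")
print(memory["refund my order"])  # -> support
```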


Any OpenAI-compatible endpoint works (OpenAI, Zing Play, LM Studio, etc.). Credentials from LLM Farm are auto-loaded if available.

API Endpoint
API Key
Model
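The LLM fallback is a plain chat-completion request to whatever OpenAI-compatible endpoint is configured above. A sketch of that call using the standard `/v1/chat/completions` request shape; the endpoint, key, model, and route labels are placeholders, not the demo's defaults:

```python
import json
import urllib.request

def build_classifier_body(query, routes, model):
    # OpenAI-compatible chat body asking the model to reply with a label only.
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Classify the request into one of: "
                        + ", ".join(routes) + ". Reply with the label only."},
            {"role": "user", "content": query},
        ],
        "temperature": 0,
    }

def classify_with_llm(query, routes, endpoint, api_key, model):
    # Works against any OpenAI-compatible server (OpenAI, LM Studio, etc.).
    body = build_classifier_body(query, routes, model)
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"].strip()
```

Because only the base URL differs between providers, the same code path serves a hosted API and a local LM Studio instance.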