Type a request and watch it get routed through three tiers: memory cache, semantic embeddings (ONNX), and small-LLM fallback. The router learns from every query.
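The three-tier flow above can be sketched as follows. This is a minimal illustration, not the actual implementation: all names are hypothetical, and the real router uses ONNX embeddings while this sketch substitutes a simple bag-of-words cosine similarity so it stays self-contained.

```python
from collections import Counter
import math

class TieredRouter:
    """Sketch of a three-tier router: exact-match cache, embedding
    similarity, then LLM fallback. The user's selection on a miss
    teaches the router (tiers 1 and 2) for next time."""

    def __init__(self, threshold=0.8, llm=None):
        self.cache = {}       # tier 1: exact query -> route label
        self.examples = []    # tier 2: (query, label) pairs learned so far
        self.threshold = threshold
        self.llm = llm        # tier 3: optional classifier callable

    def _embed(self, text):
        # Stand-in for a real ONNX embedding model.
        return Counter(text.lower().split())

    def _cosine(self, a, b):
        dot = sum(a[t] * b.get(t, 0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def route(self, query):
        # Tier 1: memory cache (exact match).
        if query in self.cache:
            return self.cache[query], "cache"
        # Tier 2: nearest learned example by embedding similarity.
        qv = self._embed(query)
        best, score = None, 0.0
        for text, label in self.examples:
            s = self._cosine(qv, self._embed(text))
            if s > score:
                best, score = label, s
        if best is not None and score >= self.threshold:
            return best, "embedding"
        # Tier 3: small-LLM fallback, if one is configured.
        if self.llm is not None:
            return self.llm(query), "llm"
        # All tiers missed: ask the user and learn from the answer.
        return None, "unrouted"

    def learn(self, query, label):
        """Record the user's selection so future queries route directly."""
        self.cache[query] = label
        self.examples.append((query, label))
```

A routed miss falls through the tiers in order; once `learn()` records the user's choice, an identical query hits tier 1 and a near-duplicate hits tier 2.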
Embedding scores fell below the threshold and the LLM classifier is unavailable. Your selection teaches the router for next time.
Any OpenAI-compatible endpoint (OpenAI, Zing Play, LM Studio, etc.). Credentials from LLM Farm are auto-loaded if available.