Build a Token Router with Embeddings and Prompt Templates
Skip the training pipeline and the GPU — embeddings, cosine similarity, and structured prompts are enough to cut your LLM bill by 80%.

The idea

Every query has a shape — topic, complexity, expected output format. You can detect that shape in <5ms using embeddings, then:

- Pick a prompt template — a pre-built system prompt with format constraints, cached by the provider
- Pick a model — cheap for easy queries, strong for hard ones
- Cap output tokens — templates define expected length

All of this works with pure geometry in embedding space — no model training, no preference data required. ...
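The routing step above can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the route names, models, and 3-dimensional "centroid" vectors are toy placeholders, where a real router would use actual embedding vectors (averaged over example queries per route) from an embedding API.

```python
import math

# Each route pairs a prompt template/model with an output-token cap.
# The centroids are toy 3-d stand-ins for real embedding vectors
# (hypothetical route names and model names, for illustration only).
ROUTES = {
    "simple_qa": {"model": "cheap-model", "max_tokens": 256,
                  "centroid": [0.9, 0.1, 0.0]},
    "code_gen":  {"model": "strong-model", "max_tokens": 1024,
                  "centroid": [0.1, 0.9, 0.1]},
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def route(query_embedding):
    """Pick the route whose centroid is most similar to the query,
    returning its name, model, and output-token cap."""
    name, cfg = max(ROUTES.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]["centroid"]))
    return name, cfg["model"], cfg["max_tokens"]
```

A query whose embedding lands near the "simple_qa" centroid gets the cheap model and a tight token cap; everything downstream (template choice, model call) keys off the returned route name. The comparison itself is just dot products, which is why it runs in microseconds once the query embedding is in hand.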