
AI Orchestrator: Dynamically Selects Models Based on Input Type

by Davide · Last updated a month ago · Source: n8n.io

Getting Started

This workflow intelligently routes user queries to the most suitable large language model (LLM) based on the type of request received in a chat environment. By combining structured classification with dynamic model selection, it routes each request to a specialized AI model for its content type, optimizing response quality, performance, and cost-efficiency in AI-driven conversations.


Benefits

  • Smart Model Routing: Reduces costs by using lighter models for general tasks and reserving heavier models for complex needs.
  • Scalability: Easily expandable by adding more request types or LLMs.
  • Maintainability: Clear logic separation between classification, model routing, and execution.
  • Personalization: Can be integrated with session IDs for per-user memory, enabling personalized conversations.
  • Speed Optimization: Fast models like GPT-4.1 mini or Gemini Flash are chosen for tasks where speed is a priority.

How It Works

  1. Input Handling:

    • The workflow starts with the "When chat message received" node, which triggers the process when a chat message is received. The input includes the chat message (chatInput) and a session ID (sessionId).
  2. Request Classification:

    • The "Request Type" node uses an OpenAI model (gpt-4.1-mini) to classify the incoming request into one of four categories:
      • general: For general queries.
      • reasoning: For reasoning-based questions.
      • coding: For code-related requests.
      • search: For queries requiring search tools.
    • The classification is structured using the "Structured Output Parser" node, which enforces a consistent output format.
  3. Model Selection:

    • The "Model Selector" node routes the request to one of four AI models based on the classification:
      • Opus 4 (Anthropic Claude Opus 4): Used for coding requests.
      • Gemini Thinking Pro: Used for reasoning requests.
      • GPT 4.1 mini: Used for general requests.
      • Perplexity: Used for search requests that require web lookups.
  4. AI Processing:

    • The selected model processes the request via the "AI Agent" node, which includes intermediate steps for complex tasks.
    • The "Simple Memory" node retains session context using the provided sessionId, enabling multi-turn conversations.
  5. Output:

    • The final response is generated by the chosen model and returned to the user.
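The classify-then-route logic in steps 2–3 can be sketched outside n8n as plain JavaScript. This is a minimal illustration, not the workflow's actual code: the `request_type` field matches the intermediate output mentioned in the setup steps below, while the `MODEL_BY_TYPE` map and its fallback behavior are assumptions that mirror the node names above.

```javascript
// Sketch of the Model Selector: map the classifier's request_type
// to a model. Identifiers mirror the node names described above.
const MODEL_BY_TYPE = {
  general: "GPT 4.1 mini",
  reasoning: "Gemini Thinking Pro",
  coding: "Opus 4",
  search: "Perplexity",
};

function selectModel(classification) {
  // The Structured Output Parser enforces a consistent shape with a
  // request_type field; fall back to the lightweight general model
  // for anything unexpected (an assumption, not workflow behavior).
  return MODEL_BY_TYPE[classification.request_type] ?? MODEL_BY_TYPE.general;
}
```

For example, `selectModel({ request_type: "coding" })` returns `"Opus 4"`, while an unrecognized type falls back to the general model.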

Set Up Steps

  1. Configure Trigger:

    • Ensure the "When chat message received" node is set up with the correct webhook ID to receive chat inputs.
  2. Define Classification Logic:

    • Adjust the prompt in the "Request Type" node to refine classification accuracy.
    • Verify the output schema in the "Structured Output Parser" node matches the expected categories (general, reasoning, coding, search).
  3. Connect AI Models:

    • Link each model node (Opus 4, Gemini Thinking Pro, GPT 4.1 mini, Perplexity) to the "Model Selector" node.
    • Ensure credentials (API keys) for each model are correctly configured in their respective nodes.
  4. Set Up Memory:

    • Configure the "Simple Memory" node to use the sessionId from the input for context retention.
  5. Test Workflow:

    • Send test inputs to verify classification and model routing.
    • Check intermediate outputs (e.g., request_type) to ensure correct model selection.
  6. Activate Workflow:

    • Toggle the workflow to "Active" in n8n after testing.
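For the testing step, a chat message can be posted to the trigger's webhook from outside n8n. A hedged sketch: the payload fields (`sessionId`, `chatInput`) follow the trigger input described earlier, but the webhook URL and exact field names should be taken from your own "When chat message received" node.

```javascript
// Build the JSON payload the chat trigger expects (field names per
// the trigger input described above: sessionId and chatInput).
function buildChatPayload(sessionId, chatInput) {
  return JSON.stringify({ sessionId, chatInput });
}

// POST a test message to the workflow's webhook and return the reply.
// Replace webhookUrl with the URL shown on your trigger node.
async function sendTestMessage(webhookUrl, sessionId, chatInput) {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildChatPayload(sessionId, chatInput),
  });
  if (!res.ok) throw new Error(`Webhook returned ${res.status}`);
  return res.json();
}
```

Sending one message per category (e.g., a coding question, then a current-events question) and checking the intermediate `request_type` output is a quick way to confirm routing works as intended.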

Need help customizing?

Contact me for consulting and support, or add me on LinkedIn.