{ "id": "", "meta": { "instanceId": "", "templateCredsSetupCompleted": true }, "name": "Easily Compare LLMs Using OpenAI and Google Sheets", "tags": [], "nodes": [ { "id": "", "name": "When chat message received", "type": "@n8n/n8n-nodes-langchain.chatTrigger", "position": [ -7400, 3040 ], "webhookId": "", "parameters": { "options": {} }, "typeVersion": 1.1 }, { "id": "", "name": "Loop Over Items", "type": "n8n-nodes-base.splitInBatches", "position": [ -5960, 3040 ], "parameters": { "options": { "reset": false } }, "typeVersion": 3 }, { "id": "", "name": "Simple Memory", "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow", "position": [ -4880, 3000 ], "parameters": { "sessionKey": "={{$('Set model, sessionId, chatInput, sessionIdBase').item.json.sessionId}}", "sessionIdType": "customKey" }, "typeVersion": 1.3 }, { "id": "", "name": "Chat Memory Manager", "type": "@n8n/n8n-nodes-langchain.memoryManager", "position": [ -4980, 3180 ], "parameters": { "options": {} }, "typeVersion": 1.1 }, { "id": "", "name": "Sticky Note", "type": "n8n-nodes-base.stickyNote", "position": [ -8120, 2600 ], "parameters": { "color": 5, "width": 640, "height": 1180, "content": "## Easily Compare LLMs Using OpenAI and Google Sheets\n\nThis workflow allows you to **easily evaluate and compare the outputs of two language models (LLMs)** before choosing one for production.\n\nIn the chat interface, both model outputs are shown side by side. Their responses are also logged into a Google Sheet, where they can be evaluated manually or automatically using a more advanced model.\n\n### Use Case\nYou're developing an AI agent, and since LLMs are non-deterministic, you want to determine which one performs best for your specific use case. This template is designed to help you compare them effectively.\n\n### How It Works\n- The user sends a message to the chat interface.\n- The input is duplicated and sent to two different LLMs.\n- Each model processes the same prompt independently, using its own memory context.\n- Their answers, along with the user input and previous context, are logged to Google Sheets.\n- You can review, compare, and evaluate the model outputs manually (or automate it later).\n- In the chat, both responses are also shown one after the other for direct comparison.\n\n### How To Use It\n- Copy this [Google Sheets template](https://docs.google.com/spreadsheets/d/1grO5jxm05kJ7if9wBIOozjkqW27i8tRedrheLRrpxf4/) (File > Make a Copy).\n- Set up your **System Prompt** and **Tools** in the **AI Agent** node to suit your use case.\n- Start chatting! Each message will trigger both models and log their responses to the spreadsheet.\n\n\n*Note: This version is set up for two models. If you want to compare more, you’ll need to extend the workflow logic and update the sheet.*\n\n### About Models\nYou can use **OpenRouter** or **Vertex AI** to test models across providers. \nIf you're using a node for a specific provider, like OpenAI, you can compare different models from that provider (e.g., `gpt-4.1` vs `gpt-4.1-mini`).\n\n### Evaluation in Google Sheets\nThis is ideal for teams, allowing non-technical stakeholders (not just data scientists) to evaluate responses based on real-world needs.\n\nAdvanced users can automate this evaluation using a more capable model (like `o3` from **OpenAI**), but note that this will increase token usage and cost.\n\n### Token Considerations\nSince **each input is processed by two different models**, the workflow will consume more tokens overall. \nKeep an eye on usage, especially if working with longer prompts or running multiple evaluations, as this can impact cost.\n\n" }, "typeVersion": 1 }, { "id": "", "name": "OpenRouter Chat Model", "type": "@n8n/n8n-nodes-langchain.lmChatOpenRouter", "position": [ -5180, 3000 ], "parameters": { "model": "={{$json.model}}" }, "credentials": { "openRouterApi": { "id": "", "name": "" } }, "typeVersion": 1 }, { "id": "", "name": "Sticky Note1", "type": "n8n-nodes-base.stickyNote", "position": [ -7220, 2620 ], "parameters": { "color": 7, "width": 360, "height": 580, "content": "## Define Models to Compare\n\nThis node defines the array of model IDs to be compared.\n\nIn this template, we compare two models using the OpenRouter API. You can modify the list by specifying the full model IDs you want to test.\n\nExample:\n**[\"openai/gpt-4.1\", \"mistralai/mistral-large\"]**\n\nIf you're using a different LLM provider (like OpenAI directly, or Google Vertex AI), make sure to update the model IDs according to that provider's naming conventions.\n\n*Note: This template is built for two models. For more, you’ll need to adjust the workflow logic and the Google Sheet structure.*\n" }, "typeVersion": 1 }, { "id": "", "name": "Sticky Note2", "type": "n8n-nodes-base.stickyNote", "position": [ -6500, 2620 ], "parameters": { "color": 7, "width": 360, "height": 580, "content": "## Set model, sessionId, chatInput, sessionIdBase\n\nThis node prepares the variables used during the loop that queries each model.\n\n- **model**: The ID of the model being used in the current iteration.\n- **sessionId**: A unique session key combining the original session ID and model name. This ensures memory isolation per model.\n- **chatInput**: The user’s input message.\n- **sessionIdBase**: The original session ID without any model-specific suffix. Used in Sheets to group evaluations from the same session." }, "typeVersion": 1 }, { "id": "", "name": "Set model, sessionId, chatInput, sessionIdBase", "type": "n8n-nodes-base.set", "position": [ -6380, 3040 ], "parameters": { "options": {}, "assignments": { "assignments": [ { "id": "", "name": "model", "type": "string", "value": "={{ $json.models }}" }, { "id": "", "name": "sessionId", "type": "string", "value": "={{ $('When chat message received').item.json.sessionId }}{{$json.models }}" }, { "id": "", "name": "chatInput", "type": "string", "value": "={{ $('When chat message received').item.json.chatInput }}" }, { "id": "", "name": "sessionIdBase", "type": "string", "value": "={{ $('When chat message received').item.json.sessionId }}" } ] } }, "typeVersion": 3.4 }, { "id": "", "name": "AI Agent", "type": "@n8n/n8n-nodes-langchain.agent", "position": [ -5480, 3180 ], "parameters": { "options": { "returnIntermediateSteps": false } }, "typeVersion": 1.8 }, { "id": "", "name": "Sticky Note3", "type": "n8n-nodes-base.stickyNote", "position": [ -5600, 3160 ], "parameters": { "color": 7, "width": 540, "height": 520, "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n## AI Agent\n\nThis AI Agent is connected to OpenRouter Models. The model is selected dynamically from the variable `{{$json.model}}`, defined earlier.\n\nMemory is isolated per model using the `{{$('Set model, sessionId, chatInput, sessionIdBase').item.json.sessionId}}` key.\n\n**⚠️ This agent currently has no system prompt or tools configured**. If you want to test specific tasks, you must define them yourself to reflect realistic use cases." }, "typeVersion": 1 }, { "id": "", "name": "Sticky Note4", "type": "n8n-nodes-base.stickyNote", "position": [ -5040, 3160 ], "parameters": { "color": 7, "width": 380, "height": 520, "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n## Chat Memory Manager\n\nThis node handles retrieval of prior context for the chat session. It helps with qualitative evaluation by storing context that’s injected into the Google Sheet.\n\nIt shares memory with the AI Agent via the “Simple Memory” node.\n\n> You can switch to Redis or Postgres memory backends if needed." }, "typeVersion": 1 }, { "id": "", "name": "Sticky Note5", "type": "n8n-nodes-base.stickyNote", "position": [ -4640, 3160 ], "parameters": { "color": 7, "width": 380, "height": 760, "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n## Prepare Data for Chat and Google Sheets\n\nThis node sets the following fields:\n\n- **output**: The model's response, formatted for chat display with visual separation to make comparison easier.\n- **chatInput**: The user input that will be recorded in Google Sheets.\n- **model_answer**: The actual answer from the model being evaluated.\n- **model**: The name or ID of the model providing the answer, used for identifying performance.\n- **context**: A history of the prior conversation (excluding the latest input). If it's the user's first message, a placeholder is used.\n- **sessionId**: A unique session identifier combining model name and session, ensuring separate context windows for each model.\n- **sessionIdBase**: The original user session ID (without model suffix), useful for grouping responses from different models in Sheets." }, "typeVersion": 1 }, { "id": "", "name": "Concatenate Chat Answers", "type": "n8n-nodes-base.summarize", "position": [ -5300, 2620 ], "parameters": { "options": {}, "fieldsToSummarize": { "values": [ { "field": "output", "separateBy": "\n", "aggregation": "concatenate" } ] } }, "typeVersion": 1.1 }, { "id": "", "name": "Sticky Note6", "type": "n8n-nodes-base.stickyNote", "position": [ -5080, 2120 ], "parameters": { "color": 5, "width": 460, "height": 500, "content": "## Add Model Results to Google Sheet\n\nThis Google Sheets step records both model responses along for evaluation.\n\n⚠️ Depending on the length of model responses, you may need to adjust row height or column width.\n\nThe template includes basic evaluation fields (`model_1_eval`, `model_2_eval`) with a dropdown like: \n**\"Good\", \"Correct\", \"Bad\"**, but feel free to customize with more granular rating criteria." }, "typeVersion": 1 }, { "id": "", "name": "Group Model Outputs for Evaluation", "type": "n8n-nodes-base.aggregate", "position": [ -5300, 2440 ], "parameters": { "options": {}, "fieldsToAggregate": { "fieldToAggregate": [ { "fieldToAggregate": "model_answer" }, { "fieldToAggregate": "context" }, { "fieldToAggregate": "chatInput" }, { "fieldToAggregate": "sessionIdBase" }, { "fieldToAggregate": "model" } ] } }, "typeVersion": 1 }, { "id": "", "name": "Add Model Results to Google Sheet", "type": "n8n-nodes-base.googleSheets", "onError": "continueRegularOutput", "position": [ -4940, 2440 ], "parameters": { "columns": { "value": { "sessionId": "={{ $json.sessionIdBase[0] }}", "model_1_id": "={{ $json.model[0] }}", "model_2_id": "={{ $json.model[1] }}", "user_input": "={{ $json.chatInput[0] }}", "model_1_answer": "={{ $json.model_answer[0] }}", "model_2_answer": "={{ $json.model_answer[1] }}", "context_model_1": "={{ $json.context[0] }}", "context_model_2": "={{ $json.context[1] }}" }, "schema": [ { "id": "sessionId", "type": "string", "display": true, "required": false, "displayName": "sessionId", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "model_1_id", "type": "string", "display": true, "required": false, "displayName": "model_1_id", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "model_2_id", "type": "string", "display": true, "required": false, "displayName": "model_2_id", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "user_input", "type": "string", "display": true, "required": false, "displayName": "user_input", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "model_1_answer", "type": "string", "display": true, "required": false, "displayName": "model_1_answer", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "model_2_answer", "type": "string", "display": true, "required": false, "displayName": "model_2_answer", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "model_1_eval", "type": "string", "display": true, "required": false, "displayName": "model_1_eval", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "model_2_eval", "type": "string", "display": true, "required": false, "displayName": "model_2_eval", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "context_model_1", "type": "string", "display": true, "required": false, "displayName": "context_model_1", "defaultMatch": false, "canBeUsedToMatch": true }, { "id": "context_model_2", "type": "string", "display": true, "required": false, "displayName": "context_model_2", "defaultMatch": false, "canBeUsedToMatch": true } ], "mappingMode": "defineBelow", "matchingColumns": [], "attemptToConvertTypes": false, "convertFieldsToString": false }, "options": {}, "operation": "append", "sheetName": { "__rl": true, "mode": "list", "value": "gid=0", "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1grO5jxm05kJ7if9wBIOozjkqW27i8tRedrheLRrpxf4/", "cachedResultName": "llms_eval" }, "documentId": { "__rl": true, "mode": "list", "value": "1grO5jxm05kJ7if9wBIOozjkqW27i8tRedrheLRrpxf4", "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1grO5jxm05kJ7if9wBIOozjkqW27i8tRedrheLRrpxf4/", "cachedResultName": "Template - Easy LLMs Eval" }, "authentication": "serviceAccount" }, "credentials": { "googleApi": { "id": "", "name": "" } }, "typeVersion": 4.5 }, { "id": "", "name": "Prepare Data for Chat and Google Sheets", "type": "n8n-nodes-base.set", "position": [ -4500, 3180 ], "parameters": { "options": {}, "assignments": { "assignments": [ { "id": "", "name": "output", "type": "string", "value": "=### `{{ $('Set model, sessionId, chatInput, sessionIdBase').item.json.model }}` answered :\n\n\n{{ $('AI Agent').item.json.output }}\n\n----------\n" }, { "id": "", "name": "chatInput", "type": "string", "value": "={{ $('Set model, sessionId, chatInput, sessionIdBase').item.json.chatInput }}" }, { "id": "", "name": "model_answer", "type": "string", "value": "={{ $('AI Agent').item.json.output }}" }, { "id": "", "name": "model", "type": "string", "value": "={{ $('Set model, sessionId, chatInput, sessionIdBase').item.json.model }}" }, { "id": "", "name": "context", "type": "string", "value": "={{\n (() => {\n const history = $json[\"messages\"]; // ou adapter selon ton chemin réel\n if (!Array.isArray(history) || history.length <= 1) {\n return \"No prior context available — likely the user's first message or memory not yet initialized.\";\n }\n\n const truncated = history.slice(0, -1); // on enlève le dernier échange\n return truncated.map(pair => `Human: ${pair.human}\\nAI: ${pair.ai}`).join('\\n');\n })()\n}}\n" }, { "id": "", "name": "sessionId", "type": "string", "value": "={{ $('Loop Over Items').item.json.sessionId }}" }, { "id": "", "name": "sessionIdBase", "type": "string", "value": "={{ $('Loop Over Items').item.json.sessionIdBase }}" } ] } }, "typeVersion": 3.4 }, { "id": "", "name": "Define Models to Compare", "type": "n8n-nodes-base.set", "position": [ -7100, 3040 ], "parameters": { "options": {}, "assignments": { "assignments": [ { "id": "", "name": "=models", "type": "array", "value": "=[\"openai/gpt-4.1\", \"mistralai/mistral-large\"]" } ] } }, "typeVersion": 3.4 }, { "id": "", "name": "Split Models into Items", "type": "n8n-nodes-base.splitOut", "position": [ -6760, 3040 ], "parameters": { "options": {}, "fieldToSplitOut": "models" }, "typeVersion": 1 }, { "id": "", "name": "Set Output for Chat UI", "type": "n8n-nodes-base.set", "position": [ -4940, 2620 ], "parameters": { "options": {}, "assignments": { "assignments": [ { "id": "", "name": "output", "type": "string", "value": "={{ $json.concatenated_output }}" } ] } }, "typeVersion": 3.4 } ], "active": false, "pinData": {}, "settings": { "executionOrder": "v1" }, "versionId": "", "connections": { "AI Agent": { "main": [ [ { "node": "Chat Memory Manager", "type": "main", "index": 0 } ] ] }, "Simple Memory": { "ai_memory": [ [ { "node": "Chat Memory Manager", "type": "ai_memory", "index": 0 }, { "node": "AI Agent", "type": "ai_memory", "index": 0 } ] ] }, "Loop Over Items": { "main": [ [ { "node": "Concatenate Chat Answers", "type": "main", "index": 0 }, { "node": "Group Model Outputs for Evaluation", "type": "main", "index": 0 } ], [ { "node": "AI Agent", "type": "main", "index": 0 } ] ] }, "Chat Memory Manager": { "main": [ [ { "node": "Prepare Data for Chat and Google Sheets", "type": "main", "index": 0 } ] ] }, "OpenRouter Chat Model": { "ai_languageModel": [ [ { "node": "AI Agent", "type": "ai_languageModel", "index": 0 } ] ] }, "Split Models into Items": { "main": [ [ { "node": "Set model, sessionId, chatInput, sessionIdBase", "type": "main", "index": 0 } ] ] }, "Concatenate Chat Answers": { "main": [ [ { "node": "Set Output for Chat UI", "type": "main", "index": 0 } ] ] }, "Define Models to Compare": { "main": [ [ { "node": "Split Models into Items", "type": "main", "index": 0 } ] ] }, "When chat message received": { "main": [ [ { "node": "Define Models to Compare", "type": "main", "index": 0 } ] ] }, "Group Model Outputs for Evaluation": { "main": [ [ { "node": "Add Model Results to Google Sheet", "type": "main", "index": 0 } ] ] }, "Prepare Data for Chat and Google Sheets": { "main": [ [ { "node": "Loop Over Items", "type": "main", "index": 0 } ] ] }, "Set model, sessionId, chatInput, sessionIdBase": { "main": [ [ { "node": "Loop Over Items", "type": "main", "index": 0 } ] ] } } }