
Auto-Update Knowledge Base with Google Drive, LlamaIndex & Azure OpenAI Embeddings

by Khairul Muhtadin · Updated 4 days ago · Source: n8n.io

Getting Started

This workflow auto-ingests Google Drive documents, parses them with LlamaIndex, and stores Azure OpenAI embeddings in an in-memory vector store, cutting manual update time from roughly 30 minutes to under 2 minutes per document.

Why Use This Workflow?

Cost Reduction: Eliminates the monthly cloud fee paid just to store knowledge.

Ideal For

  • Knowledge Managers / Documentation Teams: Automatically keep product docs and SOPs in sync when source files change on Google Drive.
  • Support Teams: Ensure the searchable KB is always up-to-date after doc edits, speeding agent onboarding and resolution time.
  • Developer / AI Teams: Populate an in-memory vector store for experiments, rapid prototyping, or local RAG demos.

How It Works

  1. Trigger: Google Drive Trigger watches a specific document or folder for updates.
  2. Data Collection: The updated file is downloaded from Google Drive.
  3. Processing: The file is uploaded to LlamaIndex cloud via an HTTP Request to create a parsing job.
  4. Intelligence Layer: The workflow polls the LlamaIndex job status (Wait + Monitor loop). If the parsing status equals SUCCESS, the result is retrieved as markdown.
  5. Output & Delivery: Parsed markdown is loaded into LangChain's Default Data Loader, passed to Azure OpenAI embeddings (deployment "3small"), then inserted into an in-memory vector store.
  6. Storage & Logging: Vector store holds embeddings in memory (good for prototyping). Optionally persist to an external vector DB for production.
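
To make the upload, poll, and retrieve sequence concrete, here is a minimal standalone Python sketch of the same loop outside n8n. It assumes the public LlamaIndex Cloud base URL and the response field names (`id`, `status`, `markdown`); the endpoint paths match the workflow's HTTP Request nodes, and `YOUR_API_KEY` is a placeholder.

```python
import time
import requests

# Assumed LlamaIndex Cloud base URL; endpoint paths follow the workflow's HTTP Request nodes.
BASE_URL = "https://api.cloud.llamaindex.ai/api"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder API key

def parse_document(path: str, poll_seconds: int = 60) -> str:
    """Upload a file, poll the parsing job, and return the parsed markdown."""
    # 1. Upload the downloaded Drive file as multipart/form-data (Parse Document via LlamaIndex node)
    with open(path, "rb") as f:
        resp = requests.post(f"{BASE_URL}/parsing/upload", headers=HEADERS, files={"file": f})
    resp.raise_for_status()
    job_id = resp.json()["id"]  # response field name assumed

    # 2. Poll the job status (Monitor Document Processing + Wait loop);
    #    a production version should also cap the number of retries.
    while True:
        status = requests.get(f"{BASE_URL}/parsing/job/{job_id}", headers=HEADERS).json()["status"]
        if status == "SUCCESS":          # Check Parsing Completion (If node)
            break
        if status == "ERROR":
            raise RuntimeError(f"Parsing job {job_id} failed")
        time.sleep(poll_seconds)         # Wait node equivalent

    # 3. Retrieve the parsed markdown (Retrieve Parsed Content node)
    result = requests.get(f"{BASE_URL}/parsing/job/{job_id}/result/markdown", headers=HEADERS)
    result.raise_for_status()
    return result.json()["markdown"]     # response field name assumed
```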

Setup Guide

Prerequisites

| Requirement | Type | Purpose |
| --- | --- | --- |
| n8n instance | Essential | Execute and import the workflow |
| Google Drive OAuth2 | Essential | Watch and download documents from Google Drive |
| LlamaIndex Cloud API | Essential | Parse and convert documents to structured markdown |
| Azure OpenAI account | Essential | Generate embeddings (deployment configured with model name "3small") |
| Persistent vector DB (e.g., Pinecone) | Optional | Persist embeddings for production-scale search |

Installation Steps

  1. Import the workflow: open your n8n instance and import the workflow JSON file.
  2. Configure credentials:
    • Azure OpenAI: Provide the endpoint and API key, and set the deployment name.
    • LlamaIndex API: Create an HTTP Header Auth credential in n8n. Header Name: Authorization. Header Value: Bearer YOUR_API_KEY.
    • Google Drive OAuth2: Create OAuth 2.0 credentials in Google Cloud Console, enable Drive API, and configure the Google Drive OAuth2 credential in n8n.
  3. Update environment-specific values:
    • Replace the workflow's Google Drive fileId with the file ID or folder ID you want to watch (do not commit public IDs).
  4. Customize settings:
    • Polling interval (Wait node): adjust for faster or slower job status checks.
    • Target file or folder: set on the Google Drive Trigger node.
    • Embedding model: change Azure OpenAI deployment if needed.
  5. Test execution:
    • Save changes and trigger a sample file update on Drive. Verify each node runs and the vector store receives embeddings.
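
Before the first live test, it can also help to verify the Azure OpenAI credential outside n8n. The snippet below is a hedged sketch: the resource endpoint, key, and `api-version` are placeholder assumptions, while `3small` is the deployment name the workflow expects.

```python
import requests

# Placeholder values: replace with your Azure OpenAI resource URL and key.
ENDPOINT = "https://YOUR-RESOURCE.openai.azure.com"
API_KEY = "YOUR_AZURE_OPENAI_KEY"
DEPLOYMENT = "3small"          # deployment name referenced by the workflow
API_VERSION = "2023-05-15"     # assumed; use the version enabled on your resource

url = f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}/embeddings?api-version={API_VERSION}"
resp = requests.post(url, headers={"api-key": API_KEY}, json={"input": "hello knowledge base"})
resp.raise_for_status()

vector = resp.json()["data"][0]["embedding"]
print(f"Embedding length: {len(vector)}")  # sanity check that the deployment returns vectors
```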

Technical Details

Core Nodes

| Node | Purpose | Key Configuration |
| --- | --- | --- |
| Knowledge Base Updated Trigger (Google Drive Trigger) | Triggers on file/folder changes | Set trigger type to a specific file or folder; configure the OAuth2 credential |
| Download Knowledge Document (Google Drive) | Downloads the file binary | Operation: download; ensure the OAuth2 credential is selected |
| Parse Document via LlamaIndex (HTTP Request) | Uploads the file to the LlamaIndex parsing endpoint | POST multipart/form-data to /parsing/upload; use the HTTP Header Auth credential |
| Monitor Document Processing (HTTP Request) | Polls the parsing job status | GET /parsing/job/{{jobId}}; check the status field |
| Check Parsing Completion (If) | Branches on job status | Condition: {{$json.status}} equals SUCCESS |
| Retrieve Parsed Content (HTTP Request) | Fetches the parsed markdown result | GET /parsing/job/{{jobId}}/result/markdown |
| Default Data Loader (LangChain) | Loads parsed markdown into document format | Use as the document source for embeddings |
| Embeddings Azure OpenAI | Generates embeddings for documents | Credentials: Azure OpenAI; Model/Deployment: 3small |
| Insert Data to Store (vectorStoreInMemory) | Stores documents + embeddings | Use the memory store for prototyping; switch to a DB for persistence |

Workflow Logic

  • On Drive change, the file binary is downloaded and sent to LlamaIndex.
  • The workflow enters a monitor loop: Monitor Document Processing fetches the job status and the If node checks it; if the status is not SUCCESS, the Wait node delays before the next check.
  • When parsing completes, the workflow retrieves markdown, loads documents, creates embeddings via Azure OpenAI, and inserts data into an in-memory vector store.

Customization Options

Basic Adjustments:

  • Poll Delay: Adjust the Wait node interval (default: every minute) to balance speed vs. API quota.
  • Target Scope: Switch the trigger from a single file to a folder to auto-handle many docs.
  • Embedding Model: Swap Azure deployment for a different model name as needed.

Advanced Enhancements:

  • Persistent Vector DB Integration: Replace vectorStoreInMemory with Pinecone or Milvus for production search.
  • Notification: Add Slack or email nodes to notify when parsing completes or fails.
  • Summarization: Add an LLM summarization step to generate chunk-level summaries.
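
As an illustration of the persistence swap, the following sketch upserts one embedded chunk with the Pinecone Python client (v3+ assumed); the index name, key, and metadata fields are hypothetical. In n8n itself you would instead replace the vectorStoreInMemory node with a Pinecone Vector Store node.

```python
from pinecone import Pinecone  # pinecone client v3+ assumed

# Hypothetical values: use your Pinecone API key and an index whose dimension
# matches the Azure embedding deployment (1536 for text-embedding-3-small).
pc = Pinecone(api_key="YOUR_PINECONE_KEY")
index = pc.Index("knowledge-base")

def persist_chunk(chunk_id: str, embedding: list[float], text: str, source: str) -> None:
    """Upsert one embedded chunk so it survives n8n restarts."""
    index.upsert(vectors=[{
        "id": chunk_id,
        "values": embedding,
        "metadata": {"text": text, "source": source},
    }])
```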

Scaling Options:

  • Batch uploads and chunking to reduce embedding calls; use a queue (Redis or n8n queue patterns) and horizontal workers for high throughput.
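
A rough sketch of the batching idea, reusing the placeholder Azure endpoint from the setup section: several chunks are sent per embeddings request instead of one call per chunk. The batch size is a tuning knob, not a documented limit.

```python
import requests

# Placeholder endpoint and key, as in the setup sanity check above.
URL = "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/3small/embeddings?api-version=2023-05-15"
HEADERS = {"api-key": "YOUR_AZURE_OPENAI_KEY"}
BATCH_SIZE = 16  # tuning knob, not an API limit

def embed_in_batches(chunks: list[str]) -> list[list[float]]:
    """Send several chunks per request to cut down the number of embedding calls."""
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), BATCH_SIZE):
        batch = chunks[i:i + BATCH_SIZE]
        resp = requests.post(URL, headers=HEADERS, json={"input": batch})
        resp.raise_for_status()
        vectors.extend(item["embedding"] for item in resp.json()["data"])
    return vectors
```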

Performance & Optimization

| Metric | Expected Performance | Optimization Tips |
| --- | --- | --- |
| Execution time (per doc) | ~10 s–2 min (depends on file size and LlamaIndex processing) | Chunk large docs; run embeddings in batches |
| API calls (per doc) | 3–8 (upload, poll(s), retrieve, embedding calls) | Increase the poll interval; consolidate requests |
| Error handling | Retries via the Wait loop and If checks | Add exponential backoff, failure notifications, and retry limits |
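
The backoff-plus-retry-limit tip can be sketched as follows. In n8n you would approximate it with a longer Wait interval and a loop counter; the standalone Python helper below (status field names assumed as in the earlier sketch) shows the idea directly.

```python
import time
import requests

def poll_with_backoff(url: str, headers: dict, max_attempts: int = 8) -> dict:
    """Poll a parsing job status URL with exponential backoff and a hard retry limit."""
    delay = 5  # seconds; doubles after each pending response, capped below
    for attempt in range(1, max_attempts + 1):
        job = requests.get(url, headers=headers).json()
        if job.get("status") == "SUCCESS":
            return job
        if job.get("status") == "ERROR":
            raise RuntimeError(f"Parsing failed on attempt {attempt}: {job}")
        time.sleep(delay)
        delay = min(delay * 2, 120)  # cap the wait at 2 minutes
    raise TimeoutError(f"Job still pending after {max_attempts} attempts")
```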

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| Authentication errors | Invalid/missing credentials | Reconfigure n8n credentials; do not paste API keys directly into nodes |
| File not found | Incorrect fileId or permissions | Verify the Drive fileId and OAuth scopes; share the file with the service account if needed |
| Parsing stuck in PENDING | LlamaIndex processing delay or rate limit | Increase the Wait node interval, monitor the LlamaIndex dashboard, add retry limits |
| Embedding failures | Model/deployment mismatch or quota limits | Confirm the Azure deployment name (3small) and subscription quotas |

Created by: khmuhtadin
Category: Knowledge Management
Tags: google-drive, llamaindex, azure-openai, embeddings, knowledge-base, vector-store
