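This workflow enables multimodal file analysis by connecting Google Gemini tools to a text-only LLM agent. Users can upload images, videos, audio files, or documents via a chat interface. The workflow will:
- Upload each file to Gemini and collect its URL
- Pass the user message together with the file data to the agent
- Let the agent decide whether to call a Gemini tool, then respond concisely
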
Unlike end-to-end multimodal LLMs (such as Gemini 1.5 or GPT-4o), this template offers the following benefits:

| Feature | Benefit | 
|---|---|
| 🧩 Modular | LLM + Tools are decoupled; can update them independently | 
| 💸 Cost-Efficient | No need to pay for full multimodal models; only use tools when needed | 
| 🔧 Tool-based Reasoning | Agent invokes tools on demand, in the spirit of Toolformer-style tool use | 
| ⚡ Fast | Groq-hosted LLMs respond with low latency | 
| 📚 Memory | Includes context buffer for multi-turn chats (15 messages) | 

Input arrives through chatTrigger.
- If no files: the prompt is passed directly to the agent.
- If files are included, a new chatInput is dynamically generated, containing:
  - User message
  - Media: [array of file data]
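
The branching described above can be sketched in Python. This is a minimal illustration, not the workflow's actual node code: the function name `build_chat_input`, the `FileInfo` fields (`name`, `mime_type`, `url`), and the exact text layout of the enriched prompt are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileInfo:
    # Hypothetical shape of one uploaded file's metadata.
    name: str
    mime_type: str
    url: str


def build_chat_input(message: str, files: list[FileInfo]) -> str:
    """Return the prompt that is handed to the agent."""
    if not files:
        # No files: the prompt is passed through unchanged.
        return message
    # Files present: append the file data under a "Media:" label.
    media = json.dumps([asdict(f) for f in files])
    return f"{message}\n\nMedia: {media}"
```

Serializing the file array as JSON is one plausible choice here, since a text-only LLM can read it back without any special parsing.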
The Langchain Agent receives:
- The enriched prompt
- File URLs
- Memory context (15 turns)
- Access to 4 Gemini tools:
  - IMG: analyze image
  - VIDEO: analyze video
  - AUDIO: analyze audio
  - DOCUMENT: analyze document

The agent autonomously decides whether and how to use tools, then responds with concise output.

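
In the workflow, the tool choice is made by the LLM agent itself, not by fixed rules. The sketch below is a deterministic stand-in that only illustrates the mapping the agent is expected to arrive at. The function name `pick_tool` and the MIME-type rules are assumptions; only the four tool names come from the template.

```python
from typing import Optional


def pick_tool(mime_type: str) -> Optional[str]:
    """Map a MIME type to one of the four tool names, or None if none fits."""
    prefix_to_tool = {"image/": "IMG", "video/": "VIDEO", "audio/": "AUDIO"}
    for prefix, tool in prefix_to_tool.items():
        if mime_type.startswith(prefix):
            return tool
    # Assumption: PDFs and text-like files go to the DOCUMENT tool.
    if mime_type == "application/pdf" or mime_type.startswith("text/"):
        return "DOCUMENT"
    # Unknown type: no tool call, the agent answers from text alone.
    return None
```

A real agent can also decide not to call any tool even when a file is attached, for example when the question does not concern the file.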
| Category | Node / Tool | Purpose | 
|---|---|---|
| Chat Input | chatTrigger | User interface with file support | 
| File Processing | splitOut, splitInBatches | Process each uploaded file | 
| Upload | googleGemini | Uploads each file to Gemini, gets URL | 
| Metadata | set, aggregate | Builds structured file info | 
| AI Agent | Langchain Agent | Receives context + file data | 
| Tools | googleGeminiTool | Analyze media with Gemini | 
| LLM | lmChatGroq (Qwen 32B) | Text reasoning, high-speed | 
| Memory | memoryBufferWindow | Maintains session context | 
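
To illustrate what a windowed memory buffer does, here is a minimal Python sketch. It is not the memoryBufferWindow node's implementation: the class name `WindowMemory` is hypothetical, and the sketch assumes the window counts individual messages, which may differ from how the node counts its 15-entry window.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent `size` messages of a session."""

    def __init__(self, size: int = 15) -> None:
        # A deque with maxlen drops the oldest entry once it is full.
        self._messages = deque(maxlen=size)

    def add(self, role: str, text: str) -> None:
        self._messages.append((role, text))

    def context(self) -> list:
        """Return the retained messages, oldest first."""
        return list(self._messages)
```

With the default size, adding a 16th message silently discards the first one, so the context sent to the LLM stays bounded regardless of how long the chat runs.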

Replace existing credentials on:
- Upload a file
- GeminiTool (IMG, VIDEO, AUDIO, DOCUMENT)
- lmChatGroq

Example prompt: "Hola, ¿qué dice este PDF?" ("Hi, what does this PDF say?")

The user uploads a document → the agent routes it to the Gemini DOCUMENT tool → the tool returns the extracted content → the LLM summarizes it in Spanish.

multimodal, agent, langchain, groq, gemini, image analysis, audio analysis, document parsing, video analysis, file uploader, chat assistant, LLM tools, memory, AI tools