Smart Document Parser for Invoices, Logs or Sensor Reports (PDF/Image to Google Sheets)
This n8n workflow automatically parses documents such as invoices, sensor logs and other structured files (PDFs, images including scanned documents, and CSVs). It extracts key fields like totals, dates and customer/vendor info using OCR and AI, then writes the structured output into Google Sheets.
Who’s it for
- Finance or Ops teams automating invoice processing.
- SaaS platforms parsing uploaded reports or documents.
- Anyone needing a no-code backend for PDF/image/CSV document parsing.
- Teams building AI-powered data capture pipelines.
How it works
- Webhook Trigger receives file uploads (`/uploadDoc`)
- Switch Node checks the file type:
  - If image → Use Tesseract OCR
  - If PDF → Use PDF parser
  - If CSV → Extract as-is
- Extracted text is passed to:
  - Google Gemini or Gemini Flash AI model
  - Prompt extracts fields like `invoice_id`, `total`, `customer_name`, etc.
- JSON string is parsed and cleaned
- Data is appended to Google Sheets using `appendOrUpdate`
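
The file-type check performed by the Switch Node can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual configuration: the function name `choose_branch`, the branch labels, and the list of image extensions are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical set of extensions treated as images (an assumption, not from the workflow).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def choose_branch(filename: str, mime_type: str = "") -> str:
    """Return 'ocr', 'pdf' or 'csv' for an upload; raise ValueError for anything else."""
    ext = Path(filename).suffix.lower()
    mime = mime_type.lower()
    if mime.startswith("image/") or ext in IMAGE_EXTENSIONS:
        return "ocr"   # image -> Tesseract OCR
    if mime == "application/pdf" or ext == ".pdf":
        return "pdf"   # PDF -> PDF parser
    if mime == "text/csv" or ext == ".csv":
        return "csv"   # CSV -> extract as-is
    raise ValueError(f"Unsupported file type: {filename!r}")
```

Checking both the MIME type and the extension is a design choice for the sketch, since uploads do not always carry a reliable MIME type.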
How to set up
- Create a Google Sheet with columns like: `invoice_id`, `invoice_date`, `due_date`, `customer_name`, `vendor_name`, `subtotal`, `tax_total`, `total`, `currency`
- Connect:
  - Google Sheets OAuth
  - Google Gemini (PaLM API key) for LLM parsing
- Deploy the webhook endpoint: `/uploadDoc`
- Upload sample files (PDFs, images, CSVs) to test
- Review and map sheet columns in the `Invoice Data` node
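
To make the column mapping concrete, the sketch below turns one extracted record into a row ordered like the example sheet columns listed above. It is only an illustration: the helper name `to_row` is hypothetical, and the actual mapping is configured in the Google Sheets node rather than in code.

```python
# Column order taken from the example sheet layout in the setup steps.
COLUMNS = [
    "invoice_id", "invoice_date", "due_date", "customer_name", "vendor_name",
    "subtotal", "tax_total", "total", "currency",
]


def to_row(record: dict) -> list:
    """Order a parsed record by COLUMNS, using '' for missing or null fields."""
    row = []
    for column in COLUMNS:
        value = record.get(column)
        row.append("" if value is None else value)
    return row
```

Filling gaps with empty strings keeps every row the same width, so a field the model could not find does not shift later values into the wrong column.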
Requirements
| Tool | Purpose |
|---|---|
| n8n | Automation framework |
| Google Sheets | To store structured output |
| Tesseract OCR | For scanned image text extraction |
| Google Gemini | For natural language parsing |
How to customize
- Add extraction for line items using structured prompts.
- Change the prompt to extract sensor readings, log types, or custom keys.
- Add support for other file types (e.g., XLSX, DOCX).
- Add Slack/Email notifications on success/failure.
- Swap Gemini with OpenAI or Hugging Face if preferred.
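
One way to approach the prompt changes above is to generate the prompt from a list of field names, so switching from invoices to sensor reports only means changing that list. The sketch below is an assumption-laden example: `build_prompt`, the prompt wording, and field names such as `sensor_id` are illustrative and are not taken from the workflow.

```python
import json


def build_prompt(fields: list[str], document_text: str) -> str:
    """Build an extraction prompt that ends with a JSON template of the wanted keys."""
    template = {name: "string or null" for name in fields}
    return (
        "Extract the following fields from the document below.\n"
        "Respond with a single JSON object only, without commentary.\n"
        "Use null for any field that is not present.\n\n"
        f"Document:\n{document_text}\n\n"
        f"JSON schema:\n{json.dumps(template, indent=2)}"
    )


# Hypothetical field list for a sensor report instead of an invoice.
sensor_prompt = build_prompt(
    ["sensor_id", "timestamp", "reading", "unit"],
    "Sensor A7 reported 21.5 C at 2024-01-01T10:00",
)
```

Placing the JSON template at the very end matches the troubleshooting advice further down that the prompt should end with a valid JSON schema.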
Add‑ons
- Save uploaded files to Google Drive or S3
- Add auth for secure uploads
- Use charting/dashboard nodes to visualize extracted data
- Integrate with billing/accounting software
Use Case Examples
| Scenario | What Happens |
|---|---|
| Invoice Upload (PDF) | Extracts totals, customer, tax data into a Google Sheet |
| Scanned Receipt (Image) | OCR + LLM extracts structured data |
| Log File (CSV) | Parses and logs entries into Sheets |
Common troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| Webhook not triggered | URL or method mismatch | Send a POST request to the correct `/uploadDoc` URL |
| Text is blank | OCR failed | Check image quality or Tesseract config |
| Gemini model not returning JSON | Prompt formatting issue | Ensure prompt ends with valid JSON schema |
| Sheet not updated | Invalid Sheet ID or tab | Double-check the Sheet ID, credentials and tab name |
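
For the "Gemini model not returning JSON" case, a defensive parse step can also help, because models sometimes wrap the object in a Markdown code fence or add a sentence around it. The sketch below shows one possible approach in Python; `parse_model_json` is a hypothetical helper, and the logic would need to be ported to whatever code step the workflow actually uses.

```python
import json


def parse_model_json(raw: str) -> dict:
    """Parse the outermost {...} span in a model reply, ignoring surrounding text."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    return json.loads(raw[start : end + 1])
```

If this step raises, the reply contained no usable object, which usually points back to the prompt rather than to the parsing.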
Need Help?
- Need help fine-tuning the Gemini prompt for better field accuracy?
- Want to extract full tables, multi-page invoices or convert PDFs to JSON lines?
Our automation team at WeblineIndia can help you extend this into a full-blown document automation pipeline.