This is an example of an advanced automated data extraction and enrichment pipeline built with ScrapeGraphAI. Its primary purpose is to systematically scrape the n8n community workflows website, extract detailed information about recently added workflows, process that data using multiple AI models, and store the structured results in a Google Sheets spreadsheet.
This workflow demonstrates a sophisticated use of n8n to move beyond simple API calls and into the realm of intelligent, AI-driven web scraping and data processing, turning unstructured website content into valuable, structured business intelligence.
✅ Full Automation: Once triggered (manually or on a schedule via the Schedule Trigger node), the entire process runs hands-free, from data collection to spreadsheet population.
✅ Powerful AI-Augmented Scraping: It doesn't just scrape raw HTML. It uses multiple AI agents (Google Gemini, OpenAI) to process the scraped content.
✅ Robust and Structured Data Output: The Structured Output Parser and Information Extractor nodes keep the data clean, consistent, and ready for analysis. The output is JSON that maps directly to spreadsheet columns.
✅ Scalability via Batching: The Split In Batches and Loop Over Items nodes allow the workflow to process a dynamically sized list of workflows. Whether there are 5 or 50 new workflows, it will process each one sequentially without failing.
✅ Effective Data Integration: It integrates with Google Sheets, which acts as a simple and powerful database. This makes the collected data immediately accessible, shareable, and available for visualization in tools like Looker Studio.
✅ Resilience to Website Changes: By using AI models trained to understand content and context (like "find the 'Recently Added' section" or "find the author's name"), the workflow is more resilient to minor cosmetic changes on the target website compared to traditional CSS/XPath selectors.
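To illustrate the "JSON that maps directly to spreadsheet columns" idea from the list above, here is a minimal Python sketch. It is not the workflow's own code: the n8n nodes handle this internally. The column names (`title`, `author`, `url`) and the sample reply are assumptions made for the example, since the actual sheet columns are not listed here.

```python
import json

# Hypothetical column order; the real sheet's columns are not documented here.
COLUMNS = ["title", "author", "url"]

def to_row(raw_json: str) -> list[str]:
    """Parse a model's JSON reply and return its values in column order."""
    record = json.loads(raw_json)
    missing = [c for c in COLUMNS if c not in record]
    if missing:
        # Fail loudly rather than write a misaligned row to the sheet.
        raise ValueError(f"missing fields: {missing}")
    return [str(record[c]) for c in COLUMNS]

# Invented sample reply, used only to exercise the function.
reply = '{"title": "Example workflow", "author": "Jane Doe", "url": "https://n8n.io/workflows/"}'
row = to_row(reply)
print(row)
```

Validating the fields before writing is the point of the sketch: a reply that lacks an expected key is rejected instead of shifting values into the wrong columns.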
The workflow operates in two main phases:
Phase 1: Scraping the Main List
The workflow converts the https://n8n.io/workflows/ page into clean Markdown format.
Phase 2: Processing Individual Workflows
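The two phases above can be sketched roughly in Python. This is an illustration under stated assumptions, not the workflow's implementation. The real pipeline uses n8n nodes and AI models to find the workflows, whereas this sketch swaps in a plain regular expression over Markdown links. The function names, the sample Markdown, and the sample URLs are all hypothetical.

```python
import re

def extract_workflow_links(markdown: str) -> list[tuple[str, str]]:
    """Phase 1 stand-in: pull (title, url) pairs for workflow pages out of Markdown."""
    # Regex substitute for the AI extraction step; matches only links under /workflows/.
    pattern = r"\[([^\]]+)\]\((https://n8n\.io/workflows/[^)\s]+)\)"
    return re.findall(pattern, markdown)

def process_workflow(title: str, url: str) -> dict:
    """Phase 2 stand-in: the real workflow would scrape `url` and call AI models here."""
    return {"title": title, "url": url}

def run(markdown: str) -> list[dict]:
    """Process each discovered workflow one at a time, as the loop nodes do."""
    rows = []
    for title, url in extract_workflow_links(markdown):
        rows.append(process_workflow(title, url))
    return rows

# Invented sample input; these URLs are placeholders, not real workflow pages.
sample = """
## Recently Added
- [Sample workflow A](https://n8n.io/workflows/1-sample-a/)
- [Sample workflow B](https://n8n.io/workflows/2-sample-b/)
- [Docs](https://docs.n8n.io/)
"""
print(run(sample))
```

The loop handles however many links Phase 1 returns, which mirrors how the workflow processes a list of unknown length sequentially.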
To run this workflow, you need to configure the following credentials in your n8n instance:
ScrapegraphAI account. Install the related Community node.
Google Gemini(PaLM) (Eure).
OpenAi account (Eure).
Google Sheets account. You must also ensure the node is configured with the correct Google Sheet ID and that the sheet has a worksheet named Foglio1 (or update the node to match your sheet's name).
Contact me for consulting and support or add me on LinkedIn.