
🌐 Firecrawl Website Content Extractor

by Aashit Sharma • Updated 4 months ago • Source: n8n.io

🌐 Firecrawl Website Content Extractor (n8n Workflow)

This n8n automation workflow uses the Firecrawl API to extract structured data (e.g., quotes and authors) from web pages such as Quotes to Scrape, and retries when the extraction results are not ready yet.


πŸ” Workflow Overview

🎯 Purpose:

  • Crawl and extract structured web data using Firecrawl
  • Wait for asynchronous scraping to complete
  • Retrieve and validate results
  • Support retries if content is not ready

🔧 Step-by-Step Node Breakdown

1. 🧪 Manual Trigger

  • Node: When clicking ‘Test workflow’
  • Used to manually test or execute the workflow during setup or debugging.

2. 📤 Firecrawl Extract API Request

  • Node: Extract
  • Sends a POST request to https://api.firecrawl.dev/v1/extract
  • Payload includes:
    • urls: List of pages to crawl (https://quotes.toscrape.com/*)
    • prompt: "Extract all quotes and their corresponding authors from the website."
    • schema: JSON Schema defining the expected structure (a quotes array whose items each have text and author)

📌 Uses an HTTP Header Auth credential for the Firecrawl API
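To reproduce the Extract node outside n8n, here is a minimal Python sketch that builds the same POST request with only the standard library. It assumes Firecrawl accepts the API key as an `Authorization: Bearer <key>` header (what the HTTP Header Auth credential would carry) and a JSON body with the `urls`, `prompt`, and `schema` keys listed above. The helper `build_extract_request`, the constant names, and the placeholder key are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"

# JSON Schema for the expected output: an object with a "quotes" array,
# where every item has a string "text" and a string "author".
QUOTES_SCHEMA = {
    "type": "object",
    "properties": {
        "quotes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "author": {"type": "string"},
                },
                "required": ["text", "author"],
            },
        }
    },
    "required": ["quotes"],
}


def build_extract_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the Extract node.

    Assumption: Bearer-token auth in the Authorization header.
    """
    payload = {
        "urls": ["https://quotes.toscrape.com/*"],
        "prompt": "Extract all quotes and their corresponding authors from the website.",
        "schema": QUOTES_SCHEMA,
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_extract_request("fc-YOUR-API-KEY")  # placeholder key
```

Sending it would be a matter of `urllib.request.urlopen(req)`; the JSON response is expected to carry the job `id` that the later Get Results step reads.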


3. ⏱️ Wait for 30 Seconds

  • Node: 30 Secs
  • Gives Firecrawl time to finish processing in the background
  • Reduces the chance of requesting results before they are ready

4. 📥 Get Results

  • Node: Get Results
  • Performs a GET request to the extraction status URL, using the job ID returned by the Extract node ({{ $('Extract').item.json.id }}), to retrieve the extraction results.
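A matching sketch for the Get Results request, under the assumption that the status URL is the extract endpoint with the job ID appended (`/v1/extract/<id>`) and that the same Bearer header is reused. The `build_status_request` helper and the `JOB_ID` placeholder are hypothetical stand-ins for the value the workflow reads from `$('Extract').item.json.id`.

```python
import urllib.request

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"


def build_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for one extract job's status and results.

    Assumption: the status endpoint is EXTRACT_URL + "/" + job_id.
    """
    return urllib.request.Request(
        f"{EXTRACT_URL}/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


status_req = build_status_request("JOB_ID", "fc-YOUR-API-KEY")  # placeholders
```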

5. βœ…βŒ Condition Check

  • Node: If
  • Checks whether the data array is empty (i.e., no results yet)
  • If data is empty:
    • Waits 10 more seconds and retries
  • If data is available:
    • Passes data to the next step (e.g., processing or storage)
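The If node's test boils down to a truthiness check on the response's `data` field. In this hypothetical `results_ready` helper, a missing, null, empty-array, or empty-object `data` all count as "not ready yet". That is slightly broader than the empty-array check described above and is an assumption about how a pending job is reported.

```python
from typing import Any


def results_ready(status_response: dict[str, Any]) -> bool:
    """Mirror the If node: ready only when "data" is present and non-empty."""
    data = status_response.get("data")
    # None, [] and {} are all falsy, so each routes to the retry branch.
    return bool(data)
```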

6. πŸ” Retry Delay

  • Node: 10 Seconds
  • Waits 10 seconds before sending another GET request to Firecrawl
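Steps 3 to 6 together form a simple polling loop. The sketch below models that loop in Python with the fetch and sleep functions passed in, so it can be exercised without network access. The `poll_for_results` name and the `max_attempts` cap are hypothetical additions: the n8n loop as described keeps retrying with no upper limit.

```python
import time
from typing import Any, Callable


def poll_for_results(
    fetch_status: Callable[[], dict[str, Any]],
    initial_wait: float = 30.0,
    retry_delay: float = 10.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Wait once, then poll until the response's "data" field is non-empty.

    Mirrors 30 Secs -> Get Results -> If -> 10 Seconds -> Get Results ...
    """
    sleep(initial_wait)  # the "30 Secs" node
    for attempt in range(1, max_attempts + 1):
        response = fetch_status()  # the "Get Results" node
        if response.get("data"):  # the If node's "data available" branch
            return response
        if attempt < max_attempts:
            sleep(retry_delay)  # the "10 Seconds" node before the next GET
    raise TimeoutError(f"No results after {max_attempts} attempts")


# Usage with a fake fetcher: two empty responses, then results.
fake = iter([
    {"data": []},
    {"data": []},
    {"data": {"quotes": [{"text": "t", "author": "a"}]}},
])
waits: list[float] = []
result = poll_for_results(lambda: next(fake), sleep=waits.append)
# waits == [30.0, 10.0, 10.0]
```

Injecting `sleep` keeps the example instant; in real use the default `time.sleep` applies and `fetch_status` would wrap the GET request from step 4.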

7. πŸ› οΈ Edit Fields (Optional Output Formatting)

  • Node: Edit Fields
  • Placeholder to structure or format the extracted results (quotes and authors)
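As one example of what the Edit Fields placeholder could do, this sketch splits the extracted result into one item per quote, shaped like n8n items (`{"json": {...}}`). It assumes the status response nests the extracted object under `data` (i.e. `{"data": {"quotes": [...]}}`); the `quotes_to_items` helper and its whitespace trimming and skipping of incomplete entries are hypothetical choices, not part of the original workflow.

```python
from typing import Any


def quotes_to_items(status_response: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn {"data": {"quotes": [...]}} into one n8n-style item per quote.

    Trims surrounding whitespace and skips entries missing text or author.
    """
    quotes = (status_response.get("data") or {}).get("quotes", [])
    items = []
    for quote in quotes:
        text = (quote.get("text") or "").strip()
        author = (quote.get("author") or "").strip()
        if text and author:
            items.append({"json": {"text": text, "author": author}})
    return items
```

One item per quote lets downstream nodes (e.g. a spreadsheet or database node) treat each quote as its own row.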

🧾 Sticky Note: Firecrawl Setup Guide

Included as an embedded reference:

  • 🔗 10% Firecrawl Discount
  • 🧰 Instructions to:
    • Add Firecrawl API credentials in n8n
    • Use the Firecrawl community node on self-hosted instances
    • Set up the schema and prompt for targeted data extraction

✅ Key Features

  • 🔌 API-based crawling with schema-structured output
  • ⏱️ Initial wait plus a polling retry loop until results arrive
  • 🧠 Natural-language prompt that guides AI-based extraction
  • ⚙️ Flexible for different URLs, prompts, and schemas

📦 Sample Output

{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}
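The sample above can be checked against the schema sent in step 2. A full validator such as the `jsonschema` package would also work; to stay dependency-free, this sketch hand-codes the same shape check in a hypothetical `matches_quotes_schema` helper and runs it on the sample.

```python
import json
from typing import Any


def matches_quotes_schema(result: Any) -> bool:
    """Check the shape the Extract schema asks for: an object whose "quotes"
    value is a list of objects, each with a string "text" and "author"."""
    if not isinstance(result, dict) or not isinstance(result.get("quotes"), list):
        return False
    return all(
        isinstance(q, dict)
        and isinstance(q.get("text"), str)
        and isinstance(q.get("author"), str)
        for q in result["quotes"]
    )


sample = json.loads("""
{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}
""")
print(matches_quotes_schema(sample))  # True
```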