Product research used to mean opening a dozen browser tabs, copying prices into a spreadsheet by hand, and refreshing it all again next week. Automating that work was something only developers could pull off, because it meant writing scrapers, managing proxies, and babysitting code. That barrier is gone. Visual automation tools now let anyone wire up a workflow that collects product data on its own, no programming background required.

This guide shows you how to automate eCommerce product research with n8n, an open-source automation platform, and Crawlbase, which does the hard part of fetching live product pages from sites like Amazon. You will build a workflow that reads a list of products from a Google Sheet, pulls fresh details for each one through Crawlbase, and writes the results back so your research stays current without you touching it. We will cover two ways to connect n8n to Crawlbase: the Web MCP for an AI-driven flow, and the Crawling API for a direct, predictable one.

Why automate product research at all

Manual research does not scale. The moment you are tracking more than a handful of products, the copy-paste loop eats your day and the data is stale by the time you finish. Worse, prices and availability move constantly, so a one-time snapshot tells you almost nothing about a market. The value is in watching the same set of products over time and spotting the changes.

An automated workflow turns that into a background process. You define the products once, set a schedule, and the data refreshes itself. This kind of setup pays off for several roles:

  • eCommerce researchers building large datasets for niche analysis
  • Competitor analysts tracking price and stock movement
  • Amazon sellers monitoring their own and rival listings
  • Data analysts studying market trends across weeks or months

The piece most people get stuck on is the fetching. Amazon and other large retailers render content with JavaScript and challenge automated traffic aggressively, so a plain HTTP request usually returns a near-empty page or a block. That is the job Crawlbase handles, and it is why this workflow stays reliable instead of breaking the first time a site tightens its defenses.

What you need before you start

You are connecting a few tools, not coding. Get these in place first:

  • n8n to build and run the automation. Use a local install or n8n Cloud; both work the same for this workflow.
  • A Crawlbase account to do the crawling. Sign up, then copy your API tokens from the dashboard. You get a Normal token for static pages and a JavaScript token for client-rendered ones.
  • A Google account with the Sheets API enabled so n8n can read your product list and write results back. You connect it the first time you add a Google Sheets node.
  • A Google Sheet with your product list. A single column of Amazon ASINs is enough to start. Name the column something clear like ASIN.

The whole flow is short: your sheet holds the list of products, n8n runs the workflow on a trigger or a schedule, Crawlbase fetches each product page and returns clean structured data, and n8n writes that data back into the sheet.
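
If you want to tidy the ASIN column before the workflow reads it, a small check like the one below can help. It is a minimal sketch, not part of the n8n workflow itself: the helper name clean_asins is hypothetical, and it assumes a standard ASIN is 10 characters made of uppercase letters and digits.

```python
import re

# Assumption: a standard ASIN is 10 characters, uppercase letters and digits.
ASIN_PATTERN = re.compile(r"^[A-Z0-9]{10}$")


def clean_asins(raw_values):
    """Return unique, well-formed ASINs in their original order."""
    seen = set()
    cleaned = []
    for value in raw_values:
        # Normalise stray whitespace and lowercase input from copy-paste.
        asin = str(value).strip().upper()
        if ASIN_PATTERN.match(asin) and asin not in seen:
            seen.add(asin)
            cleaned.append(asin)
    return cleaned


# Example column contents, including a duplicate, a typo, and an empty cell.
column = ["B08N5WRWNW", " b08n5wrwnw ", "not-an-asin", "", "B07XJ8C8F5"]
print(clean_asins(column))
```

Dropping duplicates and malformed values up front means each scheduled run spends its requests only on rows that can resolve to a product page.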

Two ways to connect n8n to Crawlbase

There are two solid paths, and which one you pick depends on how much control versus flexibility you want.

The Web MCP route exposes crawling as tools an AI Agent in n8n can call on its own. You describe the goal in plain language, the agent decides which crawl tool to use, and you get structured output back. It shines when your targets vary or you want the model to reason over the page. The full Web MCP walkthrough covers that setup end to end.

The Crawling API route is a direct HTTP call. You send a URL and a token, you get back the data, every time, with no model in the loop. It is predictable, cheap, and the right default for a focused job like pulling the same fields off product pages. That is the path we build below, because product research is a well-defined task where a built-in scraper does exactly what you need.

Which token to use

Crawlbase issues two token types. The Normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. Amazon product pages load key details client-side, so reach for the JS token here. Using the Normal token on a client-rendered page returns a thin shell with the data missing.

Step 1: Create a new workflow in n8n

Log in to n8n and click Create Workflow. Give it a clear name like "Amazon Product Data Collector" so you can find it later when you have a few workflows running. You start with an empty canvas; everything from here is dragging nodes onto it and wiring them together.

Step 2: Read the product list from Google Sheets

Add a Google Sheets node and choose the On row added or updated trigger. This fires the workflow automatically whenever you add a new ASIN to your sheet, so research kicks off the moment you drop in something to look up.

Authenticate with your Google account, then select the spreadsheet and worksheet holding your ASIN list. Click Fetch Test Event to confirm the connection pulls a row. Once it does, n8n knows exactly where to read your products from, and each row that flows through carries an ASIN value the next step can use.

Step 3: Fetch product data with the Crawling API

n8n does not ship a dedicated Crawlbase node, so you use the built-in HTTP Request node to call the Crawling API directly. Add it after the Google Sheets node and configure the parameters:

  • Method: GET
  • URL: https://api.crawlbase.com
  • Authentication: None
  • Send Query Parameters: toggle on

Then add three query parameters. The url value is built dynamically from the ASIN that came through the trigger, so each row fetches its own product:

text
token     your_crawlbase_js_token
url       https://www.amazon.com/dp/{{ $json.ASIN }}/ref=nosim
scraper   amazon-product-details

Three values do the work. token is your JavaScript token from the dashboard. url uses the n8n expression {{ $json.ASIN }} to drop the current row's ASIN into the Amazon product URL, so the same node handles every product without edits. scraper tells Crawlbase to run its built-in amazon-product-details parser, which returns the page already structured as JSON instead of raw HTML you would have to parse yourself.
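
For readers who want to see what the HTTP Request node ends up sending, here is a rough Python equivalent of the same call. It is a sketch under the parameters described above; the function name build_crawl_url is hypothetical, and the token value is a placeholder.

```python
from urllib.parse import urlencode

# Endpoint taken from the node configuration in this guide.
API_ENDPOINT = "https://api.crawlbase.com/"


def build_crawl_url(asin, token, scraper="amazon-product-details"):
    """Build the Crawling API request URL for one ASIN."""
    product_url = f"https://www.amazon.com/dp/{asin}/ref=nosim"
    # urlencode percent-encodes the nested Amazon URL so it survives
    # as a single query parameter value.
    query = urlencode({"token": token, "url": product_url, "scraper": scraper})
    return f"{API_ENDPOINT}?{query}"


request_url = build_crawl_url("B08N5WRWNW", "your_crawlbase_js_token")
print(request_url)
```

The detail worth noticing is the encoding of the url parameter: the Amazon address is itself a URL, so it has to be percent-encoded before it is placed inside another URL's query string.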

Click Execute Node and you should see a JSON response in the output panel with fields like name, price, rating, and reviewCount. The shape looks like this:

json
{
  "body": {
    "name": "Wireless Noise Cancelling Headphones",
    "price": "$248.00",
    "rating": "4.6",
    "reviewCount": 31482,
    "inStock": true
  }
}

Because the built-in scraper already parsed the page, you read these fields by name in the next node. No CSS selectors, no XPath, nothing that breaks when Amazon ships a redesign.
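
To make "read these fields by name" concrete, the sketch below parses the sample response shown above and turns the text values into numbers. It assumes the response has exactly that shape and that prices are formatted in US dollars; extract_fields is a hypothetical helper, not a Crawlbase or n8n function.

```python
import json
from decimal import Decimal

# The sample response from this guide, as the raw text an HTTP client receives.
SAMPLE_RESPONSE = """
{
  "body": {
    "name": "Wireless Noise Cancelling Headphones",
    "price": "$248.00",
    "rating": "4.6",
    "reviewCount": 31482,
    "inStock": true
  }
}
"""


def extract_fields(response_text):
    """Pull the research fields out of a scraper response by name."""
    body = json.loads(response_text).get("body", {})
    # Assumption: prices look like "$1,234.56"; other currencies need more care.
    price_text = (body.get("price") or "").replace("$", "").replace(",", "").strip()
    return {
        "title": body.get("name"),
        "price": Decimal(price_text) if price_text else None,
        "rating": float(body["rating"]) if body.get("rating") else None,
        "review_count": body.get("reviewCount"),
        "in_stock": body.get("inStock"),
    }


row = extract_fields(SAMPLE_RESPONSE)
print(row)
```

Converting the price to a number at this point is what later makes comparisons between runs straightforward.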

Step 4: Write the results back to Google Sheets

Now store what you collected. Open your sheet and add columns for the fields you care about, for example ASIN, Title, Price, Rating, and URL. You can always add more later as you decide what matters for your research.

Back in n8n, add a second Google Sheets node after the HTTP Request and choose the Append or update row in sheet action. In the field labeled Column to match on, select ASIN so each product updates its own row instead of creating duplicates on every run. Then map the fields from the Crawling API response into your columns:

text
ASIN     {{ $('Google Sheets Trigger').item.json.ASIN }}
Title    {{ $json.body.name }}
Price    {{ $json.body.price }}
Rating   {{ $json.body.rating }}
URL      {{ $json.url }}

The ASIN comes from the original trigger so it always matches the right row, while the product fields come from the body of the Crawling API response. Run the workflow once and watch the sheet fill in with titles, prices, and ratings, all neatly aligned to their ASINs.
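
The append-or-update behaviour is easier to reason about as a small piece of logic. The sketch below models a sheet as a list of dictionaries and matches on the ASIN column. It illustrates the idea only and is not how n8n implements the node; upsert_row is a hypothetical name.

```python
def upsert_row(sheet_rows, new_row, key="ASIN"):
    """Update the row whose key matches, or append a new row if none does."""
    for row in sheet_rows:
        if row.get(key) == new_row[key]:
            # Existing product: overwrite only the columns we fetched.
            row.update(new_row)
            return sheet_rows
    # First time we see this product: add it at the end.
    sheet_rows.append(dict(new_row))
    return sheet_rows


sheet = [{"ASIN": "B08N5WRWNW", "Title": "", "Price": ""}]
upsert_row(sheet, {"ASIN": "B08N5WRWNW", "Title": "Headphones", "Price": "$248.00"})
upsert_row(sheet, {"ASIN": "B07XJ8C8F5", "Title": "Speaker", "Price": "$99.99"})
print(sheet)
```

Matching on a stable key is what keeps repeated runs from piling up duplicate rows: the second run for the same ASIN changes values in place instead of growing the sheet.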

Step 5: Activate and schedule it

Flip the workflow to Active in the top-right corner. From now on, adding an ASIN to your sheet triggers a fresh crawl and writes the details back the next time the trigger polls the sheet. That alone removes the manual lookup loop.

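For ongoing research, you want the data to refresh on its own rather than only when you add rows. Add a Schedule Trigger node that runs the workflow daily or every few hours, follow it with a Google Sheets node that reads all rows, and connect that to the same HTTP Request node so your existing ASIN list flows back through the Crawling API. On this branch, point the ASIN mapping in the write-back step at the node that supplied the rows rather than at the sheet trigger. Now your sheet tracks price and stock changes over time without you lifting a finger, which is the whole point of automating the research instead of just speeding it up.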

Make the workflow more useful

Once the basics run, a few small additions turn a data collector into a real research tool. None of these require rebuilding anything.

Capture more fields

The built-in scraper returns more than title and price. Map in rating, reviewCount, seller name, or availability to get a fuller picture of how a product performs and how it shifts over time. Richer data is what makes trend analysis worth doing.

Swap in other scrapers

Crawlbase has built-in scrapers for many sites, not just Amazon. To research a different platform, change the scraper query parameter and the target URL; the rest of the workflow stays the same. The AI agent workflow approach goes a step further and lets an agent pick the right tool per target, which is handy when your sources vary.

Store crawled pages for later

If you want a backup of everything you fetch, add store=true to the Crawling API request and Crawlbase keeps a copy of each crawl in your account. That lets you revisit old snapshots or compare changes without re-crawling, which is useful for audits and historical analysis.

Route the data anywhere

Google Sheets is the simplest destination, but n8n connects to hundreds of services. Push alerts to Slack when a competitor drops a price, append rows to a database, or feed a dashboard. The crawl step does not change; you just add a node after it.
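
As an illustration of the price-drop alert idea, the comparison behind it might look like the sketch below. It assumes you keep the previous and current prices as numbers keyed by ASIN; the function name find_price_drops and the 5 percent threshold are arbitrary choices for the example.

```python
def find_price_drops(previous, current, min_drop_pct=5.0):
    """Return (asin, old, new, drop_pct) for prices that fell by the threshold."""
    drops = []
    for asin, new_price in current.items():
        old_price = previous.get(asin)
        # Skip products with no usable earlier price to compare against.
        if old_price is None or old_price <= 0:
            continue
        drop_pct = (old_price - new_price) / old_price * 100
        if drop_pct >= min_drop_pct:
            drops.append((asin, old_price, new_price, round(drop_pct, 1)))
    return drops


yesterday = {"B08N5WRWNW": 248.00, "B07XJ8C8F5": 99.99}
today = {"B08N5WRWNW": 198.40, "B07XJ8C8F5": 99.49}
drops = find_price_drops(yesterday, today)
print(drops)
```

A percentage threshold keeps small fluctuations from generating noise, so only meaningful moves would reach a channel like Slack.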

Staying unblocked at scale

The reason this workflow keeps running where a homegrown scraper would stall is that the Crawling API handles rendering and IP rotation for you. It runs each request behind a real browser and rotating residential IPs, so Amazon sees normal-looking traffic rather than an obvious bot. If you would rather route your own requests through a rotating pool, the Smart AI Proxy gives you the same residential rotation as a drop-in endpoint. Either way, the broader tactics are covered in the guide on how to scrape websites without getting blocked.

Recap

Key takeaways

  • Split the work. n8n drives the automation and Google Sheets holds the data; Crawlbase fetches the product pages and returns them structured. Each piece does the part it is good at.
  • Use the built-in scraper. The amazon-product-details scraper returns parsed JSON, so you read fields like name and price by name with no selectors to maintain.
  • Use the JS token for Amazon. Product details render client-side, so the JavaScript token renders the page in a real browser before returning data.
  • Schedule it. A Schedule Trigger refreshes your list automatically, turning one-time scraping into ongoing trend tracking.
  • Two connection paths. Use the Crawling API for a direct, predictable job, or the Web MCP when you want an AI Agent to choose tools across varied targets.

Frequently Asked Questions (FAQs)

Do I need to know how to code to build this?

No. The entire workflow is built visually in n8n by dragging nodes and filling in fields. You never write a script. The only technical-looking parts are the n8n expressions like {{ $json.ASIN }}, which are just placeholders that pull a value from the previous step, and you copy them straight from this guide.

Should I use the Web MCP or the Crawling API?

Use the Crawling API for a focused, repeatable job like pulling the same fields off product pages, because it is a direct call with predictable cost and output. Reach for the Web MCP when you want an AI Agent in n8n to decide which crawl tool to use across varied or changing targets. Both run on the same Crawlbase crawling engine, so you can start with one and switch later.

Why use Crawlbase instead of a plain HTTP request to Amazon?

Amazon renders product details with JavaScript and blocks automated traffic aggressively, so a plain request usually returns an empty shell or a block page. Crawlbase renders the page in a real browser behind rotating residential IPs, then returns clean structured data, which is the part that keeps the workflow from breaking the moment you scale up.

How do I research products on sites other than Amazon?

Change the scraper query parameter to the one that matches your target site and update the URL the workflow builds. Crawlbase has built-in scrapers for many platforms, and the rest of the n8n workflow, the trigger and the Google Sheets write-back, stays exactly the same.

Will this get me blocked?

The Crawling API rotates residential IPs and renders pages server-side, which handles most blocking for you. To stay safe at volume, pace your requests with the Schedule Trigger rather than hammering a site, vary your targets, and watch the response status codes so you can back off if a site starts challenging traffic.
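
Backing off when a site starts challenging traffic can be sketched as a retry loop with exponentially growing waits. The example below is a generic illustration, not a Crawlbase or n8n feature: the list of retryable status codes is an assumption, and fetch stands in for whatever function makes the real HTTP call and returns a (status, body) pair.

```python
import time

# Assumption: these statuses indicate throttling or a temporary failure.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns status 200, doubling the wait each retry."""
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            raise RuntimeError(f"giving up on {url}: status {status}")
        # Wait 1x, 2x, 4x ... the base delay before trying again.
        sleep(base_delay * 2 ** attempt)


# Demo with a scripted stand-in for the real HTTP call and a recorded sleep.
responses = iter([(429, ""), (503, ""), (200, "ok")])
waits = []
result = fetch_with_backoff(lambda url: next(responses), "https://example.com", sleep=waits.append)
print(result, waits)
```

Passing the sleep function in as a parameter keeps the retry logic testable without real delays, and the growing wait gives a struggling or defensive site room to recover.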

Can I keep a history of the data I collect?

Yes. Add store=true to the Crawling API request and Crawlbase saves a copy of each crawl in your account, so you can revisit old snapshots or compare changes without fetching again. On the n8n side, scheduling regular runs against the same ASIN list builds a running record of how prices and stock move over time.
