Build an AI Product Monitoring Tool

Refreshing a product page and copying numbers into a spreadsheet works until you are tracking more than a handful of items. It is slow, easy to forget, and worse, it tells you what a price is right now but nothing about whether that number moved, by how much, or whether it matters. The interesting signal in product data is the change, not the snapshot, and you only see change if you collect the same fields on a schedule and compare them over time.

This guide builds a small, runnable AI product monitoring tool in Python. It scrapes public product pages on a timer with Crawlbase, stores each reading, diffs the new reading against the last one to catch meaningful moves in price, stock, and rating, and then hands those changes to an LLM that writes a short plain-English alert. Everything here stays on public product data: prices, availability, and ratings that any visitor sees without logging in. No accounts, no carts, no personal data.

What the tool does, end to end

Think of the system as a relay with four stations. First, Crawlbase fetches a product page and returns clean structured fields so you are not parsing brittle HTML by hand. Second, each reading gets written to a local store with a timestamp, which gives you the history that change detection depends on. Third, a diff step compares the latest reading to the previous one and decides whether anything moved enough to care about. Fourth, when there is a real change, an LLM turns the raw before/after numbers into a one-line summary you can drop into Slack, email, or a log.

The scheduling loop wraps all four so the whole thing runs on its own, once an hour or once a day, without you watching it. That loop is the part that turns a one-off scrape into actual monitoring. The same shape underlies most ecommerce web scraping work: collect, store, compare, act.

Prerequisites

You do not need to be an expert, but a little groundwork helps. You should be comfortable reading and editing a Python script, sending an HTTP request and checking the JSON that comes back, and running a file from your terminal. A rough sense of how an LLM responds to a structured prompt is useful for the summary step, though the code handles the wiring for you.

On the tooling side you need three things: Python 3.9 or newer installed locally, a Crawlbase account with an API token, and an API key for whichever LLM you use for the summary step (the example uses an OpenAI-compatible endpoint, which most providers expose). New Crawlbase accounts get up to 20,000 free requests: 1,000 on signup, and more as you complete onboarding steps, which is plenty to build and test this against a few real products.

Set up the project

Create a folder, a virtual environment, and install the two libraries the tool leans on: the Crawlbase client for scraping and the openai client for the summary step (it talks to any OpenAI-compatible API).

bash

python --version

mkdir product-monitor && cd product-monitor
python -m venv .venv
source .venv/bin/activate
pip install crawlbase openai

Keep both keys out of the code. Read them from environment variables so nothing secret lands in a commit. Set them once in your shell before running anything.

bash

export CRAWLBASE_TOKEN="your_crawlbase_token"
export LLM_API_KEY="your_llm_api_key"

Step 1: Collect product data with Crawlbase

The first station fetches a product page and returns the fields we care about. The cleanest path for a supported store is the Scraper API, which runs a maintained parser server-side and hands you structured JSON instead of raw HTML. It uses the same endpoint as the Crawling API, with a scraper parameter naming the parser you want. Save this as collect.py.

python

import os
from crawlbase import ScraperAPI

scraper = ScraperAPI({"token": os.environ["CRAWLBASE_TOKEN"]})

def collect_product(url):
    # 'amazon-product-details' is one of the maintained parsers.
    response = scraper.get(url, {"scraper": "amazon-product-details"})
    body = response["json"]["body"]

    reading = {
        "url": url,
        "name": body.get("name"),
        "price": body.get("rawPrice"),
        "currency": body.get("currency"),
        "in_stock": body.get("inStock"),
        "rating": body.get("rating"),
    }

    if reading["price"] is None or reading["name"] is None:
        raise ValueError(f"Parse returned no price/name for {url}")

    return reading

If the store you are tracking is not one of the supported parsers, drop down to the Crawling API and parse the HTML yourself, or generate a target-specific extractor. Either way Crawlbase handles the hard part of the request: it rotates IPs, manages headers, and renders JavaScript when a page needs it, so you get a real response instead of a block page.
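
As a rough illustration of the parse-it-yourself path, the sketch below reads a product from the schema.org JSON-LD block that many storefronts embed in their HTML. It assumes the page carries such a block, which is not guaranteed for any given store; the names JsonLdCollector and reading_from_html are hypothetical and not part of Crawlbase. It returns the same reading shape that collect_product produces.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def reading_from_html(url, html):
    """Build a reading dict from the first schema.org Product block, or None."""
    parser = JsonLdCollector()
    parser.feed(html)
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block, try the next one
        if not isinstance(data, dict) or data.get("@type") != "Product":
            continue
        offer = data.get("offers") or {}
        if isinstance(offer, list):
            offer = offer[0] if offer else {}
        price = offer.get("price")
        rating = (data.get("aggregateRating") or {}).get("ratingValue")
        return {
            "url": url,
            "name": data.get("name"),
            "price": float(price) if price is not None else None,
            "currency": offer.get("priceCurrency"),
            # schema.org availability values end in InStock, OutOfStock, etc.
            "in_stock": str(offer.get("availability", "")).endswith("InStock"),
            "rating": float(rating) if rating is not None else None,
        }
    return None
```

When a store does not publish JSON-LD, the same function boundary still works: swap the body for selectors written against that store's markup and keep the returned dict unchanged so the rest of the pipeline does not notice.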

Normal vs JS rendering

The Scraper API and Crawling API default to a fast static fetch. If a product page renders its price or stock client-side (common on modern storefronts), pass "ajax_wait": "true" and a "page_wait" in milliseconds so the content loads before the HTML returns. Start at 5000 ms and raise it if a field comes back empty.
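
The "start at 5000 ms and raise it" advice can be sketched as a small retry helper. This is a minimal sketch, not Crawlbase's own behavior: fetch is a hypothetical callable you would write around the Crawlbase call from collect.py, and the wait steps and required fields are assumptions to adjust.

```python
def render_options(page_wait_ms):
    """Options for a JavaScript-rendered fetch, using the two parameters above."""
    return {"ajax_wait": "true", "page_wait": page_wait_ms}


def fetch_until_complete(fetch, url, waits=(5000, 8000, 12000),
                         required=("name", "price")):
    """Retry with a longer page_wait until every required field is present.

    `fetch(url, options)` is a hypothetical callable that returns a reading
    dict; it is injected so the escalation logic can be tested offline.
    Returns (reading, wait_used), with wait_used set to None if the reading
    is still incomplete after the longest wait.
    """
    reading = {}
    for wait in waits:
        reading = fetch(url, render_options(wait))
        if all(reading.get(field) is not None for field in required):
            return reading, wait
    return reading, None
```

Each extra attempt costs another request, so it is worth recording the wait that finally worked for a store and starting there on later runs.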

Step 2: Store each reading with a timestamp

Change detection needs memory, so every reading goes to disk with the time it was taken. A single SQLite file is enough and keeps the whole tool dependency-light. Save this as store.py.

python

import sqlite3
from datetime import datetime, timezone

DB = "readings.db"

def init_db():
    con = sqlite3.connect(DB)
    con.execute(
        """CREATE TABLE IF NOT EXISTS readings (
            url TEXT, name TEXT, price REAL, currency TEXT,
            in_stock INTEGER, rating REAL, taken_at TEXT)"""
    )
    con.commit()
    con.close()

def save_reading(r):
    con = sqlite3.connect(DB)
    con.execute(
        "INSERT INTO readings VALUES (?, ?, ?, ?, ?, ?, ?)",
        (r["url"], r["name"], r["price"], r["currency"],
         int(bool(r["in_stock"])), r["rating"],
         datetime.now(timezone.utc).isoformat()),
    )
    con.commit()
    con.close()

def last_two(url):
    con = sqlite3.connect(DB)
    con.row_factory = sqlite3.Row
    rows = con.execute(
        "SELECT * FROM readings WHERE url = ? ORDER BY taken_at DESC LIMIT 2",
        (url,),
    ).fetchall()
    con.close()
    return [dict(row) for row in rows]

last_two returns the newest reading and the one before it, which is all the diff step needs. If you want a full price history later for charting, the table already holds every row; just select all of them for a URL ordered by taken_at.
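
A history query along those lines might look like the sketch below. The function name price_history is hypothetical; the table and column names are the ones created in store.py.

```python
import sqlite3


def price_history(con, url):
    """Return (taken_at, price) pairs for one URL, oldest first."""
    rows = con.execute(
        "SELECT taken_at, price FROM readings WHERE url = ? ORDER BY taken_at",
        (url,),
    ).fetchall()
    return [(taken_at, price) for taken_at, price in rows]
```

Called as price_history(sqlite3.connect("readings.db"), url), it returns a list that can go straight into a plotting library or a CSV export.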

Step 3: Detect meaningful changes

This is where most naive monitors go wrong: they alert on every tiny wobble, so you tune them out within a day. The fix is a threshold. Only treat a price move as a change if it crosses a percentage you set, and always flag the things that are binary, like a product going out of stock. Save this as detect.py.

python

PRICE_THRESHOLD = 0.03  # 3% move counts as meaningful

def detect_changes(current, previous):
    changes = []

    old_price, new_price = previous["price"], current["price"]
    if old_price and new_price:
        delta = (new_price - old_price) / old_price
        if abs(delta) >= PRICE_THRESHOLD:
            changes.append({
                "field": "price",
                "old": old_price,
                "new": new_price,
                "pct": round(delta * 100, 1),
            })

    if previous["in_stock"] != current["in_stock"]:
        changes.append({
            "field": "in_stock",
            "old": bool(previous["in_stock"]),
            "new": bool(current["in_stock"]),
        })

    if previous["rating"] and current["rating"]:
        # Round first: 4.1 - 3.9 is 0.1999... in floating point and would slip under 0.2.
        if round(abs(current["rating"] - previous["rating"]), 2) >= 0.2:
            changes.append({
                "field": "rating",
                "old": previous["rating"],
                "new": current["rating"],
            })

    return changes

Tune the thresholds to the product. A commodity that swings a few cents all day wants a wider price band; a high-ticket item where a 3% drop is real money wants a tighter one. The point is that the rule lives in plain code you can read and adjust, not buried in a model's judgment.
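
One way to tune the band per product is a lookup keyed by URL with a default. The sketch below is an assumption layered on detect.py, not part of it: the override values and the names PRICE_THRESHOLDS, price_threshold_for, and price_change are hypothetical.

```python
DEFAULT_PRICE_THRESHOLD = 0.03  # same 3% band as detect.py

# Hypothetical per-URL overrides; the URLs are the watchlist placeholders.
PRICE_THRESHOLDS = {
    "https://www.example-store.com/product/abc": 0.01,  # high-ticket: tighter band
    "https://www.example-store.com/product/xyz": 0.08,  # noisy commodity: wider band
}


def price_threshold_for(url):
    """Look up the band for a URL, falling back to the default."""
    return PRICE_THRESHOLDS.get(url, DEFAULT_PRICE_THRESHOLD)


def price_change(url, old_price, new_price):
    """Return the signed percentage move if it crosses the URL's band, else None."""
    if not old_price or not new_price:
        return None
    delta = (new_price - old_price) / old_price
    if abs(delta) >= price_threshold_for(url):
        return round(delta * 100, 1)
    return None
```

With these values, a move from 129 to 119 (about -7.8%) is reported for the first URL and suppressed for the second, whose band is 8%.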

Step 4: Summarize and alert with an LLM

A list of change dicts is correct but not readable at a glance. The LLM's job here is narrow and that is deliberate: turn the structured changes into one short, accurate sentence. Keeping the model on a tight, structured input is what keeps it from drifting or inventing detail. Save this as alert.py.

python

import os
import json
from openai import OpenAI

client = OpenAI(api_key=os.environ["LLM_API_KEY"])

def summarize_changes(product_name, changes):
    prompt = (
        f"Product: {product_name}\n"
        f"Detected changes (JSON): {json.dumps(changes)}\n\n"
        "Write one short sentence summarizing what changed. "
        "State only what the data shows. Do not speculate or "
        "add numbers that are not present."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You summarize product data changes factually."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.1,
    )
    return response.choices[0].message.content.strip()

def send_alert(message):
    # Swap this for Slack, email, or a webhook in production.
    print(f"[ALERT] {message}")

Low temperature and an instruction to state only what the data shows keep the summary tied to the numbers you passed in. If you prefer not to depend on a hosted model, route the same call through a self-hosted or alternative OpenAI-compatible endpoint by changing the client's base URL; the rest of the function is unchanged. To deliver the alert for real, replace the print in send_alert with a Slack webhook post or an email send.
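
As a sketch of the Slack option, the version of send_alert below posts to an incoming webhook using only the standard library and falls back to printing when no webhook is configured. SLACK_WEBHOOK_URL is a hypothetical variable name chosen for this example, and build_slack_request is a helper introduced here so the payload can be checked without sending anything.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url, message):
    """Build the POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(message):
    # SLACK_WEBHOOK_URL is an assumed name, not something the guide defines.
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        print(f"[ALERT] {message}")  # no webhook configured: keep console output
        return
    request = build_slack_request(webhook_url, message)
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()  # urlopen raises on HTTP errors, so reaching here means success
```

Keeping the print fallback means local test runs behave exactly as before, and the webhook URL stays out of the code like the other two keys.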

Step 5: Wire the loop together

Now connect the four stations into one pass over your watchlist. Each run collects a fresh reading, saves it, compares it to the previous reading, and alerts only when the diff finds something. Save this as monitor.py.

python

from collect import collect_product
from store import init_db, save_reading, last_two
from detect import detect_changes
from alert import summarize_changes, send_alert

WATCHLIST = [
    "https://www.example-store.com/product/abc",
    "https://www.example-store.com/product/xyz",
]

def run_once():
    init_db()
    for url in WATCHLIST:
        try:
            reading = collect_product(url)
        except Exception as exc:
            print(f"Skipped {url}: {exc}")
            continue

        save_reading(reading)
        history = last_two(url)
        if len(history) < 2:
            continue  # first reading, nothing to compare

        current, previous = history[0], history[1]
        changes = detect_changes(current, previous)
        if changes:
            summary = summarize_changes(reading["name"], changes)
            send_alert(summary)

if __name__ == "__main__":
    run_once()

Run it twice with a gap between (or seed the table with two readings) and you will see the alert fire when something moved. A single pass against a stable product prints nothing, which is exactly what you want: silence unless there is news.

bash

python monitor.py
# [ALERT] The price of "Acme Widget" dropped 7.8% from $129 to $119, and it is back in stock.
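
To exercise the diff without waiting for a real price move, the table can be seeded with two readings. The sketch below is one way to do that; seed_two_readings is a hypothetical helper, the currency and rating are placeholder values, and the schema is copied from store.py.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def seed_two_readings(con, url, name, old_price, new_price):
    """Insert an older out-of-stock reading and a newer in-stock one."""
    now = datetime.now(timezone.utc)
    rows = [
        (url, name, old_price, "USD", 0, 4.4, (now - timedelta(hours=2)).isoformat()),
        (url, name, new_price, "USD", 1, 4.4, (now - timedelta(hours=1)).isoformat()),
    ]
    con.execute(
        """CREATE TABLE IF NOT EXISTS readings (
            url TEXT, name TEXT, price REAL, currency TEXT,
            in_stock INTEGER, rating REAL, taken_at TEXT)"""
    )
    con.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    con.commit()
```

After seeding readings.db this way, last_two(url) returns both rows, and passing them to detect_changes should report a price change and a stock change without any network call.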

Step 6: Schedule the run

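Monitoring means running on a timer without you typing the command. Do not loop forever inside the Python process and call sleep; that dies the moment the machine reboots and gives you no logs. Hand the schedule to your operating system instead. Cron does not inherit variables exported in your interactive shell, so define CRAWLBASE_TOKEN and LLM_API_KEY at the top of the crontab (or load them from a file in the command), or the script will fail on its first environment lookup. On Linux or macOS, a cron entry that runs the script every hour looks like this.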

bash

# crontab -e, then add (runs at the top of every hour):
0 * * * * cd /path/to/product-monitor && .venv/bin/python monitor.py >> monitor.log 2>&1

On Windows, Task Scheduler does the same job: point a basic task at the Python executable inside your virtual environment with monitor.py as the argument and set the trigger to your chosen interval. Either way, pick a cadence that matches how fast the data actually moves. Hourly is fine for fast-moving prices; once a day is plenty for stock and ratings, and it is gentler on your request budget.
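
If one hourly job should serve products that need different cadences, a due check in front of the fetch is one option. This is a sketch under assumptions: the interval table, the five-minute tolerance, and the name is_due are all hypothetical, and last_taken_at is the ISO-8601 string that store.py writes to the taken_at column.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-URL intervals in hours; anything unlisted is read daily.
INTERVAL_HOURS = {
    "https://www.example-store.com/product/abc": 1,
}
DEFAULT_INTERVAL_HOURS = 24

# Scheduled runs never start at exactly the same second, so allow some slack.
TOLERANCE = timedelta(minutes=5)


def is_due(url, last_taken_at, now=None):
    """True when the URL has no reading yet or its interval has elapsed."""
    if last_taken_at is None:
        return True
    now = now or datetime.now(timezone.utc)
    interval = timedelta(hours=INTERVAL_HOURS.get(url, DEFAULT_INTERVAL_HOURS))
    elapsed = now - datetime.fromisoformat(last_taken_at)
    return elapsed >= interval - TOLERANCE
```

The tolerance matters with cron: a reading saved a few seconds past the hour would otherwise look 59 minutes old at the next run and be skipped for a full extra cycle.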

As your watchlist grows past a handful of products, a synchronous loop that fetches one URL at a time starts to drag. At that point move collection onto the asynchronous Crawler, which pushes results to a webhook as pages finish so you are not blocking on each request. For the broader strategy behind tracking competitor prices over time, web scraping for price intelligence covers the why; large-scale ecommerce scraping covers how the same pipeline holds up at volume.

Keeping the data flowing

A scheduled scraper is a scraper that has to keep working untended, so the data layer is where reliability matters most. Crawlbase already rotates IPs and manages headers behind every request, which is what keeps a repeating job from being flagged as a bot. If you need finer control over routing, or you want to send your own HTTP client through a rotating pool, the Smart AI Proxy exposes the same network as a standard proxy endpoint. Watch your run logs for status codes that drift from success: a sudden run of challenges or errors is the signal to slow your cadence or widen rotation, not to retry harder.
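
The "slow down rather than retry harder" rule can be sketched as a check on recent status codes. The 30% failure limit, the doubling step, and the 24-hour ceiling below are assumptions for illustration, not Crawlbase recommendations, and both function names are hypothetical.

```python
def failure_rate(status_codes):
    """Share of recent requests that did not return HTTP 200."""
    if not status_codes:
        return 0.0
    failures = sum(1 for code in status_codes if code != 200)
    return failures / len(status_codes)


def next_interval_hours(current_hours, status_codes, max_rate=0.3, ceiling=24):
    """Double the polling interval when failures exceed max_rate, capped at ceiling."""
    if failure_rate(status_codes) > max_rate:
        return min(current_hours * 2, ceiling)
    return current_hours
```

Feeding it the status codes from the last run gives a new cadence to apply before the next one; a single stray error leaves the interval alone.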

Recap

Key takeaways

Monitor the change, not the snapshot. The value is in comparing readings over time, which means you must store every reading with a timestamp.
Crawlbase is the reliable data layer. The Scraper API returns clean structured fields, and IP rotation plus rendering keep a scheduled job from getting blocked.
Use a threshold to detect meaningful moves. A percentage band on price plus binary checks on stock keep alerts signal, not noise.
Keep the LLM narrow. Feed it the structured diff, ask for one factual sentence at low temperature, and it summarizes instead of inventing.
Let the OS schedule it. Cron or Task Scheduler beats an in-process sleep loop; pick a cadence that matches how fast the data moves.
Stay on public data. Prices, stock, and ratings only; no accounts, carts, or personal data.

Frequently Asked Questions (FAQs)

What is an AI product monitoring tool?

It is a program that watches public product pages on a schedule, records key fields like price, stock, and rating each time, and uses an AI model to flag and explain meaningful changes. The scraping layer keeps the data flowing reliably, and the AI layer turns raw before/after numbers into a short, readable alert so you act on what moved instead of reading spreadsheets.

Do I need the Crawling API or the Scraper API for this?

Use the Scraper API when the store you track is one of the maintained parsers, because it returns structured product fields directly and saves you from writing extraction code. Use the Crawling API when you need the raw HTML to parse yourself or the page is not covered by a parser. Both share the same network, so IP rotation and rendering work either way; the difference is only whether Crawlbase parses the page for you.

How does the tool decide what counts as a meaningful change?

By thresholds you set in plain code, not by the model's judgment. The example treats a price move as meaningful only when it crosses a percentage band (3% by default), always flags a product going in or out of stock, and flags a rating shift of 0.2 or more. Tightening or loosening those numbers per product is how you keep alerts useful instead of constant.

Will the AI hallucinate numbers when it writes the alert?

The risk is real with open-ended prompts, which is why the summary step is kept narrow. The model receives only the structured diff, runs at a low temperature, and is told to state just what the data shows and not add numbers that are not present. That structure is what keeps the sentence tied to your actual readings rather than to invented detail.

What happens when a store changes its page layout?

If you are on the Scraper API, the maintained parser absorbs most layout shifts for you, which is one reason it is worth using where available. If you parse HTML yourself through the Crawling API, a layout change can break your selectors, and the fix is to re-inspect the live page and update them. Because Crawlbase still returns the full page, you adjust parsing logic rather than rebuilding the request.

How often should the monitor run?

Match the cadence to how fast the data moves and to your request budget. Hourly suits fast-changing prices; once a day is enough for stock and ratings and uses far fewer requests. Scheduling through cron or Windows Task Scheduler lets you set the interval per job, and you can run different products at different rates if some matter more than others.

Ian Kalvin

Technical Support Engineer · Crawlbase

Technical support engineer at Crawlbase, writing from the front line of what actually breaks in production scraping and proxy setups.

Neil Zamora

Senior Architect · Crawlbase

Senior architect at Crawlbase, focused on the systems behind large-scale crawling: proxy rotation, anti-bot resilience, and the APIs that hide that complexity.
