Plenty of the content you want from a page is not in the first response. Social feeds, product grids, search results, and review lists often load a handful of items, then fetch and render more only when you scroll toward the bottom. This pattern, infinite scroll, keeps the initial page light, but it also means a single HTTP request hands you a fraction of the data you can see in a browser.
This guide shows you how to scroll a website while crawling with JavaScript and Node.js. You build a small, runnable scraper that drives the scroll through the Crawling API using its scroll and scroll_interval options, lets the page load more items, then parses the loaded content with Cheerio and exports it. The walkthrough uses a neutral placeholder listing URL so you can swap in your own public target, and it stays scoped to public data only.
What you will build
A Node.js script that points at a public, infinite-scroll listing page, asks the Crawling API to render it and keep scrolling for a set number of seconds so more items load, and then extracts a structured record per item from the returned HTML. The running example pulls a generic public listing with these fields:
- Title: the item title or headline shown on the card.
- Subtitle: a secondary line such as a vendor, author, or category, when present.
- Price: the listed price or value, when the card shows one.
- Link: the absolute URL to the individual item page.
- Result count: a summary count of how many items the page reports.
Why a plain request misses scrolled content
Send a bare HTTP request to an infinite-scroll page and you get back the markup the server returned before any scrolling happened. That is usually the first batch of items and nothing more. The rest of the list is fetched by the page's own JavaScript in response to scroll events, often through background AJAX calls, and stitched into the DOM only after the browser scrolls down. A request that never scrolls never triggers those calls, so the extra items never exist in the HTML you receive.
To get the full list, two things have to happen in one request. The page needs to render in a real browser, and that browser needs to scroll far enough, for long enough, to trigger each round of lazy loading. You can build this yourself with a headless browser that scrolls in a loop and waits between passes, plus a pool of rotating IPs so the target reads the traffic as a real visitor. The Crawling API folds all of that into a single call: you send the URL with the scroll options set, it renders and scrolls the page server-side behind a trusted IP, and it returns the fully loaded HTML for you to parse with Cheerio.
Three things make scrolling work through the Crawling API. The JavaScript token renders the page in a real browser. The scroll option tells the API to scroll the page after loading. The scroll_interval option sets how many seconds it keeps scrolling, up to a maximum of 60; after that, the API captures the loaded page and returns it.
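As a rough sketch of how these options fit together, the helper below builds the options object that gets passed along with the URL. The name buildScrollOptions is hypothetical, and clamping on the client side is an assumption made here to stay inside the 60 second cap described above; only scroll and scroll_interval come from the API itself.

```javascript
// Hypothetical helper: builds the options for a scrolled request.
const MAX_SCROLL_SECONDS = 60; // cap described in this guide

function buildScrollOptions(seconds) {
  // No interval given: send only `scroll` and let the API apply its default.
  if (seconds === undefined || seconds === null) {
    return { scroll: true };
  }
  const value = Number(seconds);
  if (!Number.isFinite(value) || value <= 0) {
    throw new RangeError(`scroll_interval must be a positive number, got ${seconds}`);
  }
  // Round up to whole seconds and clamp, so a typo such as 600 never reaches the API.
  return {
    scroll: true,
    scroll_interval: Math.min(Math.ceil(value), MAX_SCROLL_SECONDS),
  };
}

console.log(buildScrollOptions());    // { scroll: true }
console.log(buildScrollOptions(20));  // { scroll: true, scroll_interval: 20 }
console.log(buildScrollOptions(600)); // { scroll: true, scroll_interval: 60 }
```

The returned object can be passed as the second argument of the API call, for example api.get(url, buildScrollOptions(20)).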
Prerequisites
A few things should be in place before you write any code. None take long.
Basic JavaScript and Node.js. You should be comfortable writing and running a Node script and installing packages with npm. If you are newer to this, our guide to building a web scraper with Node.js covers the fundamentals this tutorial assumes.
Node.js 16 or later. Confirm your version with node --version. If you do not have it, install it from the Node.js website or through a version manager like nvm.
A Crawlbase account and JavaScript token. Sign up, open your dashboard, and copy your JavaScript token. The free tier gives you 1,000 requests with no card, and you only pay for successful requests. JavaScript requests cost more credits than normal ones because they render the page, which matters when you scroll. Treat the token like a password and keep it out of version control.
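One common way to keep the token out of version control is to read it from an environment variable. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN and a helper named getToken; both names are made up for illustration and are not part of the Crawlbase client.

```javascript
// Hypothetical helper: read the token from the environment instead of
// hard-coding it. CRAWLBASE_JS_TOKEN is an assumed variable name.
function getToken(env = process.env) {
  const token = (env.CRAWLBASE_JS_TOKEN || '').trim();
  if (!token) {
    // Fail early with a clear message rather than sending an empty token.
    throw new Error('Set CRAWLBASE_JS_TOKEN before running the scraper');
  }
  return token;
}

// Usage (assumes the variable is set in your shell):
//   const api = new CrawlingAPI({ token: getToken() });
```

You would then start the script with something like CRAWLBASE_JS_TOKEN=your_token node scraper.js, and the placeholder string in the examples that follow can be swapped for getToken().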
Set up the project
Create a project folder, initialize it, and install the two libraries the scraper needs.
    node --version
    mkdir scroll-scraper && cd scroll-scraper
    npm init -y
    npm install crawlbase cheerio
Two dependencies do the work: crawlbase is the official Node client for the Crawling API, and cheerio parses the returned HTML with a jQuery-style API so you can pull out fields by CSS selector. Create a file named scraper.js in this folder and add the code from the steps below.
Step 1: Fetch the page with scrolling enabled
Start by getting the loaded page. Import the CrawlingAPI class, initialize it with your JavaScript token, and request the target URL with the scroll option turned on. Without a scroll_interval the API defaults to a 10 second scroll, which loads the first extra batch. Checking the status before you parse keeps failures loud instead of silent.
    const { CrawlingAPI } = require('crawlbase');

    // Use your JavaScript token here; a normal token will not render the page.
    const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });
    const targetUrl = 'https://example.com/listings';

    api
      .get(targetUrl, { scroll: true })
      .then((response) => {
        if (response.statusCode === 200) {
          console.log(response.body.length, 'bytes of rendered HTML');
          console.log(response.body.slice(0, 500));
        } else {
          // Report non-200 responses instead of exiting silently.
          console.error(`Request failed: ${response.statusCode}`);
        }
      })
      .catch((error) => console.error('API request error:', error));
Run the script with node scraper.js. Passing { scroll: true } tells the Crawling API to render the page and scroll it before returning, so the HTML you get back already contains the items that loaded on scroll, not just the first batch. The byte count and the snippet at the top of the body confirm you are receiving a real rendered page and not an empty shell. Since you have not set an interval yet, this defaults to a 10 second scroll, which naturally loads fewer items than a longer one.
That single call just rendered an infinite-scroll page and scrolled it for you, without a headless browser scrolling in a loop or a proxy pool on your side. The Crawling API runs the page in a real browser, keeps scrolling for as long as scroll_interval says, rotates residential IPs server-side, and handles CAPTCHAs, so you get the fully loaded HTML from one request. Try it on a public listing on the free tier, then add your parser.
Step 2: Load more items with scroll_interval
The default 10 second scroll only loads the first extra batch. To load more, set scroll_interval to the number of seconds you want the API to keep scrolling. The maximum is 60 seconds; after that the API captures whatever has loaded and returns it. Each extra second of scrolling gives the page more time to fetch and render additional items.
    const { CrawlingAPI } = require('crawlbase');

    const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

    async function fetchScrolled(url, seconds) {
      const response = await api.get(url, {
        scroll: true,
        scroll_interval: seconds,
      });
      if (response.statusCode === 200) return response.body;
      console.error(`Request failed: ${response.statusCode}`);
      return null;
    }

    fetchScrolled('https://example.com/listings', 20)
      .then((html) => html && console.log(html.length, 'bytes after 20s scroll'));
Here the call scrolls for 20 seconds instead of the default 10, so the page has time to load more items, and the returned HTML is larger as a result. Keep your connection open long enough for the scroll to finish: if you scroll for the full 60 seconds, allow up to roughly 90 seconds for the request to complete before the loaded page comes back. Start with a smaller interval and raise it only until the item count stops growing, since longer renders cost more and add little once the list is exhausted.
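To make the "keep your connection open long enough" advice concrete, here is a minimal sketch of a client-side time limit built on Promise.race. The helper names and the fixed 30 second margin are assumptions chosen to match the "60 second scroll, roughly 90 seconds total" guidance; this is not a documented Crawlbase setting.

```javascript
// Assumption: a fixed 30 s margin on top of the scroll time, which matches
// the "60 s scroll, roughly 90 s total" guidance. Tune it for your target.
const MARGIN_SECONDS = 30;

function requestTimeoutMs(scrollSeconds) {
  return (scrollSeconds + MARGIN_SECONDS) * 1000;
}

// Rejects if `promise` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying request.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with the fetchScrolled function from this step:
//   const html = await withTimeout(fetchScrolled(url, 20), requestTimeoutMs(20));
```

Wrapping the call this way gives you a clear error when a request hangs well past the scroll time, instead of a script that waits indefinitely.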
There is no single correct value for scroll_interval. A short feed may be fully loaded in 10 to 20 seconds, while a long one may keep loading up to the 60 second cap. Bump the interval in steps and compare how many items you parse out; once the count plateaus, a longer scroll only spends credits without adding rows.
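The plateau check can be sketched as a small pure function. It assumes you have already run the scraper at a few intervals and recorded how many items each run parsed; the name pickInterval and the sample counts are invented for illustration.

```javascript
// Hypothetical helper: given item counts measured at increasing intervals,
// return the shortest interval after which the count stopped growing.
function pickInterval(measurements) {
  const sorted = [...measurements].sort((a, b) => a.seconds - b.seconds);
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].count <= sorted[i - 1].count) {
      return sorted[i - 1].seconds; // the previous interval already had everything
    }
  }
  // Still growing at the longest interval tried, or fewer than two samples.
  return sorted.length ? sorted[sorted.length - 1].seconds : null;
}

// Made-up counts for illustration only.
const samples = [
  { seconds: 10, count: 24 },
  { seconds: 20, count: 48 },
  { seconds: 30, count: 60 },
  { seconds: 40, count: 60 },
];
console.log(pickInterval(samples)); // 30
```

If the function returns the longest interval you tried, the list was probably still loading, so it may be worth measuring one step higher before settling on a value.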
Step 3: Parse the loaded items with Cheerio
With the loaded HTML in hand, pass it to Cheerio and walk each item card. Use selectors that match your real target; the ones below assume a generic listing where each item sits in a .listing-item card. Reading each field defensively keeps one missing value from crashing the run.
    const cheerio = require('cheerio');

    function parseItems(html, baseUrl) {
      const $ = cheerio.load(html);
      const result = {
        resultCount: $('.results-count').text().trim(),
        items: [],
      };

      $('.listing-item').each((_, element) => {
        const card = $(element);
        const title = card.find('.item-title').text().trim();
        const subtitle = card.find('.item-subtitle').text().trim();
        const price = card.find('.item-price').text().trim();
        let link = card.find('a.item-link').attr('href');
        // Resolve site-relative links against the listing URL.
        if (link && link.startsWith('/')) {
          link = new URL(link, baseUrl).href;
        }
        // Skip cards without a title; fill missing optional fields with defaults.
        if (title) {
          result.items.push({
            title,
            subtitle: subtitle || '',
            price: price || 'N/A',
            link: link || '',
          });
        }
      });

      return result;
    }
The summary count comes from .results-count, and each item lives in a .listing-item card. Inside a card, the title comes from .item-title, the secondary line from .item-subtitle, the price from .item-price, and the link from the a.item-link anchor, resolved to an absolute URL so it works outside the page. Because the items loaded during the scroll are already in the HTML, the same loop catches the first batch and every later one in a single pass.
The class names above are placeholders for a generic page. On a real target, open the live page in your browser's dev tools, scroll until items load, and read the actual class or attribute on each card. Many production sites use generated class names that change without notice, so treat selectors as a starting template, not a contract, and re-inspect when a field comes back empty.
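One way to notice selector drift early is to count how often each field comes back empty after a run. The sketch below is a hypothetical helper, not part of Cheerio or the Crawlbase client; it assumes the item shape produced by parseItems, including the 'N/A' placeholder for a missing price.

```javascript
// Hypothetical helper: counts how many parsed items have each field empty.
// Treats '' and the 'N/A' price placeholder from parseItems as empty.
function emptyFieldReport(items, fields = ['title', 'subtitle', 'price', 'link']) {
  const report = {};
  for (const field of fields) {
    report[field] = items.filter((item) => {
      const value = item[field];
      return value === undefined || value === '' || value === 'N/A';
    }).length;
  }
  return report;
}

// Made-up records for illustration only.
const sample = [
  { title: 'Wireless Headphones', subtitle: 'AudioWorks', price: '$59.00', link: '' },
  { title: 'Mechanical Keyboard', subtitle: '', price: 'N/A', link: '' },
];
console.log(emptyFieldReport(sample));
// { title: 0, subtitle: 1, price: 1, link: 2 }
```

A field that is empty on a few items is often just optional data, such as a missing subtitle. A field that is empty on every item, like link in this sample, is the stronger hint that its selector no longer matches the page.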
Step 4: Assemble the full script with JSON and CSV export
Now wire the scrolled fetch and the parse into one runnable script, then write the records to disk as both JSON and CSV.
    const fs = require('fs');
    const { CrawlingAPI } = require('crawlbase');
    const cheerio = require('cheerio');

    const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' });

    async function fetchScrolled(url, seconds) {
      const response = await api.get(url, {
        scroll: true,
        scroll_interval: seconds,
      });
      if (response.statusCode === 200) return response.body;
      console.error(`Request failed: ${response.statusCode}`);
      return null;
    }

    // Quote every field and double embedded quotes so commas stay inside a cell.
    function toCsv(rows) {
      const headers = ['title', 'subtitle', 'price', 'link'];
      const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
      const lines = [headers.join(',')];
      for (const row of rows) {
        lines.push(headers.map((h) => escape(row[h])).join(','));
      }
      return lines.join('\n');
    }

    async function main() {
      const url = 'https://example.com/listings';
      const html = await fetchScrolled(url, 20);
      if (!html) return;
      const data = parseItems(html, url);
      fs.writeFileSync('items.json', JSON.stringify(data, null, 2));
      fs.writeFileSync('items.csv', toCsv(data.items));
      console.log(`Saved ${data.items.length} items to JSON and CSV`);
    }

    main();
Paste the parseItems function from Step 3 into the same file so main can call it. Run it with node scraper.js and you get two files: items.json with the full structured records, and items.csv ready to open in a spreadsheet. The toCsv helper quotes every field and doubles any embedded quotes, which matters because titles and prices often contain commas. You now have a working scroll scraper in roughly 60 lines, and you can drop it into an existing scraper or wrap it in an endpoint later if you want one.
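To see what that quoting rule does to awkward values, the short sketch below applies the same expression used inside toCsv to a few made-up inputs. The standalone name escapeCsv exists only for this illustration.

```javascript
// Same quoting rule as the toCsv helper: wrap in quotes, double embedded quotes.
const escapeCsv = (value) => `"${String(value).replace(/"/g, '""')}"`;

console.log(escapeCsv('Desk, oak'));   // "Desk, oak"
console.log(escapeCsv('27" Monitor')); // "27"" Monitor"
console.log(escapeCsv(1299));          // "1299"
```

Because every field is quoted, a comma inside a title stays within its cell, and a doubled quote is read back as a single literal quote by spreadsheet tools and CSV parsers.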
What the output looks like
The JSON file holds the result count plus one object per loaded item, each with the title, subtitle, price, and link.
    {
      "resultCount": "248 results",
      "items": [
        {
          "title": "Wireless Headphones",
          "subtitle": "AudioWorks",
          "price": "$59.00",
          "link": "https://example.com/listings/wireless-headphones"
        },
        {
          "title": "Mechanical Keyboard",
          "subtitle": "KeyForge",
          "price": "$89.00",
          "link": "https://example.com/listings/mechanical-keyboard"
        }
      ]
    }
The CSV mirrors the same item rows with a header line, so it drops straight into Excel, Google Sheets, or any pipeline that reads delimited files.
    title,subtitle,price,link
    "Wireless Headphones","AudioWorks","$59.00","https://example.com/listings/wireless-headphones"
    "Mechanical Keyboard","KeyForge","$89.00","https://example.com/listings/mechanical-keyboard"
Scaling beyond one scrolled page
Scrolling loads more items on a single URL, but it is not a substitute for pagination, and there is a practical ceiling. The 60 second cap and the page's own loading speed limit how many items one request can surface. For a larger pull, combine both techniques: scroll each page to load its full batch, then move to the next page if the site paginates, and parse each with the same function.
    async function scrapeAllPages(baseUrl, maxPages, seconds) {
      const allItems = [];
      for (let page = 1; page <= maxPages; page++) {
        const pageUrl = `${baseUrl}?page=${page}`;
        const html = await fetchScrolled(pageUrl, seconds);
        if (!html) break; // stop on a failed request
        const { items } = parseItems(html, baseUrl);
        if (items.length === 0) break; // stop when a page has no items
        allItems.push(...items);
        console.log(`Page ${page}: ${items.length} items`);
        // Pause between pages so requests are not fired in a tight loop.
        await new Promise((r) => setTimeout(r, 2000));
      }
      return allItems;
    }
Match the page pattern to your target by checking a real "next page" link in the browser. For high volume you do not have to run these requests one at a time and wait. The async Crawler lets you push many URLs and collect the results as they finish, which suits large scroll-plus-paginate jobs better than a tight serial loop. For more on rendered, script-heavy pages like these, see our guide to crawling JavaScript websites, and for fully background AJAX feeds, scraping data from AJAX websites.
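When the listing URL already carries a query string, appending ?page=N by hand produces a malformed address. A minimal sketch using Node's built-in URL class avoids that; the helper name buildPageUrl and the default parameter name page are assumptions, so check what your target actually uses.

```javascript
// Hypothetical helper: sets the page number without clobbering other query
// parameters. The parameter name `page` is an assumption; check your target.
function buildPageUrl(baseUrl, page, param = 'page') {
  const url = new URL(baseUrl);
  url.searchParams.set(param, String(page));
  return url.href;
}

console.log(buildPageUrl('https://example.com/listings', 2));
// https://example.com/listings?page=2
console.log(buildPageUrl('https://example.com/listings?sort=new', 3));
// https://example.com/listings?sort=new&page=3
```

Inside a pagination loop, the result of buildPageUrl(baseUrl, page) can stand in for a hand-built template string.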
Scraping responsibly
Keep this scoped to public data and run it considerately. Read the target site's Terms of Service and its robots.txt before you point a scroll scraper at it, and treat both as the boundary for what you collect. Stay on public listings, not anything behind a login, and pace your requests so you are not stressing the server: scrolling already holds a render open for many seconds, so add a delay between pages rather than firing them in a tight loop. When the data involves identifiable people, privacy laws such as GDPR and CCPA apply, so avoid assembling profiles of individuals and do not republish personal data tied to someone's identity. For a fuller playbook, see how to scrape websites without getting blocked.
Key takeaways
- Infinite scroll hides data from a plain request. A bare HTTP call returns only the first batch of items; the rest load when the page scrolls, so a request that never scrolls never sees them.
- The Crawling API scrolls for you. Send the URL with scroll: true and it renders the page in a real browser, scrolls it server-side behind a rotating IP, and returns the fully loaded HTML in one call.
- scroll_interval controls how much loads. The default is a 10 second scroll; raise the interval up to the 60 second cap to load more items, and allow up to roughly 90 seconds for a full-length request to return.
- Cheerio parses the loaded HTML. Select each item card, read title, subtitle, price, and link defensively, expect generated class names to drift, and export to JSON and CSV.
- Scroll plus paginate for scale. Scroll each page to load its full batch, move to the next page when the site paginates, pace your requests, and reach for the async Crawler for large jobs.
Frequently Asked Questions (FAQs)
What is infinite scroll and why does it break plain scraping?
Infinite scroll is a pattern where a page loads a small set of items first, then fetches and renders more as the user scrolls toward the bottom, usually through background AJAX calls. A plain HTTP request captures the page before any scrolling happens, so it only sees the first batch. To get the rest you have to render the page in a real browser and actually scroll it, which is what the Crawling API does when you pass the scroll options.
What do the scroll and scroll_interval options do?
The scroll option tells the Crawling API to scroll the page after it loads, which triggers the lazy loading that brings in more items. The scroll_interval option sets how many seconds it keeps scrolling, up to a maximum of 60. With scroll on and no interval set, the API defaults to a 10 second scroll. Both require the JavaScript token, since the page has to render in a real browser for scrolling to mean anything.
How long should I set scroll_interval?
Start small and increase it. A short feed may be fully loaded in 10 to 20 seconds, while a long one keeps loading up to the 60 second cap. Raise the interval in steps and compare how many items you parse out; once the count stops growing, a longer scroll only spends extra credits without adding rows. If you scroll for the full 60 seconds, allow up to roughly 90 seconds for the request to complete.
Do I need the JavaScript token to scroll a page?
Yes. Scrolling only makes sense on a page that renders in a real browser, and the JavaScript token is what enables that rendering. A normal token returns the unrendered HTML, where scrolling has no effect and the extra items never load. JavaScript requests use more credits than normal ones because of the render, so factor that in when you plan a large run.
Is scrolling a replacement for pagination?
No. Scrolling loads more items on one URL, but the 60 second cap and the page's loading speed limit how many it can surface in a single request. For a complete dataset on a site that also paginates, combine the two: scroll each page to load its full batch, then move to the next page and parse it with the same function. For large jobs, the async Crawler handles many such requests in parallel.
My selectors return empty values after scrolling. What is wrong?
Usually one of two things. Either the page did not scroll long enough, so the items you want never loaded, in which case raise scroll_interval and confirm the returned HTML grew. Or the site's markup uses different or generated class names than your selectors expect. Open the live page in your browser's dev tools, scroll until items appear, read the real class or attribute on each card, and update the selectors in parseItems to match.
