A lot of the data worth collecting never shows up in the raw HTML. Product grids, comment threads, infinite-scroll feeds, and dashboard widgets all arrive after JavaScript runs in the browser, so a plain HTTP request hands you a skeleton with the interesting parts missing. The classic answer is to render the page in a real browser, then parse the finished markup, and the most common pairing for that in Python is Selenium plus BeautifulSoup.
This guide shows you how to scrape dynamic content with Selenium and BeautifulSoup: Selenium drives a headless Chrome instance to execute the JavaScript, wait for elements, scroll to trigger lazy loading, and click through interactions, while BeautifulSoup parses the rendered page_source into clean structured data. We will build a small, runnable example, then look honestly at where this stack gets expensive and where a server-side rendering API is the lighter choice.
Why a plain request misses dynamic content
When a page is server-rendered, the HTML you download already contains the data. When it is client-rendered, the server sends a near-empty shell plus a bundle of JavaScript; the browser runs that JavaScript, calls back to an API, and injects the real content into the DOM afterward. A library like requests only ever sees the first response, so it never sees the content that JavaScript adds later.
That is the whole problem dynamic scraping solves: you need something that actually executes the page's JavaScript before you read the DOM. Selenium does exactly that by automating a real browser. Once the browser has rendered everything, the resulting HTML is just HTML, and BeautifulSoup is the fast, ergonomic way to pull fields out of it. The two tools split the work cleanly: Selenium handles interaction and rendering, BeautifulSoup handles extraction.
BeautifulSoup does not run JavaScript. On its own it parses whatever HTML you feed it, so feeding it the raw response of a client-rendered page gives you the same empty shell a plain fetch returns. The rendering step has to happen first, whether that is a local browser via Selenium or a rendering API that returns finished HTML.
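Before reaching for a browser, it can help to check whether a page is client-rendered at all. The sketch below is a rough heuristic, not a definitive test: it uses only the standard library to count text outside script and style tags in a raw response. The looks_client_rendered name and the 200-character threshold are assumptions made for illustration.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts characters of text outside <script>, <style>, and <noscript>."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.visible_chars = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would actually see.
        if self.skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_client_rendered(html, min_visible_chars=200):
    """Heuristic: scripts are present but almost no readable text is.

    The threshold is an arbitrary assumption; tune it for your targets.
    """
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_visible_chars
```

If this returns True for the body of a plain fetch, the page most likely needs a rendering step before BeautifulSoup can find anything useful in it.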
Set up the environment
You need Python 3.8 or later and pip. Create a virtual environment so the dependencies stay isolated, then install Selenium and BeautifulSoup.
    python -m venv scraper_env
    source scraper_env/bin/activate
    pip install selenium beautifulsoup4
On Windows, activate with scraper_env\Scripts\activate instead of the source line. You do not need to download a driver binary manually: since Selenium 4.6, Selenium Manager resolves and downloads the matching ChromeDriver automatically the first time you launch Chrome, so a current Chrome install plus the selenium package is enough to get started.
Step 1: Launch a headless Chrome WebDriver
Start by configuring Chrome to run headless, which means no visible window. Headless mode is faster and is what you want on a server, though running headed during development makes debugging selectors much easier. A couple of extra flags keep the browser stable in containers and reduce the surface that simple bot checks key on.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def build_driver():
        options = Options()
        options.add_argument("--headless=new")           # no visible window
        options.add_argument("--no-sandbox")             # commonly needed in containers
        options.add_argument("--disable-dev-shm-usage")  # avoid the small /dev/shm in Docker
        options.add_argument("--window-size=1920,1080")  # pin the viewport
        return webdriver.Chrome(options=options)
A fixed window size matters more than it looks: many sites render different layouts at different viewport widths, so pinning the size keeps your selectors stable across runs. To debug what the browser actually sees, drop the --headless=new line and watch the page load live.
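Rather than editing the code each time you want a visible window, one option is to make the headless flag switchable. This is a minimal sketch under stated assumptions: the chrome_arguments helper and the SCRAPER_HEADED environment variable are hypothetical names, not part of Selenium.

```python
import os


def chrome_arguments(headless=None):
    """Return the Chrome flags from build_driver, with headless made optional."""
    if headless is None:
        # SCRAPER_HEADED is a hypothetical variable name chosen for this sketch.
        headless = os.environ.get("SCRAPER_HEADED", "0") != "1"
    arguments = [
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--window-size=1920,1080",
    ]
    if headless:
        arguments.insert(0, "--headless=new")
    return arguments
```

Inside build_driver you would then loop over chrome_arguments() and pass each value to options.add_argument, and run the script with SCRAPER_HEADED=1 when you want to watch the page load.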
Step 2: Navigate and wait for elements explicitly
The single biggest mistake in dynamic scraping is reading the DOM before the content has arrived. A fixed time.sleep() is the wrong fix: too short and you miss data, too long and every run is slow. The right tool is an explicit wait, which polls the page until a specific condition is true (or a timeout fires), then returns immediately once it is. Selenium ships this as WebDriverWait paired with expected_conditions.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    def load_page(driver, url, wait_for):
        driver.get(url)
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_for))
        )
Here load_page navigates to the URL and blocks only until at least one element matching wait_for appears, up to 15 seconds; if the timeout expires first, WebDriverWait raises TimeoutException. Use visibility_of_element_located when you also need the element painted (not just present in the DOM), and element_to_be_clickable before you click something. Tying your wait to the element you actually care about is what makes a Selenium run both fast and reliable.
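The built-in conditions cover single elements, but WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when it is done. As a sketch, here is a custom condition that waits for a minimum number of matches; the at_least_n_elements name is an assumption for illustration, and the "css selector" string is the value behind By.CSS_SELECTOR.

```python
CSS_SELECTOR = "css selector"  # same string as selenium's By.CSS_SELECTOR


def at_least_n_elements(css_selector, minimum):
    """Build a wait condition that passes once `minimum` elements match."""

    def condition(driver):
        elements = driver.find_elements(CSS_SELECTOR, css_selector)
        # WebDriverWait keeps polling while the condition returns a falsy value.
        return elements if len(elements) >= minimum else False

    return condition
```

You would use it as WebDriverWait(driver, 15).until(at_least_n_elements("div.product-card", 20)), which returns the matched elements once at least 20 cards are in the DOM. This is useful when the first card appears well before the rest of the grid.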
Step 3: Scroll to trigger lazy loading
Many feeds and product grids load more items only as you scroll. To capture all of them you have to drive the scroll yourself, then wait for the new batch to render before scrolling again. The pattern is a loop: scroll to the bottom, wait, measure the page height, and stop when it stops growing.
    import time

    def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
        last_height = driver.execute_script("return document.body.scrollHeight")
        for _ in range(max_rounds):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)  # give the next batch time to fetch and render
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break  # the page stopped growing, so nothing more loaded
            last_height = new_height
The short sleep here is a pragmatic pause for the next batch to fetch and render; it is the one place a fixed delay is hard to avoid because the trigger is the network, not a known element. Cap the loop with max_rounds so a page that never stops growing cannot run forever. If the site uses a "Load more" button instead of infinite scroll, the equivalent is to find the button, click it, wait for new rows, and repeat until the button disappears.
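The "Load more" variant described above could be sketched as follows. This is one possible shape, not a definitive implementation: the click_load_more name, its parameters, and the selectors you pass in are assumptions, and it relies only on the find_elements, is_displayed, and click methods that Selenium drivers and elements provide.

```python
import time

CSS_SELECTOR = "css selector"  # same string as selenium's By.CSS_SELECTOR


def click_load_more(driver, button_css, row_css,
                    max_clicks=20, timeout=10.0, poll=0.25):
    """Click a "Load more" button until it disappears; return the click count."""
    clicks = 0
    while clicks < max_clicks:
        buttons = [b for b in driver.find_elements(CSS_SELECTOR, button_css)
                   if b.is_displayed()]
        if not buttons:
            break  # button is gone, so everything is loaded
        rows_before = len(driver.find_elements(CSS_SELECTOR, row_css))
        buttons[0].click()
        clicks += 1
        # Poll until the row count grows, instead of sleeping a fixed time.
        deadline = time.monotonic() + timeout
        while len(driver.find_elements(CSS_SELECTOR, row_css)) <= rows_before:
            if time.monotonic() > deadline:
                return clicks  # no new rows arrived in time; stop here
            time.sleep(poll)
    return clicks
```

Because the trigger here is a known element count rather than page height, the loop can return as soon as new rows appear. On real sites the click may still fail if an overlay covers the button, so treat this as a starting point.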
Step 4: Hand the rendered HTML to BeautifulSoup
Once the page is rendered and fully scrolled, the rest is ordinary parsing. Read driver.page_source, which is the live DOM serialized to HTML, and load it into BeautifulSoup. From there you select elements by CSS selector exactly as you would with any static page.
    from bs4 import BeautifulSoup

    def parse_items(html):
        soup = BeautifulSoup(html, "html.parser")
        items = []
        for card in soup.select("div.product-card"):
            title = card.select_one("h2.title")
            price = card.select_one("span.price")
            items.append({
                "title": title.get_text(strip=True) if title else None,
                "price": price.get_text(strip=True) if price else None,
            })
        return items
The guards around each field (title.get_text(...) if title else None) keep one missing element from crashing the whole run, which is worth doing from the start because real listings are inconsistent. You could query elements through Selenium's own find_elements instead, but BeautifulSoup is faster for bulk extraction and its selector and navigation API is friendlier once the DOM is settled.
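The same defensive habit applies one step later, when you turn scraped strings into typed values. As an illustration, here is a sketch that converts a price string to a Decimal and passes missing values through as None. The parse_price name is hypothetical, and the code assumes a comma is a thousands separator and a period is the decimal point, which will not hold for every locale.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Convert a scraped price string such as "$1,299.00" to a Decimal, or None."""
    if not text:
        return None  # missing element upstream, keep it as None
    # Assumes "," is a thousands separator and "." the decimal point.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # text such as "Call for price" has no number in it
    return Decimal(match.group(0).replace(",", ""))
```

Applied to the price field that parse_items returns, this keeps one odd listing from breaking a later sum or sort.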
Step 5: Put the pieces together
Wire the four steps into one script: build the driver, load and wait, scroll, parse, then always quit the driver so you do not leak browser processes.
    import json

    def main():
        url = "https://example.com/products"
        driver = build_driver()
        try:
            load_page(driver, url, "div.product-card")
            scroll_to_bottom(driver)
            data = parse_items(driver.page_source)
        finally:
            driver.quit()
        print(json.dumps(data, indent=2))

    if __name__ == "__main__":
        main()
The try/finally is not optional in production. A scroll loop or a wait can raise, and if driver.quit() never runs you leave a zombie Chrome process behind on every failure. Over a long job that exhausts memory fast. For deeper coverage of running browsers this way, see headless browser for web scraping, and for the broader landscape of rendering JavaScript with Python, how to scrape JavaScript pages with Python.
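If several scripts share this cleanup pattern, one way to avoid repeating the try/finally is a small context manager. This is a sketch, and managed_driver is a hypothetical helper name; it takes any zero-argument factory, such as build_driver from Step 1.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always quit it on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and when the body raises.
        driver.quit()
```

The body of main would then become a with managed_driver(build_driver) as driver: block containing the load, scroll, and parse calls, with the quit handled for you.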
The honest costs of the Selenium approach
This stack works, and for a one-off scrape of a handful of pages it is hard to beat. At scale, the costs add up, and it is worth naming them before you commit to running a browser fleet.
- Browser overhead. Every driver runs a full Chrome instance with its own memory and CPU footprint. A dozen concurrent drivers can saturate a small server, so throughput is bounded by hardware, not just network.
- Flaky waits. Explicit waits are far better than sleep, but they still break when a site changes its markup or its timing, and a wait that passes locally can time out on a slower machine. Wait logic becomes ongoing maintenance.
- Anti-bot detection. A headless browser still leaks signals (driver fingerprints, missing headers, datacenter IPs) that modern defenses read. At volume you will hit CAPTCHAs and IP blocks that no amount of waiting fixes, which means layering in proxy rotation and fingerprint patches on top of everything above.
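To keep the browser overhead bounded when you do run Selenium concurrently, a common approach is a worker pool with a hard cap. The sketch below assumes a hypothetical scrape_one(url) function that builds its own driver, scrapes one page, and quits the driver before returning; the scrape_all name and the default of 4 workers are also assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_all(urls, scrape_one, max_workers=4):
    """Run scrape_one(url) for every URL with at most max_workers at a time."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed page should not stop the batch
                errors[url] = exc
    return results, errors
```

Each call should own its driver rather than share one across threads. The cap is what stops a long URL list from launching more Chrome instances than the machine can hold.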
None of this makes Selenium the wrong tool; it makes it a heavy tool. When the job is "render a lot of pages reliably from a server without babysitting a browser fleet," the rendering and the unblocking are the hard parts, and those are exactly what a managed API can take off your plate.
If you want rendered HTML without running a browser fleet, the Crawling API renders the page in a real browser server-side and rotates residential IPs for you, then returns finished HTML you parse with the same BeautifulSoup code. You pass a JS token and wait options instead of managing drivers, scroll loops, and a proxy pool. Start on the free tier and point it at one dynamic page first.
The lighter alternative: render server-side, parse locally
The Crawling API keeps the half of this workflow you actually like (BeautifulSoup extraction) and removes the half that hurts (running and unblocking browsers). You send the URL with a JavaScript token, the API renders it behind a trusted IP, and you parse the returned HTML exactly as before. The same parse_items function from Step 4 works unchanged.
    from crawlbase import CrawlingAPI
    from bs4 import BeautifulSoup
    import json

    api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

    def fetch_rendered(url):
        # ajax_wait waits for async content; page_wait is a fixed delay in milliseconds
        options = {"ajax_wait": "true", "page_wait": 5000}
        response = api.get(url, options)
        if response["status_code"] == 200:
            return response["body"].decode("utf-8")
        print(f"Request failed: {response['status_code']}")
        return None

    # parse_items is the function defined in Step 4
    html = fetch_rendered("https://example.com/products")
    if html:
        print(json.dumps(parse_items(html), indent=2))
The ajax_wait option tells the API to wait for asynchronous content to settle, and page_wait holds for a fixed number of milliseconds after load so late-rendering elements appear before capture. Raise page_wait if fields come back empty. Notice what is gone: no driver lifecycle, no scroll loop, no try/finally cleanup, and no proxy management. The rendering and the IP rotation happen server-side, so a thousand pages is a thousand HTTP calls rather than a thousand browser launches.
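The advice to raise page_wait when fields come back empty can be automated. The following is a sketch under stated assumptions: fetch_until_populated is a hypothetical helper, fetch(url, page_wait) stands for a wrapper you would write around api.get that returns HTML or None, and the wait values are arbitrary examples.

```python
def fetch_until_populated(url, fetch, parse, waits=(2000, 5000, 10000)):
    """Retry with a longer page_wait until at least one item has every field.

    Returns (items, page_wait_used); page_wait_used is None if no attempt
    produced a complete item.
    """
    items = []
    for page_wait in waits:
        html = fetch(url, page_wait)
        if not html:
            continue  # request failed, try the next (longer) wait
        items = parse(html)
        has_complete_item = any(
            all(value is not None for value in item.values()) for item in items
        )
        if has_complete_item:
            return items, page_wait
    return items, None
```

Every retry is another API request, so it is usually worth logging which wait succeeded and making that the default for the site.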
If your task needs genuine multi-step interaction (logging in, filling and submitting a form, clicking through a wizard, then reading state that depends on those actions), Selenium's stateful browser session is the right tool. The Crawling API shines when the goal is "render this URL and give me the finished HTML" at volume. Many projects use both: a browser for the few interactive flows, the API for the bulk fetching.
For other browser-automation stacks worth comparing, Playwright for web scraping covers a modern alternative to Selenium with similar trade-offs. If you would rather route your own browser traffic through rotating IPs instead of using the managed API, the Smart AI Proxy gives you residential rotation as a drop-in proxy endpoint, and for fire-and-forget jobs the asynchronous Crawler pushes rendered results to a callback instead of blocking on each request.
Key takeaways
- Dynamic content needs rendering. JavaScript injects the data after the first response, so you must execute the page before you parse it; a plain fetch returns a shell.
- Selenium renders, BeautifulSoup extracts. Drive a headless Chrome WebDriver to render and interact, then hand driver.page_source to BeautifulSoup for fast, selector-based extraction.
- Use explicit waits, not sleeps. WebDriverWait with expected_conditions tied to the element you need is both faster and more reliable than a fixed delay.
- Scroll to trigger lazy loading. Loop scroll-wait-measure until the page height stops growing, with a round cap so it cannot run forever.
- Selenium is heavy at scale. Browser overhead, flaky waits, and anti-bot blocking add up; a rendering API like the Crawling API returns finished HTML with IP rotation handled, and your BeautifulSoup code stays the same.
Frequently Asked Questions (FAQs)
Why can BeautifulSoup not scrape dynamic content on its own?
BeautifulSoup is a parser, not a browser: it reads HTML you give it but never executes JavaScript. On a client-rendered page the raw HTML is an empty shell, so BeautifulSoup has nothing to extract until something renders the page first. That rendering step is what Selenium provides locally, or what a JS-token rendering API provides server-side, before BeautifulSoup ever runs.
How do I wait for dynamic elements instead of guessing with sleep?
Use an explicit wait. WebDriverWait(driver, timeout).until(EC.presence_of_element_located((By.CSS_SELECTOR, sel))) polls the page and returns the instant the element appears, up to the timeout. That is faster than a fixed time.sleep() because it does not wait longer than necessary, and more reliable because it is tied to the actual element you need rather than a guessed duration.
Do I still need to download ChromeDriver manually?
Not since Selenium 4.6. Selenium Manager resolves and downloads the ChromeDriver version that matches your installed Chrome automatically the first time you launch the browser. You only manage the driver by hand in locked-down environments where automatic downloads are blocked, in which case you point Selenium at a driver binary you provide.
Will Selenium get me past anti-bot systems?
Not on its own at scale. A headless browser still exposes signals (automation fingerprints, datacenter IPs, missing or inconsistent headers) that modern defenses detect, so you will hit CAPTCHAs and IP blocks once volume rises. Mitigating that means layering in proxy rotation and fingerprint patches, which is why a managed Crawling API that handles rendering and IP rotation together is often less work than hardening a browser fleet.
When should I use the Crawling API instead of Selenium and BeautifulSoup?
Use the Crawling API when the job is rendering many pages reliably from a server and you do not want to run or unblock a browser fleet. It renders server-side, rotates residential IPs, and returns finished HTML that your existing BeautifulSoup code parses unchanged. Keep Selenium when you need genuine stateful interaction such as logging in, submitting forms, or clicking through a multi-step flow.
Can I reuse my BeautifulSoup parsing code with the Crawling API?
Yes, and that is the main appeal. The Crawling API returns the rendered HTML in the response body, so once you decode it to a string the same BeautifulSoup(html, "html.parser") call and the same selectors work without changes. You only swap out how the HTML is obtained: instead of driver.page_source from a local browser, you read response["body"] from the API call.
