Comments on a public TikTok video are a window into how an audience reacts: the language they use, the sentiment they carry, and the themes that keep recurring. Researchers, analysts, and content teams read that signal in aggregate to understand trends, not to track the people writing the comments. This guide shows you how to scrape public TikTok comments with Python in a way that actually works against a JavaScript-rendered page.

To be clear up front: everything here is scoped to public comments on public videos. The goal is aggregate analysis: comment text, like counts, and reply counts that you can roll up into sentiment and theme summaries. It is not about building profiles of individual commenters. Usernames and the words people write are personal data, so the whole walkthrough treats them with care, and the legality section near the end covers the rules before you point this at anything real. If you want the broader walkthrough first, see our guide on how to scrape TikTok.

What you will build

A small Python script that takes a public TikTok video URL, fetches the fully rendered page through the Crawling API with a JavaScript token, scrolls to load more comments, and parses a handful of public, mostly aggregate fields:

  • Comment text: the visible words of each public comment.
  • Like count: the aggregate number of likes a comment shows, not the people behind it.
  • Reply count: the aggregate number of replies a comment has drawn.
  • Video metadata: the public video URL the comments belong to, for attribution.

Notice what is deliberately absent from the analysis output: no commenter profiles, no follower data, no attempt to link a username to a real identity. Those are personal data of individuals, and harvesting them is out of scope here on purpose. We will read a username off the page because the markup carries it, but the section on privacy explains why you should not store or republish it tied to identity.

Why a plain request fails on TikTok

Request a public TikTok video URL with a bare HTTP client and you get a response that is technically successful and practically empty. TikTok renders its content client-side: the real markup, including comments, only appears after the page's JavaScript runs in a browser and pulls data from internal endpoints. A single static request never executes that JavaScript, so the comments you want are simply not in the body.

On top of that, TikTok loads comments asynchronously and lazily as you scroll, and it flags scraper-shaped traffic quickly. Datacenter IP ranges, missing browser behavior, and repetitive patterns get rate-limited or challenged before the interesting content ever loads. So a working comment scraper needs two things in the same request: a real browser that renders and scrolls the page, and an IP address the platform reads as an ordinary visitor. You can build that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into one call: you send a URL with a JavaScript token, it renders and scrolls behind a trusted residential IP, and it returns finished HTML you can parse. For the underlying mechanics, see how to crawl JavaScript websites.
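To see the empty-shell problem for yourself, you can fetch a video page with Python's standard library and check for comment markup. This is a minimal diagnostic sketch, not part of the scraper: it assumes the data-e2e="comment-level-1" attribute (the same hook the parser later in this guide relies on) only appears once comments have rendered, and the helper names are illustrative.

```python
import urllib.request

# Attribute assumed to appear only when comment nodes have been rendered.
COMMENT_MARKER = 'data-e2e="comment-level-1"'

def has_rendered_comments(html: str) -> bool:
    """Return True if the HTML appears to contain rendered comment nodes."""
    # Accept both quote styles, since serializers differ.
    return COMMENT_MARKER in html or COMMENT_MARKER.replace('"', "'") in html

def fetch_plain(url: str, timeout: float = 15.0) -> str:
    """Fetch a URL with a bare HTTP client: no JavaScript, no scrolling."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling has_rendered_comments(fetch_plain(video_url)) on a public video is expected to return False, for the reasons above. What TikTok serves can vary from request to request, so treat this as a quick check rather than a guarantee.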

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. TikTok is heavily client-side rendered, so you need the JS token here. The normal token returns the same near-empty shell a plain fetch would, with no comments to parse out of it.

Prerequisites

A few things to have in place first. None take long.

Basic Python, HTML, and CSS. You should be comfortable running a script, installing packages with pip, and reading CSS selectors so you can adapt the comment selectors when TikTok's markup shifts.

Python 3.8 or later. Confirm with python --version. If you do not have it, install it from python.org, and make sure pip is on your PATH.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token from the account docs page. The free tier includes 1,000 requests, which is plenty to follow along. Treat the token like a password: it authenticates your requests, so keep it out of version control.
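One way to keep the token out of version control is to read it from an environment variable at runtime. The sketch below assumes a variable named CRAWLBASE_JS_TOKEN. That name is a hypothetical convention, so use whatever your setup already defines.

```python
import os

def load_token(var_name: str = "CRAWLBASE_JS_TOKEN") -> str:
    """Read the Crawlbase JS token from an environment variable.

    The default variable name is hypothetical; pass your own if it differs.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail loudly instead of sending requests with an empty token.
        raise RuntimeError(f"Set the {var_name} environment variable before running the scraper.")
    return token
```

With that in place, the client can be created as CrawlingAPI({"token": load_token()}) instead of hard-coding the token string.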

Set up the project

Create an isolated virtual environment, then install the libraries the scraper needs.

bash
python --version

python -m venv tiktok_env
source tiktok_env/bin/activate

pip install crawlbase beautifulsoup4 pandas

On Windows, activate with tiktok_env\Scripts\activate instead of the source line. Three dependencies do the work: crawlbase is the official client for the Crawling API, beautifulsoup4 parses the returned HTML so you can pull fields by selector, and pandas helps you roll the results up for aggregate analysis later.
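To confirm that the virtual environment is active and all three packages are importable, you can run a small check. The sketch uses importlib.util.find_spec from the standard library. The mapping from pip names to import names reflects the packages above (beautifulsoup4 is imported as bs4).

```python
import importlib.util

# pip package name -> module name used in import statements
REQUIRED = {"crawlbase": "crawlbase", "beautifulsoup4": "bs4", "pandas": "pandas"}

def missing_packages(required: dict) -> list:
    """Return the pip names whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]
```

Running print(missing_packages(REQUIRED) or "All dependencies installed") inside tiktok_env should report nothing missing after the pip install.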

Step 1: Fetch the rendered video page

Start by getting the finished page. Import CrawlingAPI, initialize it with your JS token, and request a public video URL. Two options matter for a client-rendered target: ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds so late-rendering comments appear before capture. Check the status before parsing so failures stay loud instead of silent.

python
from crawlbase import CrawlingAPI

crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

options = {
    "ajax_wait": "true",
    "page_wait": 10000,
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
}

def fetch_html(url):
    try:
        response = crawling_api.get(url, options)
        if response["headers"]["pc_status"] == "200":
            return response["body"].decode("utf-8")
        print(f"Failed to fetch. Crawlbase status: {response['headers']['pc_status']}")
        return None
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None

if __name__ == "__main__":
    video_url = "https://www.tiktok.com/@nasa/video/7255327059302419738"
    html = fetch_html(video_url)
    print(html[:500] if html else "No HTML returned")

The script reads pc_status from the response headers. This is Crawlbase's own status for the crawl, reported separately from the status code the target site returned, so a "200" here means Crawlbase completed the rendered fetch. Ten seconds of page_wait is a reasonable starting point for TikTok; raise it if comments come back empty. The example points at a public organization account precisely because it is public and impersonal. Run the script and you should see real rendered markup, which confirms rendering works before you write a single selector.
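When a run fails, it helps to log both status codes so you can tell whether Crawlbase or TikTok caused the failure. The sketch below assumes the response shape used in fetch_html: a dict whose "headers" hold string values, including pc_status and original_status (the status the target site returned), and whose "body" is bytes. interpret_response is a hypothetical helper, not part of the crawlbase client.

```python
def interpret_response(response: dict):
    """Return (body_text, reason); body_text is None when the crawl failed.

    Assumes a crawlbase-style response dict with "headers" and "body".
    """
    headers = response.get("headers", {})
    pc_status = str(headers.get("pc_status", ""))
    original_status = str(headers.get("original_status", ""))
    if pc_status != "200":
        # Report both codes so blocks (site side) and crawl errors can be told apart.
        return None, (f"Crawlbase status {pc_status or 'missing'} "
                      f"(site returned {original_status or 'unknown'})")
    body = response.get("body", b"")
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else str(body)
    return text, "ok"
```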

Crawlbase Crawling API

TikTok needs a rendered, scrolled page behind a trusted IP, in one call. The Crawling API takes a JS token, runs the page in a real browser, scrolls to load lazy comments, and rotates through residential IPs server-side, so you skip running a headless browser fleet and a proxy pool yourself. Point it at one public video on the free tier first.

Step 2: Parse comments into structured data

With rendered HTML in hand, load it into BeautifulSoup and pull the public fields. TikTok marks up many of its components with data-e2e attributes, which tend to be far more durable than deeply nested, frequently renamed CSS class names. The comment list lives inside a comment container; each comment item carries its text, a like count, and a reply count. We will also read the video author username off the page header for attribution context.

python
from bs4 import BeautifulSoup

def text_or_none(node):
    return node.text.strip() if node else None

def scrape_video_info(soup):
    username = soup.select_one("span[data-e2e='browse-username']")
    return {"Video Author": text_or_none(username)}

def scrape_comments_listing(soup):
    return soup.select(
        "div[data-e2e='search-comment-container'] > "
        "div[class*='CommentListContainer'] > "
        "div[class*='DivCommentItemContainer']"
    )

def parse_comment(comment):
    text = comment.select_one(
        "div[class*='DivCommentContentContainer'] "
        "p[data-e2e='comment-level-1'] > span"
    )
    likes = comment.select_one("div[class*='DivLikeContainer'] span")
    replies = comment.select_one("div[class*='DivReplyContainer']")
    return {
        "Comment Text": text_or_none(text),
        "Like Count": text_or_none(likes),
        "Reply Count": text_or_none(replies),
    }

Each helper guards against a missing node so a renamed or absent element returns None instead of raising. The comment listing selector mirrors TikTok's nested structure: a comment container, then a list container, then individual comment items. From each item we pull the comment text, the like count, and the reply count. Those last two are aggregate counts, returned as the display strings TikTok shows, and they are exactly the kind of non-personal signal you want for theme and sentiment analysis.
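Because the counts come back as display text, you will usually want integers before aggregating. The sketch below assumes TikTok shows counts in formats like "1243", "1.2K", "3M", or reply text such as "View 18 replies". Those formats are assumptions, so check them against what your runs return. parse_count is a hypothetical helper that yields None when no number is present.

```python
import re

# Multipliers for abbreviated counts such as "1.2K" or "3M" (assumed formats).
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
# A number, an optional suffix glued to it, and no letter/digit/dot right after,
# so "18 replies" yields 18 while words like "more" are never read as "M".
_COUNT_RE = re.compile(r"\b(\d+(?:\.\d+)?)([KMBkmb])?(?![\w.])")

def parse_count(raw):
    """Turn a displayed count string into an int, or None if no number is found."""
    if not raw:
        return None
    match = _COUNT_RE.search(raw.replace(",", ""))
    if not match:
        return None
    number, suffix = match.groups()
    return int(round(float(number) * _SUFFIX[(suffix or "").upper()]))
```

Applying parse_count to "Like Count" and "Reply Count" after parse_comment gives you numeric columns you can sum and weight.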

Selectors drift

TikTok changes its markup and obfuscated class names without notice, which is why this code leans on stable data-e2e attributes and partial class*= matches rather than brittle exact classes. When a field comes back as None, re-inspect the live page in your browser's dev tools and update the selector. Periodic maintenance is normal for any production scraper.
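To notice drift without eyeballing every run, you can measure how often each field comes back as None. This is a small sketch over the dicts that parse_comment returns. The 0.5 threshold is an arbitrary starting point, and the helper names are hypothetical.

```python
def selector_health(records, fields=("Comment Text", "Like Count", "Reply Count")):
    """Return the share of records in which each field came back as None."""
    if not records:
        # Zero parsed comments usually means the listing selector itself broke.
        return {field: 1.0 for field in fields}
    return {
        field: sum(1 for record in records if record.get(field) is None) / len(records)
        for field in fields
    }

def drifted_fields(records, threshold=0.5):
    """List fields whose missing rate meets the threshold, i.e. likely stale selectors."""
    return sorted(field for field, share in selector_health(records).items()
                  if share >= threshold)
```

If drifted_fields(comments) returns a non-empty list after a run, that is your cue to re-inspect those selectors in dev tools.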

Step 3: Handle comment pagination with scroll

TikTok uses infinite scrolling to load more comments dynamically, so a single render only captures the first batch. The Crawling API exposes a scroll parameter that tells the headless browser to scroll the page and load more content before returning. By default the browser scrolls for 10 seconds; the scroll_interval parameter, given in seconds, extends that window so more comment batches load. Add those options to a paginated fetch.

python
def fetch_html_with_scroll(url):
    scroll_options = {
        "ajax_wait": "true",
        "user_agent": options["user_agent"],
        "scroll": "true",
        "scroll_interval": 20,  # keep scrolling for 20 seconds (value is in seconds)
    }
    try:
        response = crawling_api.get(url, scroll_options)
        if response["headers"]["pc_status"] == "200":
            return response["body"].decode("utf-8")
        print(f"Failed to fetch. Crawlbase status: {response['headers']['pc_status']}")
        return None
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None

A 20-second scroll window gives lazy-loaded comments time to render as the page moves. Longer windows load more comments but cost more wait time per request, so tune it to how many batches you actually need. Keep volume modest: a representative sample is usually enough for aggregate analysis, and you rarely need every comment on a video.
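One simple way to keep stored volume modest is to sample the parsed comments before saving them. The sketch below draws a uniform random sample without replacement, and an optional seed makes it reproducible. The helper name is hypothetical. The sample comes only from the comments TikTok surfaced during the render, so it represents that batch rather than every comment on the video.

```python
import random

def sample_comments(comments, k, seed=None):
    """Return up to k comments chosen uniformly at random without replacement."""
    if k >= len(comments):
        return list(comments)
    rng = random.Random(seed)  # a seeded RNG makes the sample reproducible
    return rng.sample(comments, k)
```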

Step 4: Assemble the full scraper

Now wire fetch, scroll, and parse into one runnable script. It renders the video page with scrolling, parses every loaded comment into text, like count, and reply count, and prints clean JSON keyed on the video URL that you can feed into analysis.

python
import json
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup

crawling_api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

options = {
    "ajax_wait": "true",
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
}

def fetch_html_with_scroll(url):
    scroll_options = {**options, "scroll": "true", "scroll_interval": 20}  # scroll for 20 seconds
    try:
        response = crawling_api.get(url, scroll_options)
        if response["headers"]["pc_status"] == "200":
            return response["body"].decode("utf-8")
        print(f"Failed to fetch. Crawlbase status: {response['headers']['pc_status']}")
        return None
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None

def text_or_none(node):
    return node.text.strip() if node else None

def scrape_comments_listing(soup):
    return soup.select(
        "div[data-e2e='search-comment-container'] > "
        "div[class*='CommentListContainer'] > "
        "div[class*='DivCommentItemContainer']"
    )

def parse_comment(comment):
    text = comment.select_one(
        "div[class*='DivCommentContentContainer'] "
        "p[data-e2e='comment-level-1'] > span"
    )
    likes = comment.select_one("div[class*='DivLikeContainer'] span")
    replies = comment.select_one("div[class*='DivReplyContainer']")
    return {
        "Comment Text": text_or_none(text),
        "Like Count": text_or_none(likes),
        "Reply Count": text_or_none(replies),
    }

def main():
    video_url = "https://www.tiktok.com/@nasa/video/7255327059302419738"
    html = fetch_html_with_scroll(video_url)
    if not html:
        return

    soup = BeautifulSoup(html, "html.parser")
    comments = [parse_comment(c) for c in scrape_comments_listing(soup)]

    output = {"Video URL": video_url, "Comments": comments}
    print(json.dumps(output, indent=2, ensure_ascii=False))

if __name__ == "__main__":
    main()

The script keys its output on the video URL rather than on any individual person, which is the right default for aggregate work. Each comment record holds only text and two counts. If you want to persist results, write them to a CSV or database, but read the privacy section first: comment text and usernames are personal data, and how long you keep them and what you do with them is a legal question, not just a technical one.
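If you do persist results, an allow-list of columns is one way to ensure that only the aggregate-friendly fields reach disk. The sketch uses csv.DictWriter with extrasaction="ignore", which drops any key not in the allow-list (for example a username, if you ever added one). The field list and function name are illustrative.

```python
import csv

# Only these fields are written; any other key in a record is silently dropped.
ALLOWED_FIELDS = ["Video URL", "Comment Text", "Like Count", "Reply Count"]

def write_comments_csv(video_url, comments, stream):
    """Write one CSV row per comment to an open text stream, keeping only ALLOWED_FIELDS."""
    writer = csv.DictWriter(stream, fieldnames=ALLOWED_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for comment in comments:
        writer.writerow({"Video URL": video_url, **comment})
```

For a file, open it with newline="" and encoding="utf-8", as the csv module documentation recommends, and pass the handle as stream.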

What the output looks like

Run the full script and you get a clean record of public comment fields, ready to roll up for sentiment or theme analysis.

json
{
  "Video URL": "https://www.tiktok.com/@nasa/video/7255327059302419738",
  "Comments": [
    {
      "Comment Text": "this is incredible",
      "Like Count": "1243",
      "Reply Count": "18"
    },
    {
      "Comment Text": "how was this filmed?",
      "Like Count": "87",
      "Reply Count": "4"
    }
  ]
}

From here, aggregate is the operative word. Group the comment text to surface common themes, run sentiment over the corpus to gauge overall reaction, and weight by like and reply counts to find which sentiments resonated. That tells you how an audience responded without building a dossier on any single commenter. If you plan to feed this into a model, our guide on how to structure and clean web scraped data for AI and ML covers normalising and de-identifying text before training.
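For a first pass at themes, you can group comments by keyword and weight each group by likes. The sketch below uses naive substring matching, not a sentiment or topic model, and the theme names and keywords are placeholders you would define yourself. It assumes "Like Count" holds plain digits. Abbreviated values such as "1.2K" count as zero here unless you convert them to integers first.

```python
def theme_summary(comments, themes):
    """Count comments mentioning each theme's keywords and sum their likes.

    themes maps a theme name to a list of lowercase keywords.
    """
    summary = {name: {"comments": 0, "likes": 0} for name in themes}
    for comment in comments:
        text = (comment.get("Comment Text") or "").lower()
        raw_likes = comment.get("Like Count")
        # Plain digit strings or ints only; anything else contributes 0 likes.
        likes = int(raw_likes) if str(raw_likes).isdigit() else 0
        for name, keywords in themes.items():
            if any(keyword in text for keyword in keywords):
                summary[name]["comments"] += 1
                summary[name]["likes"] += likes
    return summary
```

The result contains only counts per theme, with no per-person records, which keeps the analysis at the aggregate level.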

Staying unblocked

Even with rendering handled by the Crawling API, TikTok watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard, heavily defended target.

  • Pace your requests. Scrolling renders take longer than static fetches, so do not fire them in a tight loop. Space them out and resist parallelizing aggressively.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you build your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Back off rather than pushing harder.
  • Keep volume low. A representative sample of comments is usually enough for aggregate analysis. You rarely need every comment on a viral video.
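The advice to pace requests and back off can be wrapped around any fetch function that returns HTML or None, such as fetch_html_with_scroll above. The sketch below uses capped exponential backoff with random jitter. The attempt count and base delay are illustrative defaults, not Crawlbase recommendations, and sleep is injectable so the function is easy to test.

```python
import random
import time

def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call fetch(url) until it returns HTML, waiting longer after each failure.

    The delay after attempt n is base_delay * 2**n plus up to base_delay of jitter.
    Returns None once max_attempts have all failed.
    """
    for attempt in range(max_attempts):
        html = fetch(url)
        if html is not None:
            return html
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    return None
```

Capping the attempts matters. If a video keeps failing after a few spaced-out tries, the right move is to stop and lower your rate rather than keep retrying.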

For the broader playbook, see how to scrape websites without getting blocked. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same residential rotation as a drop-in proxy endpoint.

Legality and privacy

This is the section to read before you write production code. Scraping is not inherently illegal, and public comments on a public video are visible to anyone without logging in. But TikTok's Terms of Service restrict automated collection, and comments are personal data: they are content written by identifiable people, often tied to a username. So legality here turns less on whether the data is public and more on what you collect, why, and what you do with it afterward. Read TikTok's Terms of Service and its robots.txt, and treat both as the boundary for what you touch.

If you handle data about people in the EU or the UK, the GDPR applies, and California's CCPA applies to data about California residents. Both can treat usernames and user-written comments as personal data, and under the GDPR that holds even when the data is public. In practice that means you need a lawful basis to process it, you should minimise what you keep, and you must honor deletion and objection requests. The safest posture for this kind of work is aggregate analysis: derive sentiment, themes, and counts, then discard or de-identify the raw comments and usernames. Do not build profiles of individual commenters, do not republish a person's comment tied to their identity, and do not store usernames linked to opinions you have inferred about them. The script in this guide reads a username because the page exposes it, but you should not retain it tied to identity.
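A basic de-identification pass drops identity fields and masks @-mentions inside comment text before anything is stored. The sketch assumes that handles consist of letters, digits, underscores, and dots. The dropped field names, including "Username", are hypothetical, since the scraper above does not collect them. This reduces identifiability but does not fully anonymise the data, because the wording of a comment can still point to its author.

```python
import re

# @handles: "@" followed by letters, digits, underscores, or dots (assumed format).
MENTION_RE = re.compile(r"@[A-Za-z0-9_.]+")

def deidentify(comment, drop_fields=("Username", "Video Author")):
    """Return a copy without identity fields and with @mentions masked."""
    cleaned = {key: value for key, value in comment.items() if key not in drop_fields}
    text = cleaned.get("Comment Text")
    if text:
        cleaned["Comment Text"] = MENTION_RE.sub("@user", text)
    return cleaned
```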

Stay strictly on the public side, and never go past it. Do not scrape private accounts, login-walled content, direct messages, or anything behind a follower gate. Do not bypass authentication or rate limits, and do not redistribute copyrighted video or media. For any real, ongoing, or commercial use, the right tool is the official TikTok API, including the Research API where you qualify. It is the sanctioned path, gives you defined terms and structure, and keeps you inside TikTok's rules. This article is a technical walkthrough scoped to public comments for aggregate analysis, not an endorsement of mass personal-data collection.

Recap

Key takeaways

  • TikTok is client-side rendered and bot-defended. A plain request returns a near-empty shell with no comments, so you must render and scroll the page before you parse it.
  • Rendering, scrolling, and a trusted IP belong in one call. The Crawling API with a JS token does all three; ajax_wait, page_wait, and scroll_interval control how long it waits and loads.
  • Parse stable signals. TikTok's data-e2e attributes and partial class*= matches are more durable than brittle obfuscated class names.
  • Aggregate, do not profile. Pull comment text, like counts, and reply counts for sentiment and theme analysis; never build profiles of individual commenters or store usernames tied to identity.
  • Respect the rules and prefer the official API. TikTok's ToS restricts scraping, GDPR and CCPA can treat comments as personal data, and the official TikTok API is the sanctioned route for anything real.

Frequently Asked Questions (FAQs)

Why does a plain request return no comments from TikTok?

Because TikTok renders its content client-side with JavaScript and loads comments lazily as you scroll. The initial HTML is a shell that only fills in after the page's scripts run in a browser, so a raw HTTP request returns a near-empty body. To get real public comments you have to render and scroll the page, which is what the Crawling API's JS token and scroll parameter handle for you.

Do I need the normal token or the JS token for TikTok?

The JS token. The normal token fetches static HTML, which on TikTok is the same empty shell a plain fetch returns. The JS token renders the page in a real browser before handing back the HTML, so the comment elements are present when BeautifulSoup parses them.

How do I load more than the first batch of comments?

Pass scroll: "true" to the Crawling API so the headless browser scrolls the page and triggers TikTok's infinite loading. The scroll_interval parameter sets, in seconds, how long the browser keeps scrolling before the page is captured; a longer window loads more comment batches at the cost of more wait time per request. Tune it to how many comments you actually need, and keep the volume modest.

What TikTok comment data is safe to collect?

Only public comments on public videos, and ideally only in aggregate: comment text rolled up into themes and sentiment, plus like and reply counts as numbers. Private accounts, login-walled content, direct messages, and any attempt to profile individual commenters are off limits. Usernames and comment text are personal data, so minimise what you keep and de-identify where you can.

Should I use the official TikTok API instead of scraping?

For any real, ongoing, or commercial use, yes. The official TikTok API, including the Research API where you qualify, is the sanctioned route: it gives you defined terms and a documented data structure, and it keeps you inside TikTok's rules. Scraping a small sample of public comments fits lightweight, aggregate research where no API access is in place, as long as you respect the terms, robots.txt, rate limits, and privacy law.

How do I avoid getting blocked while scraping TikTok comments?

Keep your per-IP request rate low, space out your scrolling renders instead of looping them tightly, keep volume to a representative sample, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you. Watch the pc_status values and back off the moment you start seeing challenges. For a comparison of other tools, see our roundup of the best TikTok scrapers to collect data.
