TikTok is one of the largest sources of public short-video data on the web, and the trends, hashtags, and engagement numbers it surfaces are genuinely useful for market research, content strategy, and trend tracking. It is also one of the harder surfaces to read programmatically: pages render client-side with JavaScript, the platform challenges automated traffic, and a plain HTTP request to a search or profile URL usually returns a near-empty shell with none of the data you can see in a browser.

This guide shows you how to scrape public TikTok data with Python in a way that actually works, while staying strictly inside what is public and aggregate. Everything here is scoped to public search results, public hashtag feeds, and public profile videos: captions, engagement counts (likes, comments, shares), video URLs, and posted dates. It does not cover anything behind a login, private accounts, or the personal data of individual people. Read the legality section near the end before you point this at anything real, and for production use prefer TikTok's official API.

What you will build

A small Python script that takes a public TikTok search or hashtag URL, fetches the fully rendered page through the Crawlbase Crawling API with a JavaScript token, and parses a handful of public, aggregate fields from each video card:

  • Caption: the public text shown on the video card.
  • Like, comment, and share counts: the aggregate engagement numbers a card displays, not the people behind them.
  • Video URL: the public permalink to each video.
  • Posted date: the upload date shown on the card.
  • Hashtags: the public tags attached to each video.

Notice what is deliberately absent: no follower lists, no commenter identities, no private-account content, no contact details. Those are personal data of individuals, and collecting them is out of scope here on purpose. We treat usernames as incidental context for a public video, not as a profile to enrich.

Why a plain request fails on TikTok

Request a TikTok search or hashtag URL with a bare HTTP client and you will get a response that is technically successful and practically useless. The body is a JavaScript shell: the real content (video cards, captions, and counts) only appears after the page's scripts run in a browser and fetch data from internal endpoints. On top of that, TikTok flags automated traffic quickly. Datacenter IP ranges, missing browser behavior, and repetitive request patterns get challenged or rate-limited well before the interesting content ever loads.

So a working TikTok scraper needs two things in the same request: a real browser that renders the page, and an IP address the platform reads as an ordinary visitor. You can build that yourself with a headless browser and a pool of rotating residential proxies, but keeping that stack healthy is most of the work. The Crawling API folds both into one call. You send it a URL with a JavaScript token, it renders the page behind a trusted residential IP, and it returns finished HTML you can parse. If you want the deeper background, see our guide on how to crawl JavaScript websites.

Why the JS token

Crawlbase offers two token types. The normal token fetches static HTML; the JavaScript (JS) token renders the page in a real browser first. TikTok is client-side rendered, so you need the JS token here. The normal token returns the same shell a plain fetch would, with nothing useful to parse out of it.

Prerequisites

A few things to have in place first. None take long.

Basic Python. You should be comfortable running a script and installing packages with pip. If you are new to parsing HTML, our primer on how to use BeautifulSoup in Python covers the extraction side.

Python 3.8 or later. Confirm with python --version and pip --version. If you do not have it, install it from python.org.

A Crawlbase account and JS token. Sign up, open your dashboard, and copy your JavaScript (JS) token. The first 1,000 requests are free, no credit card required. Treat the token like a password: it authenticates your requests, so keep it out of version control.
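
One way to keep the token out of version control is to read it from an environment variable instead of pasting it into the script. The sketch below is a minimal example under that assumption; the variable name CRAWLBASE_JS_TOKEN and the helper load_token are illustrative choices for this guide, not names the Crawlbase client looks for on its own.

```python
import os


def load_token(env_var="CRAWLBASE_JS_TOKEN"):
    """Return the token stored in an environment variable.

    The variable name is a hypothetical choice for this guide, not something
    the Crawlbase client reads by itself.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running the scraper.")
    return token
```

With this in place, the client could be created as CrawlingAPI({"token": load_token()}), and a missing token fails immediately rather than on the first request.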

Set up the project

Create an isolated virtual environment, then install the libraries the scraper needs.

bash
python --version

python -m venv tiktok_env
source tiktok_env/bin/activate

pip install crawlbase beautifulsoup4

On Windows, activate with tiktok_env\Scripts\activate instead of the source line. Two dependencies do the work: crawlbase is the official client for the Crawling API, and beautifulsoup4 parses the returned HTML so you can pull out individual fields by selector.

Step 1: Fetch the rendered page

Start by getting the finished page. Import CrawlingAPI, initialize it with your JS token, and request a public search or hashtag URL. TikTok loads content asynchronously, so pass ajax_wait and page_wait to let the page settle before it is captured. Check the status before parsing so failures stay loud instead of silent.

python
from crawlbase import CrawlingAPI
import urllib.parse

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

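options = {
    # Wait for asynchronous (AJAX) content to finish loading.
    "ajax_wait": "true",
    # Then hold for 10,000 ms so late-rendering cards appear.
    "page_wait": 10000,
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
}

def crawl(page_url):
    """Fetch a rendered page and return its HTML, or None on failure."""
    response = api.get(page_url, options)
    # pc_status is the status Crawlbase reports for the request, as a string.
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed. Crawlbase status: {response['headers']['pc_status']}")
    return None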

if __name__ == "__main__":
    query = urllib.parse.quote("cooking recipes")
    url = f"https://www.tiktok.com/search/video?q={query}"
    html = crawl(url)
    print(html[:500] if html else "No HTML returned")

The wait options matter for a client-rendered target. ajax_wait tells the API to wait for asynchronous content to finish loading, and page_wait holds for a fixed number of milliseconds after load so late-rendering cards appear before the page is captured. Ten seconds is a reasonable starting point for TikTok; raise it if cards come back empty. The example queries a topic (cooking recipes) precisely because it is impersonal and public. Run the script and you should see real page markup, which confirms rendering works before you write a single selector.
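
The advice to raise page_wait when cards come back empty can be automated. The sketch below assumes a fetch(url, options) callable that returns HTML or None, for example a variant of crawl that accepts its options as an argument; that callable, the helper names, and the list of wait values are all illustrative. The marker string is the data-e2e value of the results container used later in this guide.

```python
CARD_MARKER = "search_video-item-list"  # data-e2e value of the results container


def has_rendered_cards(html):
    """Cheap check: a rendered search page contains the results container marker."""
    return bool(html) and CARD_MARKER in html


def fetch_until_rendered(fetch, url, waits_ms=(10000, 15000, 20000)):
    """Try increasing page_wait values until the card container shows up.

    `fetch` is any callable taking (url, options) and returning HTML or None.
    It is an assumption of this sketch, not part of the crawlbase client.
    """
    for wait in waits_ms:
        options = {"ajax_wait": "true", "page_wait": wait}
        html = fetch(url, options)
        if has_rendered_cards(html):
            return html
    return None
```

Each retry is a separate billed request, so a short list of wait values is usually enough.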

Step 2: Find the video cards in the search listing

With rendered HTML in hand, load it into BeautifulSoup and locate the search listing, the container that holds every result on the page. TikTok marks key elements with data-e2e attributes, which are far more stable than its deeply nested, frequently renamed CSS classes. The search results live under div[data-e2e='search_video-item-list'], and each direct child is one video card.

python
from bs4 import BeautifulSoup

def find_video_cards(html):
    soup = BeautifulSoup(html, "html.parser")
    return soup.select("div[data-e2e='search_video-item-list'] > div")

This returns a list of card elements. Each card is self-contained: the caption, the engagement counts, the video link, the posted date, and the hashtags all live inside it, so the rest of the parsing operates on one card at a time.

Step 3: Parse the public video fields

From each card, pull the public, aggregate fields. The caption sits under data-e2e='search-card-video-caption', the video link under data-e2e='search_video-item', the posted date in an element whose class contains DivTimeTag, and the engagement count under data-e2e='search-card-like-container'. Every selector below is wrapped so a missing element returns None instead of crashing the run, since TikTok does not render every field on every card.

python
def text_of(card, selector):
    el = card.select_one(selector)
    return el.text.strip() if el else None

def scrape_video_details(card):
    link = card.select_one("div[data-e2e='search_video-item'] a")
    return {
        "caption": text_of(card, "div[data-e2e='search-card-video-caption'] > div > span"),
        "video_url": link["href"].strip() if link and link.has_attr("href") else None,
        "posted_date": text_of(card, "div[class*='DivTimeTag']"),
        "like_count": text_of(card, "div[data-e2e='search-card-like-container'] > strong"),
    }

This pulls only aggregate, non-personal fields: the caption text, the public video URL, the posted date, and the public like count, which is the only engagement number this script reads from a search card. Like, comment, and share counts are numbers; the people behind them are not yours to harvest. We do not read individual comments or who liked the video, and that restraint is what keeps the work defensible.

Selectors drift

TikTok changes its markup without notice, which is why this code leans on data-e2e attributes rather than brittle nested classes. When a field comes back as None, re-inspect the live page in your browser's dev tools and update the selector. Periodic maintenance is normal for any production scraper, not a sign something is broken.
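
Drift is easier to catch if you measure it. A minimal sketch, assuming records shaped like the dictionaries built in this guide: it reports, per field, the share of records where the value is missing. The function name and the idea of a threshold are illustrative, not part of any library.

```python
from collections import Counter


def missing_field_rates(records):
    """Return, per field, the fraction of records where the value is None or empty."""
    if not records:
        return {}
    missing = Counter()
    fields = set()
    for record in records:
        for field, value in record.items():
            fields.add(field)
            if value is None or value == "" or value == []:
                missing[field] += 1
    return {field: missing[field] / len(records) for field in sorted(fields)}
```

If a field that used to be present on most cards suddenly reports a rate near 1.0, its selector is the first thing to re-inspect.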

Step 4: Parse the hashtags

Hashtags are public tags that describe a video's topic, which makes them the most useful aggregate signal on the page for trend work. They sit in their own anchors under data-e2e='search-common-link'. Collect them into a list per card.

python
def scrape_hashtags(card):
    tags = card.select("a[data-e2e='search-common-link'] > strong")
    return {"hashtags": [t.text.strip() for t in tags]}

Aggregating hashtags across a search or a hashtag feed gives you a topic distribution without touching a single identifiable individual. That is the kind of analysis this approach is built for: counts, trends, and co-occurrence rather than profiles.
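
The counts and co-occurrence mentioned above can be sketched with the standard library alone. This assumes each record carries a hashtags list like the one scrape_hashtags returns; the function name hashtag_stats is illustrative.

```python
from collections import Counter
from itertools import combinations


def hashtag_stats(records):
    """Count hashtag frequency and pairwise co-occurrence across records."""
    tag_counts = Counter()
    pair_counts = Counter()
    for record in records:
        # Lowercase and de-duplicate so "#FYP" and "#fyp" count once per video.
        tags = sorted({t.lower() for t in (record.get("hashtags") or []) if t})
        tag_counts.update(tags)
        # Sorted tags give each pair one canonical order.
        pair_counts.update(combinations(tags, 2))
    return tag_counts, pair_counts
```

Calling tag_counts.most_common(10) on the result gives a quick topic distribution for the sample, with no per-person data involved.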

Step 5: Put it together

Now wire fetch and parse into one runnable script. It fetches a public search page, walks each video card, merges the video fields with the hashtags, and prints a clean JSON list.

python
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import urllib.parse
import json

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_TOKEN"})

options = {
    "ajax_wait": "true",
    "page_wait": 10000,
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
}

def crawl(page_url):
    response = api.get(page_url, options)
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed. Crawlbase status: {response['headers']['pc_status']}")
    return None

def text_of(card, selector):
    el = card.select_one(selector)
    return el.text.strip() if el else None

def scrape_video_details(card):
    link = card.select_one("div[data-e2e='search_video-item'] a")
    return {
        "caption": text_of(card, "div[data-e2e='search-card-video-caption'] > div > span"),
        "video_url": link["href"].strip() if link and link.has_attr("href") else None,
        "posted_date": text_of(card, "div[class*='DivTimeTag']"),
        "like_count": text_of(card, "div[data-e2e='search-card-like-container'] > strong"),
    }

def scrape_hashtags(card):
    tags = card.select("a[data-e2e='search-common-link'] > strong")
    return {"hashtags": [t.text.strip() for t in tags]}

def scrape_search(url):
    html = crawl(url)
    if not html:
        return []
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("div[data-e2e='search_video-item-list'] > div")

    results = []
    for card in cards:
        video = scrape_video_details(card)
        video.update(scrape_hashtags(card))
        results.append(video)
    return results

def main():
    query = urllib.parse.quote("cooking recipes")
    url = f"https://www.tiktok.com/search/video?q={query}"
    results = scrape_search(url)
    print(json.dumps(results, indent=2, ensure_ascii=False))

if __name__ == "__main__":
    main()

The same script works for a public hashtag feed: swap the search URL for a hashtag URL such as https://www.tiktok.com/tag/cooking and adjust the card selector if the page structure differs. The shape of the output stays the same, which is the point of keeping the parsing on one card at a time.
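
To make the swap explicit, here is a small sketch of two URL builders. The URL patterns are the ones used in this guide; the helper names search_url and hashtag_url are illustrative.

```python
from urllib.parse import quote


def search_url(query):
    """Build the public video-search URL used in this guide."""
    return f"https://www.tiktok.com/search/video?q={quote(query)}"


def hashtag_url(tag):
    """Build a public hashtag feed URL, tolerating a leading '#'."""
    return f"https://www.tiktok.com/tag/{quote(tag.strip().lstrip('#'))}"
```

Either result can be passed to scrape_search, keeping in mind that the card selector may need adjusting for the hashtag page.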

What the output looks like

Run the full script and you get a clean record of public fields per video, ready to write to JSON, CSV, or a database.

json
[
  {
    "caption": "Crispy potato snacks recipe",
    "video_url": "https://www.tiktok.com/@artofcooking.example/video/7344763014572182789",
    "posted_date": "3-10",
    "like_count": "8.7M",
    "hashtags": ["#potatosnacks", "#snacks", "#foryou", "#fyp"]
  },
  {
    "caption": "Crispy potato bread rolls",
    "video_url": "https://www.tiktok.com/@recipesoftheworld.example/video/7155082128521186587",
    "posted_date": "2022-10-16",
    "like_count": "6.6M",
    "hashtags": ["#breadroll", "#snacks", "#foodie", "#streetfood"]
  }
]

The counts arrive as display strings like 8.7M rather than raw integers, because that is what TikTok renders. If you need them as numbers for aggregation, normalize them in a small post-processing step (expand the K and M suffixes) before you store them.
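
A minimal sketch of that post-processing step follows. It expands the K and M suffixes, and it also parses the two date shapes seen in the sample output. Treating the short month-day form as a date in a year you supply is an assumption of this sketch, as are the function names; anything it cannot parse comes back as None.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text):
    """Convert a display count such as '8.7M' or '912' to an int, or None."""
    if not text:
        return None
    cleaned = text.strip().upper().replace(",", "")
    multiplier = 1
    if cleaned and cleaned[-1] in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        # Decimal keeps the arithmetic exact; floats can land just below
        # the intended integer.
        return int(Decimal(cleaned) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        return None


def parse_posted_date(text, default_year):
    """Parse 'YYYY-M-D' or the short 'M-D' form into a date, or None.

    Reading 'M-D' as a date in `default_year` is an assumption of this sketch.
    """
    if not text:
        return None
    try:
        numbers = [int(part) for part in text.strip().split("-")]
        if len(numbers) == 2:
            return date(default_year, numbers[0], numbers[1])
        if len(numbers) == 3:
            return date(numbers[0], numbers[1], numbers[2])
    except ValueError:
        pass
    return None
```

Keeping the original display string alongside the parsed value makes it easy to audit the conversion later.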

Handling pagination

TikTok uses scroll-based pagination: new cards load as the user scrolls down rather than across numbered pages. The Crawling API can simulate that scrolling for you. Set scroll to true, and optionally set scroll_interval to control how long the API keeps scrolling before it captures the page. The Crawling API documentation gives scroll_interval in seconds (default 10, maximum 60), unlike page_wait, which is in milliseconds. Scrolling loads more cards into the HTML before it is returned, so a single request yields a deeper result set.

python
options = {
    "ajax_wait": "true",
    "page_wait": 10000,
    "scroll": "true",
    "scroll_interval": 20,  # seconds of scrolling, not milliseconds
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
}

Keep scroll_interval moderate. A longer interval means more scrolling and a slower request, and aggressive scrolling on a heavily defended target is the fastest way to trip a rate limit. Pull a reasonable sample and stop rather than trying to scroll an entire feed in one run.
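
If you run more than one request for the same query, the same video can show up twice. The sketch below merges batches by video_url and stops at a cap, which is one way to enforce "sample and stop". The function name and the default limit of 200 are arbitrary illustrative choices.

```python
def merge_unique(existing, new_records, limit=200):
    """Append records with unseen video URLs until `limit` is reached.

    Returns a new list; `existing` is left unchanged.
    """
    seen = {r["video_url"] for r in existing if r.get("video_url")}
    merged = list(existing)
    for record in new_records:
        if len(merged) >= limit:
            break
        url = record.get("video_url")
        # Skip cards without a URL and cards already collected.
        if not url or url in seen:
            continue
        seen.add(url)
        merged.append(record)
    return merged
```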

Saving to CSV

Once you have the records, writing them to CSV makes them easy to load into a spreadsheet or a notebook for aggregate analysis. Flatten the hashtags list into a single delimited string so each row stays one line.

python
import csv

def save_to_csv(rows, filename):
    fieldnames = ["caption", "video_url", "posted_date", "like_count", "hashtags"]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            row = {**row, "hashtags": " ".join(row.get("hashtags", []))}
            writer.writerow(row)

Call save_to_csv(results, "tiktok_data.csv") with the list from the main script. You now have a tidy table of public video metadata you can analyze for trends without storing anything personal.
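
Reading the file back is the mirror image. A minimal sketch, assuming the file was written by save_to_csv with space-delimited hashtags; load_from_csv is an illustrative name. Note that fields written as None come back as empty strings, because CSV has no null value.

```python
import csv


def load_from_csv(filename):
    """Read rows written by save_to_csv and restore hashtags to a list."""
    with open(filename, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        # Split the space-delimited string back into individual tags.
        row["hashtags"] = (row.get("hashtags") or "").split()
    return rows
```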

Staying unblocked

Even with rendering handled by the Crawling API, TikTok watches for scraper-shaped traffic. A few habits keep a run healthy, and they apply to any hard, heavily defended target.

  • Pace your requests. Hammering pages in a tight loop is the fastest way to get throttled. Add real delays between requests and resist the urge to parallelize aggressively.
  • Lean on rotation. A pool of residential IPs spreads requests across many real-user addresses so no single one trips a rate limit. The Crawling API handles this for you; if you build your own stack, this is the part to get right.
  • Read the status codes. A run that starts returning challenges or errors is telling you the current rate or IP tier is no longer enough. Back off rather than pushing harder.
  • Keep volume low and targets varied. Aggregate trend research does not require crawling a hashtag's entire history. Sample what you need and stop.
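
The pacing and back-off habits above can be sketched as a small sequential loop. It assumes a fetch(url) callable that returns HTML or None on failure, such as the crawl function from this guide. The delay values, retry count, and function name are illustrative and untuned.

```python
import random
import time


def fetch_politely(urls, fetch, base_delay=5.0, max_retries=3, sleep=time.sleep):
    """Fetch URLs one at a time with jittered delays and exponential backoff.

    `fetch` is any callable that returns HTML, or None on failure.
    `sleep` is injectable so the loop can be tested without waiting.
    """
    results = {}
    for url in urls:
        html = None
        for attempt in range(max_retries):
            html = fetch(url)
            if html:
                break
            # Back off: 2x, 4x, 8x the base delay after each failure.
            sleep(base_delay * 2 ** (attempt + 1))
        results[url] = html
        # Jittered pause between URLs so requests do not arrive on a fixed beat.
        sleep(base_delay + random.uniform(0, base_delay))
    return results
```

The loop is deliberately sequential: on a target like this, one slow, steady stream is usually safer than parallel workers.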

For the broader playbook, see how to scrape websites without getting blocked. If you only want public engagement text rather than video metadata, our guide on how to scrape TikTok comments covers that surface, and if you would rather pick a ready-made tool, our roundup of the best TikTok scrapers compares the options.

Legality and responsible use

This is the section to read before you write production code. TikTok's Terms of Service restrict automated access and data collection, and scraping can run against those terms regardless of how careful your tooling is. None of the code above changes that; it only makes the technical part work. Read TikTok's Terms of Service and its robots.txt, honor the rate limits those signals imply, and treat both as the boundary for what you collect. Scraping is a legal gray area that turns sharply on what data you take and what you do with it, so when a project is commercial or large, get your own legal advice.

Hold to a few honest, restrictive rules. Collect only public, aggregate data: public captions, public engagement counts (likes, comments, shares), public video URLs, posted dates, and hashtags that anyone can see without logging in. Never scrape private accounts, login-walled content, direct messages, or follower lists. Do not build profiles of identifiable individuals: treat usernames, handles, and user-written comments as personal data, aggregate where you can (counts, trends, hashtag distributions), and do not republish a person's content tied to their identity. When personal data is involved, privacy laws such as the GDPR and the CCPA apply: you need a lawful basis to process it and must honor deletion requests. Those are bright lines, and this guide stays on the aggregate, public side of all of them by design.

For any real or commercial use, the right tool is the official TikTok API. TikTok offers developer APIs for sanctioned access to content and metrics, with structure you can rely on and terms you can stay inside. This article is a technical walkthrough scoped narrowly to public, aggregate data. It is not an endorsement of mass personal-data collection, and it does not cover anything behind a login. If your project needs more than a small sample of public fields, the official API or a formal data agreement is the correct path, not a cleverer scraper.

Recap

Key takeaways

  • TikTok is client-side rendered and bot-defended. A plain request returns an empty shell, so you must render the page before you parse it.
  • Rendering and a trusted IP belong in one call. The Crawling API with a JS token does both; ajax_wait and page_wait control how long it waits for content, and scroll handles TikTok's infinite feed.
  • Parse stable signals. TikTok's data-e2e attributes are far more durable than its frequently renamed nested classes.
  • Public aggregates only. Pull captions, engagement counts (likes, comments, shares), video URLs, posted dates, and hashtags; never private content, follower lists, or profiles of individuals.
  • Pace, rotate, and prefer the official API. Keep volume low, lean on residential rotation, and use TikTok's official API for anything real or commercial.

Frequently Asked Questions (FAQs)

Why does a plain request return no data from TikTok?

Because TikTok renders its search, hashtag, and profile content client-side with JavaScript. The initial HTML is a shell that only fills in after the page's scripts run in a browser, so a raw HTTP request returns a near-empty body. To get real public data you have to render the page first, which is what the Crawling API's JS token handles for you.

Do I need the normal token or the JS token for TikTok?

The JS token. The normal token fetches static HTML, which on TikTok is the same empty shell a plain request returns. The JS token renders the page in a real browser before handing back the HTML, so the public video cards are present when BeautifulSoup parses them.

What TikTok data is safe to scrape?

Only public, aggregate data: public captions, public engagement counts (likes, comments, shares) as numbers, public video URLs, posted dates, and hashtags that anyone can see without logging in. Private accounts, login-walled content, direct messages, follower lists, and the identities or content of individual people are off limits. Those are personal data, and collecting them runs against TikTok's terms and, in many places, privacy law.

How do I handle TikTok's infinite scroll?

Set the Crawling API's scroll option to true and tune scroll_interval to control how long it keeps scrolling (in seconds, up to 60 per the Crawling API documentation). The API simulates scrolling down the page so more video cards load into the HTML before it is returned. Keep the interval moderate and pull a reasonable sample rather than trying to scroll an entire feed in one request.

Should I use the official TikTok API or scrape the site?

For any real, ongoing, or commercial use, use the official TikTok API. It is the sanctioned route, gives reliable structure, and keeps you inside TikTok's terms. Scraping a small sample of public, aggregate fields with the approach here fits lightweight public-data research where no API access is in place, as long as you respect the terms, robots.txt, and rate limits.

How do I avoid getting blocked while scraping TikTok?

Keep your per-IP request rate low, add real delays between requests, vary your targets instead of crawling one hashtag's full history, and route through rotating residential IPs so no single address trips a rate limit. The Crawling API manages rotation and a trusted IP pool for you. Watch the status codes and back off the moment you start seeing challenges.
