Direct Answer: Google’s People Also Ask (PAA) is a dynamic SERP box showing expandable question-and-answer pairs related to a search query. Scraping it requires JavaScript rendering, HTML parsing, and structured extraction. Using the Crawlbase Crawling API (a web crawling solution that handles headless browsing, proxy rotation, and anti-bot logic), you can reliably collect PAA questions, answers, and nested expansions, then output clean JSON for SEO analysis, content gap discovery, and topic clustering across different markets.
Google’s People Also Ask (PAA) box appears in roughly 40 to 45 percent of Google searches, making it one of the most consistent sources of user intent outside of organic results.
For SEO practitioners, PAA data is especially valuable because it exposes:
- Real user intent behind a keyword
- Content gaps that competitors have not covered
- FAQ and topic cluster opportunities
- Featured snippet targets
This guide walks through how to scrape Google People Also Ask programmatically using the Crawlbase Crawling API. You’ll extract questions, answers, and nested expansions, then use that data for content gap analysis, FAQ generation, and topic clustering across different markets.
The full working code is available in the ScraperHub repository.
Definition
PAA expansion tree: When a user clicks a question, Google loads 2-4 additional related questions. This creates a cascading structure. Most scraping setups capture only the first 3 to 4 visible items and miss everything beyond that initial layer.
How Do You Scrape Google’s People Also Ask?
At a high level, scraping PAA requires rendering the page, not just requesting it.
A simple HTTP request is not enough because PAA content loads after the page initializes and updates dynamically on interaction.
To extract it reliably:
- Send a Google search URL with gl and hl parameters to a rendering API
- Wait for JavaScript execution, typically around 2000 ms
- Parse the returned HTML using fallback selectors
- Structure the output into JSON
If you skip the rendering step, the PAA section will either be incomplete or missing entirely.
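The first of those steps, building the search URL with `gl` and `hl`, can be sketched in a few lines. The helper name and the optional `num` parameter are illustrative, not part of the repo:

```python
from urllib.parse import urlencode

def build_serp_url(query: str, gl: str = "us", hl: str = "en") -> str:
    """Build a Google search URL with country (gl) and language (hl) parameters."""
    params = {"q": query, "gl": gl, "hl": hl, "num": 20}
    return "https://www.google.com/search?" + urlencode(params)

url = build_serp_url("how to scrape google", gl="uk")
```

That URL is what you hand to the rendering API in the next step.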

What to Extract: Google’s PAA Data Structure
Once you have the rendered HTML, the next step is structuring the data in a way that is actually usable.
A complete PAA record typically looks like this:
```json
{
  "question": "What is web scraping?",
  "answer": "Web scraping is the automated extraction of data from websites...",
  "source_url": "https://example.com/what-is-web-scraping",
  "children": []
}
```
Each field serves a specific purpose:
- question: expands keyword coverage and topic discovery
- answer: helps with featured snippet optimization
- source_url: supports competitor analysis
- children: captures deeper levels of the expansion tree
Another way to think about it is that each question becomes a node, and each expansion adds more nodes beneath it.
Most scrapers stop at the first layer. That leaves a large portion of available data untouched.
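To make the node idea concrete, here is a small sketch of traversing that tree; the sample data is invented for illustration:

```python
def count_questions(nodes):
    """Recursively count every question in a PAA expansion tree."""
    total = 0
    for node in nodes:
        total += 1
        total += count_questions(node.get("children", []))
    return total

tree = [
    {"question": "What is web scraping?", "children": [
        {"question": "Is web scraping legal?", "children": []},
        {"question": "What tools are used for web scraping?", "children": []},
    ]},
    {"question": "How does Google PAA work?", "children": []},
]
total = count_questions(tree)  # counts all four nodes, not just the two top-level ones
```

A first-layer-only scraper would report 2 questions here; walking `children` surfaces all 4.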
Why Use Crawlbase for Google PAA Extraction?
At this point, the main challenge is not parsing. It’s getting reliable, fully rendered HTML from Google.
Crawlbase simplifies that entire process. Instead of managing headless browsers, proxies, and retry logic, you work with a single API endpoint that handles those layers for you.
The Crawling API uses one base URL:
```
https://api.crawlbase.com
```
You only need two required parameters:
- token
- url
For Google SERPs, you should use your JavaScript token and include page_wait so the PAA section has time to load. A timeout of at least 90 seconds is recommended for stability.
Here is a sample request:
```python
import requests

# Your Crawlbase JavaScript token (required for rendered Google SERPs).
JS_TOKEN = "your_js_token"

search_url = "https://www.google.com/search?q=how+to+scrape+google&gl=us&hl=en"

response = requests.get(
    "https://api.crawlbase.com/",
    params={
        "token": JS_TOKEN,
        "url": search_url,
        "page_wait": 2000,  # give the PAA box time to load
    },
    timeout=90,  # at least 90 seconds, as recommended above
)

html = response.text
```
This single request already returns fully rendered HTML, including the PAA section. From there, you can pass the response directly into your parser.
This replaces an entire stack that would otherwise include browser automation tools, proxy rotation systems, and custom anti-block handling. That simplicity is what makes it practical to scale PAA extraction beyond a handful of queries.
How Do You Run a Complete Google PAA Scraper?
Now that the pieces are clear, the fastest way to get started is not to build everything manually, but to use a complete implementation.
The ScraperHub repository already includes a working pipeline for fetching, parsing, and exporting PAA data. You can clone it and run it locally in a few minutes.
Step 1: Clone the Scraper
Go to the repository: ScraperHub/How-to-scrape-google-PAA
Clone it:
```bash
git clone https://github.com/ScraperHub/how-to-scrape-google-people-also-ask.git
```
Step 2: Understand How the Scraper Works
Before running it, it helps to know how the pieces fit together.
- main.py builds the search URL, runs the pipeline, and writes JSON
- config.py manages tokens, retries, and timeouts
- fetcher.py handles requests to Crawlbase
- parser.py extracts PAA data using fallback selectors

Each file does one job. Together, they form a complete scraping pipeline.
Step 3: Set Up the Environment
Make sure the latest Python version is installed, then set up your environment:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt  # install the repo's dependencies
```
Set your Crawlbase tokens:
```bash
export CRAWLBASE_TOKEN=your_normal_token
```
The JavaScript token or Browser Enabled API Key is required for Google SERPs.
Step 4: Run the Scraper
```bash
python main.py "how to scrape google"
```
This runs the full flow:
- Builds the Google SERP URL
- Fetches rendered HTML
- Parses PAA questions and answers
- Outputs structured JSON
Step 5: Customize Your Runs
You can adjust parameters directly from the CLI.
Change country:
```bash
python main.py "content gap analysis" --country uk -o paa_uk.json
```
Adjust rendering time:
```bash
python main.py "web scraping best practices" --page-wait 3000
```
If results look incomplete, increasing page_wait (value in milliseconds) is usually the first fix.
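If you want to automate that first fix, a simple backoff over `page_wait` values looks like this. The `fetch` callable is a stand-in for a real Crawling API request; the "People also ask" substring check is a crude completeness heuristic:

```python
def fetch_with_backoff(fetch, waits=(2000, 3000, 5000)):
    """Retry a fetch with progressively longer page_wait until the PAA box appears."""
    html = ""
    for wait in waits:
        html = fetch(page_wait=wait)
        if "People also ask" in html:  # crude completeness check
            break
    return html  # last attempt is returned even if still incomplete

# Demo with a stand-in fetcher (a real one would call the Crawling API):
def fake_fetch(page_wait):
    return "...People also ask..." if page_wait >= 3000 else "<html></html>"

result = fetch_with_backoff(fake_fetch)
```

Capping the retry list keeps a stubborn query from burning requests indefinitely.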
Step 6: Test the Scraper
Run the test suite:
```bash
python3 run_tests.py
```
Or, if you’re using pytest:
```bash
python3 -m pytest tests/ -v
```
These tests use saved Google SERP HTML to verify that your parser still extracts questions, answers, and source URLs correctly. It’s a quick way to catch breakages when Google changes its page structure before running large scraping jobs.
Capturing Nested Expansions in Google PAA
Up to this point, you are extracting the initial set of PAA questions. That alone gives you a useful dataset, but it’s still incomplete, as the real value comes from going deeper into the expansion tree.
When you expand a PAA question, Google dynamically loads additional related questions. Each of those can trigger further expansions, creating a layered structure of queries.
To capture this behavior, you use the css_click_selector parameter in the Crawling API. This allows you to simulate clicks on PAA elements so the additional questions load before parsing.
The flow works like this:
- Build the SERP URL with your query and geo parameters
- Fetch the rendered HTML using the Crawling API
- Parse the initial PAA set
- Trigger expansions using css_click_selector
- Re-fetch or re-parse the updated DOM
- Output the full dataset
Each expansion adds another layer to your data. In practice, a single query can grow from 3 to 4 visible questions to 12 to 20 total questions after a few expansion levels.
This step is optional from an implementation standpoint, but it’s where most of the missing value lives.
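A minimal sketch of the request parameters for the expansion step. The CSS selector shown is illustrative, so confirm it against the live SERP HTML before relying on it:

```python
def build_expansion_params(token, serp_url, selector, page_wait=3000):
    """Assemble Crawling API parameters that click PAA toggles before capture."""
    return {
        "token": token,
        "url": serp_url,
        "page_wait": page_wait,
        # The API clicks elements matching this selector before returning HTML.
        "css_click_selector": selector,
    }

params = build_expansion_params(
    "your_js_token",
    "https://www.google.com/search?q=web+scraping&gl=us&hl=en",
    "div.related-question-pair",  # illustrative selector; verify on a live SERP
)
# requests.get("https://api.crawlbase.com/", params=params, timeout=90)
```

A longer page_wait matters here because the clicked-in questions need time to render before the HTML snapshot is taken.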
How Do You Compare Google PAA Across Countries?
PAA results are not universal. They vary by location and language.
To compare them, run the same query with different gl values:
```python
# Run the same query against multiple Google country editions.
queries = [
    {"q": "content gap analysis", "gl": "us", "hl": "en"},
    {"q": "content gap analysis", "gl": "uk", "hl": "en"},
    {"q": "content gap analysis", "gl": "de", "hl": "de"},
]
```
Compare:
- Unique questions
- Overlapping topics
- Differences in answers
This is particularly useful when expanding into new regions or localizing content.
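Once you have per-country question lists, set operations cover the comparison. The sample questions below are invented for illustration:

```python
us_questions = {
    "What is content gap analysis?",
    "How do you do a content gap analysis?",
    "What tools find content gaps?",
}
uk_questions = {
    "What is content gap analysis?",
    "How do you perform a content gap analysis?",
    "Why is content gap analysis important?",
}

shared = us_questions & uk_questions   # topics stable across markets
us_only = us_questions - uk_questions  # localization candidates for the UK
uk_only = uk_questions - us_questions  # localization candidates for the US
```

Near-duplicate phrasings ("do" vs "perform") show up as different set members, so normalizing or fuzzy-matching questions before comparing is worth considering at scale.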
When Should You Use the Enterprise Crawler?
The standard Crawling API works well for small batches where you fetch results immediately. Once you scale to thousands or even millions of queries, it becomes harder to manage.
The Enterprise Crawler is built for that scale. It runs asynchronously, so you can push URLs in bulk and receive results later via a webhook.
You don’t need to rewrite your scraper. Just update the request in fetcher.py:
```python
params["callback"] = True
```
To receive results, you’ll need a webhook.
You can either use the Crawlbase Cloud Storage for a quick setup or create your own endpoint if you want full control.
If you build your own, it just needs to accept POST requests, be publicly accessible, and return a quick 200–204 response. For local testing, tools like ngrok work well.
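A minimal custom endpoint can be sketched with Python’s standard library. The handler just stores the payload and acknowledges fast; the storage helper is a placeholder you would swap for a disk write or a queue:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

RESULTS = []

def save_result(body: bytes) -> None:
    # Placeholder: queue the payload for later parsing.
    RESULTS.append(body)

class CrawlbaseWebhook(BaseHTTPRequestHandler):
    """Minimal webhook: accept POSTed HTML and return a quick 2xx."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        save_result(self.rfile.read(length))
        self.send_response(204)  # fast 200-204 keeps the crawler happy
        self.end_headers()

# HTTPServer(("0.0.0.0", 8000), CrawlbaseWebhook).serve_forever()
```

Keeping the handler dumb (store now, parse later) is what keeps the response time low enough for bulk deliveries.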
Use it when you are building large datasets or running recurring jobs. Check the Crawler documentation to learn more.
Real-World Applications of Google PAA Data
PAA data is directly usable in production workflows because it reflects how users actually phrase their questions.
You can use it to:
- Build FAQ sections based on real queries instead of guessing what users ask
- Identify content gaps by spotting questions your competitors have not answered
- Create topic clusters by grouping related questions into supporting articles
- Improve featured snippet targeting by aligning your answers with how Google already structures responses
What makes this valuable is that it removes guesswork from content planning. You are working with questions that already surface in search, not assumptions.
For example, a SaaS team targeting “web scraping tools” might extract 15 to 20 PAA questions from a single query. Instead of treating those as raw data, they can turn each question into a dedicated FAQ section, a supporting blog post, or even a subsection within a larger guide.
Over time, these questions naturally form a content cluster around the main topic, making it easier to cover the space comprehensively and compete for both rankings and featured snippets.
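As a sketch, scraped question/answer pairs can be mapped straight into schema.org FAQPage structured data for those FAQ sections:

```python
import json

def to_faq_schema(pairs):
    """Convert (question, answer) pairs into FAQPage structured data (schema.org)."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }

schema = to_faq_schema([
    ("What are web scraping tools?", "Software that extracts data from websites."),
])
markup = json.dumps(schema, indent=2)  # embed in a <script type="application/ld+json"> tag
```

Rewriting the scraped answers in your own words before publishing is still essential; the schema only carries the structure.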
Conclusion
PAA is one of the most underutilized datasets in search. If you only capture the initial questions, you are missing most of the available insights.
With Crawlbase and the ScraperHub implementation, you can extract the full expansion tree, structure it into usable data, and scale it across different markets without managing browsers, proxies, or infrastructure.
Try this yourself by creating a Crawlbase account and using the 1,000 free requests to run the scraper on your own queries. It’s a quick way to see how much additional data you can unlock from a single search.
Frequently Asked Questions
What is a People Also Ask box?
A PAA box is a Google SERP feature showing 3-4 expandable question-answer pairs related to the search query. It appears in roughly 43% of searches and expands dynamically when clicked.
Is scraping Google PAA legal?
Scraping publicly available search results exists in a legal grey area. We recommend reviewing Google’s Terms of Service before using scraped data in any application. Crawlbase provides the tools to crawl and extract publicly accessible data, but how that data is used is ultimately your responsibility.
How many PAA questions can one query return?
The initial PAA box shows 3-4 questions. Each expansion adds 2-4 more. A 3-level deep expansion tree typically yields 12-20 total questions per query.
Why does PAA vary by location?
Google personalises PAA results based on the searcher’s country and language settings. The same query in the US and UK often returns different questions because user behaviour, language patterns, and available content differ by market.
What happens when Google changes its HTML selectors?
Your parser will silently return empty results. Use layered fallback selectors, log which selector fires on each run, and set up a monitoring alert if the results count drops below a threshold.
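A sketch of that layered-fallback pattern, assuming a BeautifulSoup-style select() interface; the selectors and the stub object are illustrative:

```python
FALLBACK_SELECTORS = [
    "div.related-question-pair",  # primary guess — verify against live HTML
    "div[jsname]",                # looser secondary
    "div[data-q]",                # last resort
]

def select_with_fallback(soup, selectors=FALLBACK_SELECTORS):
    """Try each selector in order and log which one fired."""
    for selector in selectors:
        nodes = soup.select(selector)
        if nodes:
            print(f"selector matched: {selector} ({len(nodes)} nodes)")
            return nodes
    print("WARNING: no selector matched — page structure may have changed")
    return []

class StubSoup:
    """Stands in for a parsed BeautifulSoup document in this sketch."""
    def select(self, selector):
        return ["<node>"] if selector == "div[data-q]" else []

matched = select_with_fallback(StubSoup())
```

The log line is the monitoring hook: when the primary selector stops firing and a fallback takes over, that is your early warning to update the parser.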
How often does Google update PAA for a given keyword?
PAA sets are relatively stable for informational queries (weeks to months) but can shift within hours for trending or news-adjacent topics. For monitoring use cases, a weekly crawl cadence is sufficient for most evergreen keywords.