How to Scrape Yandex Search Results


Yandex is the search engine most people in Russia reach for first, and it holds a similar pull across several neighboring countries. By most estimates it carries more than half of the Russian search market, which makes its public results a useful signal for anyone tracking Russian-language demand, regional rankings, or how a brand surfaces in a market Google does not lead. The results page exposes the same structured data a SERP tool wants anywhere: titles, links, snippets, and the order they appear in.

This guide shows you how to scrape Yandex search results with Python the reliable way. You build a small, runnable scraper that fetches a rendered results page through the Crawling API, parses each organic result with BeautifulSoup, and exports the data to JSON and CSV. The whole walkthrough stays scoped to public search-results data that anyone can see without an account, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.

What you will build

A Python script that takes a public Yandex search URL, retrieves the HTML through the Crawling API, and extracts a structured record for every organic result on the page. We use the query "Winter Jackets" as the running example and pull these fields from each result:

Title: the headline text of the result, as shown in the listing.
URL: the destination link the result points to.
Description: the snippet shown under the title.
Position: the rank of the result on the page, counted from the top.

Why a plain request fails on Yandex

If you fire a bare HTTP request at a Yandex results URL from a script, you rarely get the clean page you see in your own browser. Yandex watches closely for automated traffic. Requests that do not look like a real browser, or that arrive too quickly from one address, get challenged with a verification page (the "Are you a robot?" interstitial) or blocked outright before they reach the listings. Repeating the same IP across many queries trips those checks fast.

So a working Yandex scraper needs two things in one request: an IP the platform reads as a real visitor, and, when the page leans on scripts, a browser that renders it. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping those healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it fetches from a trusted IP and renders when needed, and it returns finished HTML for you to parse.

Why IP rotation matters here

Yandex's anti-bot checks lean heavily on request rate per address. A handful of fast requests from one IP is enough to trigger its verification interstitial. The Crawling API rotates through many addresses server-side, so requests spread across them and no single one trips a limit. You can start with up to 20,000 free requests, no credit card needed.

Prerequisites

You need a few things in place before writing any code. None of them take long.

Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If BeautifulSoup is new to you, our guide to using BeautifulSoup in Python covers the parsing basics this tutorial assumes.

Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.

A Crawlbase account and token. Sign up, open your dashboard, and copy your request token from the account docs page. Crawlbase issues two token types: a Normal token for static pages and a JavaScript token for script-heavy ones. Yandex's organic results come back in the initial HTML, so the Normal token is the right choice here. You get up to 20,000 free requests. Treat the token like a password: it authenticates your requests, so keep it out of version control.
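One way to keep the token out of version control is to read it from an environment variable instead of hard-coding it. The sketch below assumes a variable named CRAWLBASE_TOKEN; both that name and the load_token helper are illustrative choices, not something the crawlbase client requires.

```python
import os


def load_token(env_var="CRAWLBASE_TOKEN"):
    """Return the API token from the environment, or raise if it is unset."""
    # CRAWLBASE_TOKEN is an assumed variable name; use whatever your setup defines.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before running the scraper")
    return token
```

You could then pass load_token() to CrawlingAPI({"token": ...}) in place of the literal string used in the steps that follow.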

Set up the project

Create a virtual environment so project dependencies stay isolated, then install the three libraries the scraper needs.

bash

python --version

python -m venv yandex_env
source yandex_env/bin/activate

pip install crawlbase beautifulsoup4 pandas

On Windows, activate the environment with yandex_env\Scripts\activate instead of the source line. Three dependencies do the work: crawlbase is the official client that sends your request to the Crawling API, beautifulsoup4 parses the returned HTML so you can pull out fields by CSS selector, and pandas handles the export to CSV at the end.

Step 1: Fetch the page through the Crawling API

Start by getting the HTML. A Yandex search URL is the main domain plus the query in the text parameter, so https://yandex.com/search/?text=Winter%20Jackets searches for "Winter Jackets". Encode the query with urllib.parse.quote so spaces and special characters survive the trip. Write a small fetch_page_html() function that hands the URL to the Crawling API with your token, checks that the Crawling API reported a 200 status for the request, and gives back the decoded HTML body. Checking the status before you parse keeps failures loud instead of silent.

python

from crawlbase import CrawlingAPI
from urllib.parse import quote

API_TOKEN = "YOUR_CRAWLBASE_TOKEN"  # replace with your Normal token
crawling_api = CrawlingAPI({"token": API_TOKEN})

def fetch_page_html(url):
    response = crawling_api.get(url)
    if response["headers"]["cb_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed with Crawlbase status {response['headers']['cb_status']}")
    return None

if __name__ == "__main__":
    url = f"https://yandex.com/search/?text={quote('Winter Jackets')}"
    html = fetch_page_html(url)
    if html:
        print(html[:500])

The crawling_api.get(url) call returns a response whose headers["cb_status"] is the status the Crawling API reports for the request and whose body is the raw page bytes. Guarding on cb_status == "200" means a block or a failed fetch surfaces as a clean failure instead of feeding garbage into the parser. Decoding the body as UTF-8 is what keeps Cyrillic titles and descriptions readable. Save the file as yandex_scraper.py, run it with python yandex_scraper.py, and you should see real results markup in the first 500 characters, which confirms the fetch works before you write a single selector.
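To see what the encoding step does to a query, here is a small sketch of a URL builder. The build_search_url name is a hypothetical helper, not part of the crawlbase client; it only wraps the same quote call used above.

```python
from urllib.parse import quote


def build_search_url(query):
    """Build a Yandex search URL with the query percent-encoded as UTF-8."""
    return f"https://yandex.com/search/?text={quote(query)}"


print(build_search_url("Winter Jackets"))
# https://yandex.com/search/?text=Winter%20Jackets
print(build_search_url("зима"))
# https://yandex.com/search/?text=%D0%B7%D0%B8%D0%BC%D0%B0
```

Because quote also escapes characters such as & and =, a query that contains them stays inside the text parameter instead of being read as an extra parameter.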

Crawlbase Yandex Scraper

That cb_status (legacy pc_status) check reads 200 only because the request reached Yandex looking like a real visitor in the first place, sidestepping the "Are you a robot?" interstitial. The Crawling API fetches the page from a rotating IP, renders it when the page needs a browser, and hands you finished HTML, so you skip running a headless fleet and sourcing a residential proxy pool yourself. Point it at a public results URL on the free tier first.


Step 2: Parse the results with BeautifulSoup

With HTML in hand, load it into BeautifulSoup and pull each result by its selector. Yandex wraps each organic result in a .serp-item container; inside it, the headline sits in h2.organic__url-text, the destination link in a.organic__url, and the snippet in div.organic__content-wrapper. To confirm these on a live page, open the Yandex results URL in your browser, right-click a result, choose Inspect, and read the class names off the element; the selectors below match the layout at the time of writing.

python

from bs4 import BeautifulSoup

def scrape_yandex_search(html_content):
    soup = BeautifulSoup(html_content, "html.parser")

    search_results = []
    for position, result in enumerate(soup.select(".serp-item"), start=1):
        title_element = result.select_one("h2.organic__url-text")
        url_element = result.select_one("a.organic__url")
        description_element = result.select_one("div.organic__content-wrapper")

        if not title_element or not url_element:
            continue

        search_results.append({
            "position": position,
            "title": title_element.get_text(strip=True),
            "url": url_element["href"],
            "description": description_element.get_text(strip=True) if description_element else None,
        })

    return search_results

Selecting .serp-item gives you one element per result container, and enumerate(..., start=1) hands you the position for free as you loop, so rank comes from page order instead of a fragile attribute. Reading the URL from a.organic__url's href keeps the destination separate from the title text. The if not title_element or not url_element: continue guard skips anything that is not a real organic result, which keeps ad blocks and stray markup out of your output. Because the counter still advances for skipped containers, positions can show gaps where a non-organic block sat. The description falls back to None when its container is absent.

Selectors drift

Yandex revises its front-end markup periodically, and class names like organic__url-text can change with a redeploy. Treat the selectors above as a starting template, not a contract. When a field comes back empty for every result, re-inspect a live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
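A cheap way to notice drift early is to check the parsed records themselves. The sketch below is one possible approach, not part of the scraper above: find_dead_fields is a hypothetical helper that reports which fields are empty in every result, and the sample records are made up for illustration.

```python
def find_dead_fields(results, fields=("title", "url", "description")):
    """Return the fields that are empty in every result, a hint that a selector drifted."""
    if not results:
        return list(fields)  # nothing parsed at all: treat every field as suspect
    return [field for field in fields if all(not row.get(field) for row in results)]


# Made-up records shaped like the parser's output.
sample = [
    {"position": 1, "title": "Example A", "url": "https://example.com/a", "description": None},
    {"position": 2, "title": "Example B", "url": "https://example.com/b", "description": None},
]
print(find_dead_fields(sample))  # ['description']
```

Since the parser skips results that lack a title or URL, a broken title or link selector shows up as an empty list, which this check reports as every field being suspect.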

Step 3: Put it together

Now wire the fetch and the parse into one runnable script. Fetch the results page, hand the HTML to the parser, and print the structured output as JSON. Setting ensure_ascii=False keeps Cyrillic characters readable in the output instead of escaping them into \u sequences.

python

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
from urllib.parse import quote
import json

API_TOKEN = "YOUR_CRAWLBASE_TOKEN"
crawling_api = CrawlingAPI({"token": API_TOKEN})

def fetch_page_html(url):
    response = crawling_api.get(url)
    if response["headers"]["cb_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed with Crawlbase status {response['headers']['cb_status']}")
    return None

def scrape_yandex_search(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    search_results = []
    for position, result in enumerate(soup.select(".serp-item"), start=1):
        title_element = result.select_one("h2.organic__url-text")
        url_element = result.select_one("a.organic__url")
        description_element = result.select_one("div.organic__content-wrapper")
        if not title_element or not url_element:
            continue
        search_results.append({
            "position": position,
            "title": title_element.get_text(strip=True),
            "url": url_element["href"],
            "description": description_element.get_text(strip=True) if description_element else None,
        })
    return search_results

def main():
    search_query = "Winter Jackets"
    url = f"https://yandex.com/search/?text={quote(search_query)}"
    html_content = fetch_page_html(url)
    if html_content:
        search_results = scrape_yandex_search(html_content)
        print(json.dumps(search_results, ensure_ascii=False, indent=2))

if __name__ == "__main__":
    main()

Run the full script with python yandex_scraper.py. It fetches the results page for "Winter Jackets", extracts a record for each organic listing, and prints the list as formatted JSON. The same two functions are all you need: swap the query in main() and the parser handles whatever comes back.

What the output looks like

You get a clean, ordered list of result objects, each with its position, title, URL, and description, ready to write to JSON, CSV, or a database. Because the results for the example query mix English-language and Russian-language storefronts, you can see Yandex's regional strength in the results themselves.

json

[
  {
    "position": 1,
    "title": "Best Winter Jackets of 2024 | Switchback Travel",
    "url": "https://www.switchbacktravel.com/best-winter-jackets",
    "description": "Patagonia Tres 3-in-1 parka. Category: Casual. Fill: 4.2 oz. of 700-fill-power down."
  },
  {
    "position": 2,
    "title": "Winter jacket: купить по низкой цене на Яндекс Маркете",
    "url": "https://market.yandex.ru/search?text=winter%20jacket",
    "description": "Купить winter jacket: 97 предложений, низкие цены, быстрая доставка."
  },
  {
    "position": 3,
    "title": "Amazon.com: Winter Jackets",
    "url": "https://www.amazon.com/Winter-Jackets/s?k=Winter+Jackets",
    "description": "CAMEL CROWN Men's Mountain Snow Waterproof Ski Jacket, Detachable Hood, Fleece Parka."
  }
]

One thing to watch in real output: the Cyrillic title in position 2 keeps its original characters thanks to the UTF-8 decode and ensure_ascii=False. If you see \u escape sequences instead, one of those two steps is missing.
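The difference is easy to check in isolation. This short sketch uses only the standard library and a made-up record to show both forms side by side:

```python
import json

record = {"title": "Яндекс Маркет"}  # made-up record for illustration

escaped = json.dumps(record)                       # default is ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)

print(escaped)   # Cyrillic is written as \uXXXX escape sequences
print(readable)  # {"title": "Яндекс Маркет"}

# Both strings decode back to the same data; only the written form differs.
assert json.loads(escaped) == json.loads(readable) == record
```

The escaped form is still valid JSON, so nothing is lost either way; ensure_ascii=False only matters for how readable the file is to a person.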

Scaling across pages and queries

One query on one page is a demo; a real job runs over several searches and deeper into the results. Yandex paginates with the p query parameter, which is a zero-based page index: p=0 is the first page, p=1 the second, and so on. The shape stays the same: build each URL with the next page number, fetch it through the Crawling API, and parse it with the same function. A small change keeps positions continuous across pages instead of restarting at 1 on every page, and a short sleep between requests paces the crawl.

python

import time
from urllib.parse import quote

def scrape_all_pages(query, max_pages=5):
    base_url = f"https://yandex.com/search/?text={quote(query)}&p="
    all_results = []
    position = 1

    for page in range(max_pages):
        html_content = fetch_page_html(base_url + str(page))
        if not html_content:
            break
        page_results = scrape_yandex_search(html_content)
        for result in page_results:
            result["position"] = position
            position += 1
        all_results.extend(page_results)
        time.sleep(2)  # pace requests to respect the server

    return all_results

The loop walks five pages by default, fetches each through the Crawling API, and reassigns a running position so ranks stay continuous from page one to page five. The two-second time.sleep between requests keeps you from hammering Yandex in a tight loop. Raise max_pages only as far as you genuinely need; deeper results are usually less relevant anyway. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same IP rotation as a drop-in proxy endpoint.
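The same pacing idea extends from pages to queries. The sketch below is one way to loop over several searches; scrape_queries is a hypothetical helper, and it takes the per-query scraping function as an argument (an assumption made so the example stands on its own) rather than calling scrape_all_pages directly.

```python
import time


def scrape_queries(queries, scrape_fn, delay_seconds=2.0, sleep=time.sleep):
    """Run scrape_fn(query) for each query and tag every row with its query."""
    rows = []
    for index, query in enumerate(queries):
        if index:  # pause between queries, but not before the first one
            sleep(delay_seconds)
        for result in scrape_fn(query):
            rows.append({"query": query, **result})
    return rows
```

With the functions from this guide you might call it as scrape_queries(["Winter Jackets", "зимние куртки"], lambda q: scrape_all_pages(q, max_pages=2)). The added query column lets you tell the searches apart once the rows are combined in one export.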

Exporting to CSV

JSON is handy in a terminal, but a CSV opens straight in a spreadsheet, which is what most analysts actually want. pandas turns the list of result dictionaries into a CSV in two lines.

python

import pandas as pd

def save_to_csv(results, filename="yandex_search_results.csv"):
    df = pd.DataFrame(results)
    df.to_csv(filename, index=False, encoding="utf-8-sig")
    print(f"Saved {len(results)} results to {filename}")

if __name__ == "__main__":
    results = scrape_all_pages("Winter Jackets", max_pages=3)
    save_to_csv(results)

Building the DataFrame from the list of dicts gives you one column per field (position, title, url, description) and one row per result. Passing index=False drops pandas' own row index from the file, and encoding="utf-8-sig" writes a byte-order mark so Cyrillic descriptions open correctly in Excel rather than turning into mojibake. From here you can extend the same pattern to write into a database such as SQLite if you prefer a query-friendly store.
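For the SQLite route, a minimal sketch using Python's built-in sqlite3 module might look like this. The table name, column types, and default file name are assumptions chosen for illustration; the columns mirror the four fields the parser produces.

```python
import sqlite3


def save_to_sqlite(results, db_path="yandex_results.db"):
    """Append result dicts to a results table, creating it on first use."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS results ("
                "position INTEGER, title TEXT, url TEXT, description TEXT)"
            )
            # Named placeholders pull each value from the matching dict key.
            conn.executemany(
                "INSERT INTO results (position, title, url, description) "
                "VALUES (:position, :title, :url, :description)",
                results,
            )
    finally:
        conn.close()
    return len(results)
```

Calling save_to_sqlite(results) after a run appends the rows, so repeated runs accumulate history that you can query by URL or position later.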

Staying unblocked

Even with a trusted IP handled for you, Yandex watches for scraper-shaped traffic, and its anti-bot checks are stricter than many Western targets. A few habits keep a run healthy.

Pace your requests. Hammering results pages in a tight loop is the fastest way to land the verification interstitial. Spread requests out and vary your queries instead of paging one term at full speed.
Lean on rotation. A pool of IP addresses spreads requests so no single one trips Yandex's per-address rate checks. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
Read the status codes. A run that starts returning challenges or non-200 statuses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
Re-inspect when fields go empty. Yandex changes its markup periodically. If results stop parsing, open a live page in dev tools and update the selectors.
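Backing off when statuses turn bad can be sketched as a small retry wrapper. This is an illustrative pattern, not a feature of the Crawling API: fetch_with_backoff is a hypothetical helper, the delays are arbitrary starting values, and it assumes a fetch function that returns None on failure, as fetch_page_html does above.

```python
import time


def fetch_with_backoff(fetch_fn, url, max_attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call fetch_fn(url) until it returns HTML, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        html = fetch_fn(url)
        if html:
            return html
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 5s, 10s, 20s with the defaults
    return None  # every attempt failed; let the caller decide whether to stop the run
```

In the pagination loop you could call fetch_with_backoff(fetch_page_html, url) in place of the direct fetch, so a run slows down on trouble instead of repeating failed requests at full speed.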

For the broader playbook, see how to scrape websites without getting blocked and the deeper dive on how to bypass captchas while web scraping. The same approach carries to other engines: our guides on scraping Bing search results and scraping Google search pages use the same fetch-then-parse structure with different selectors.

Is it legal to scrape Yandex?

Whether scraping Yandex is allowed depends on Yandex's terms of service, your jurisdiction, and what you do with the data. Like most search engines, Yandex places limits on automated access in its terms, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Yandex's terms and its robots.txt, and treat both as the boundary for what you collect.
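Checking a URL against robots.txt rules can be done with Python's standard urllib.robotparser. The sketch below parses rules from a string so it runs without network access; the sample rules are hypothetical and do not reflect Yandex's actual file, and is_allowed is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; read the real file from the site you target.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/report"))  # False
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/public/page"))     # True
```

For a live check, RobotFileParser can also load the file itself through set_url() and read() in place of parse(). A robots.txt result does not settle the terms-of-service question; it is one of the two boundaries named above.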

A few lines worth holding to. Collect only public search-results data: the titles, links, descriptions, and positions that anyone can see on a results page without an account. Keep your request volume low enough that you are not straining Yandex's servers, and pace your crawl rather than running it flat out. Yandex does publish official products for some structured data, such as Yandex Maps and Yandex Market APIs, so where a sanctioned endpoint exists for what you need, that is the better path than scraping the SERP.

This guide is deliberately scoped to public search-results pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or personal data, or copyrighted media pulled from the linked destinations. Public SERP data only. If your project needs more than that, an official data agreement is the correct path, not a cleverer scraper.

Recap

Key takeaways

Yandex is strong in its region. It leads Russian-language and regional search, so its public results are a useful signal where Google does not dominate.
The Crawling API fetches behind a trusted IP. Send it the URL, it rotates addresses server-side and renders when needed, and returns finished HTML, sidestepping the "Are you a robot?" interstitial.
BeautifulSoup does the extraction. Select each .serp-item, then read title, URL, description, and position from it, and expect the class names to drift over time.
Paginate with the p index. Increment p by one per page to walk deeper into results, keep positions continuous, and pace requests with a sleep between pages.
Stay on public data. Respect Yandex's ToS and robots.txt, keep volume low, prefer official APIs like Yandex Market where they fit, and never touch accounts or personal data.

Frequently Asked Questions (FAQs)

What is Yandex and why scrape it?

Yandex is the leading search engine in Russia, often called the "Google of Russia," and it also offers maps, mail, market, and cloud services. People scrape its public results to track Russian-language and regional rankings, study search trends, monitor how brands surface in that market, and gather data for research that Google-centric tools miss.

Can I scrape Yandex search results with Python?

Yes. With the crawlbase client and BeautifulSoup you can fetch a results page and pull out titles, URLs, descriptions, and positions. The Crawling API acts as the bridge that gets your request to Yandex from a trusted IP, so requests are processed smoothly instead of hitting the verification interstitial. For a broader Python primer, see our guide on scraping websites with Python.

Why does a plain request get blocked on Yandex?

Yandex flags traffic that does not look like a real browser, and it watches request rate per IP closely. A few fast requests from one address trigger its "Are you a robot?" verification page or an outright block. Fetching through the Crawling API, which rotates IPs and renders when needed, makes each request look like an ordinary visitor so you get the real results page.

How do I paginate through more Yandex results?

Use the p query parameter, which is a zero-based page index: p=0 is the first page, p=1 the second, and so on. Build each page URL with the next index, fetch it through the Crawling API, parse it with the same function, reassign a running position so ranks stay continuous, and pause a couple of seconds between requests so you are pacing the crawl rather than hammering it.

How do I handle Russian-language results and Cyrillic text?

Decode the response body as UTF-8 when you read it, and when you export, write JSON with ensure_ascii=False or CSV with encoding="utf-8-sig". Those two steps keep Cyrillic titles and descriptions readable instead of turning into \u escapes or mojibake. Yandex's regional strength means many results come back in Russian, so this matters more here than on a Google scrape.

My selectors return nothing. What changed?

Almost certainly Yandex's markup. Class names like organic__url-text and organic__content-wrapper can change when Yandex redeploys its front end, so selectors that worked last month can break. Re-inspect a live results page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.

Hassan Rehan

Software Engineer · Crawlbase

Software engineer at Crawlbase writing hands-on guides on rotating proxies, scraping, and the practical details of wiring proxies into real code.
