Yandex is the search engine most people in Russia reach for first, and it holds a similar pull across several neighboring countries. By most estimates it carries more than half of the Russian search market, which makes its public results a useful signal for anyone tracking Russian-language demand, regional rankings, or how a brand surfaces in a market Google does not lead. The results page exposes the same structured data a SERP tool wants anywhere: titles, links, snippets, and the order they appear in.
This guide shows you how to scrape Yandex search results with Python the reliable way. You build a small, runnable scraper that fetches a rendered results page through the Crawling API, parses each organic result with BeautifulSoup, and exports the data to JSON and CSV. The whole walkthrough stays scoped to public search-results data that anyone can see without an account, and the legality section near the end is not boilerplate, so read it before you point this at any real volume.
What you will build
A Python script that takes a public Yandex search URL, retrieves the HTML through the Crawling API, and extracts a structured record for every organic result on the page. We use the query "Winter Jackets" as the running example and pull these fields from each result:
- Title: the headline text of the result, as shown in the listing.
- URL: the destination link the result points to.
- Description: the snippet shown under the title.
- Position: the rank of the result on the page, counted from the top.
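If you like type hints, you can describe the same record shape with a TypedDict. This is a minimal sketch: the name YandexResult is hypothetical, and nothing in the scraper below depends on it.

```python
from typing import Optional, TypedDict


class YandexResult(TypedDict):
    """One organic result with the four fields listed above (hypothetical type name)."""
    position: int               # rank on the page, counted from 1 at the top
    title: str                  # headline text of the result
    url: str                    # destination link
    description: Optional[str]  # snippet under the title; None when absent


# An example record that matches the shape.
example: YandexResult = {
    "position": 1,
    "title": "Example title",
    "url": "https://example.com/",
    "description": None,
}
```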
Why a plain request fails on Yandex
If you fire a bare HTTP request at a Yandex results URL from a script, you rarely get the clean page you see in your own browser. Yandex watches closely for automated traffic. Requests that do not look like a real browser, or that arrive too quickly from one address, get challenged with a verification page (the "Are you a robot?" interstitial) or blocked outright before they reach the listings. Repeating the same IP across many queries trips those checks fast.
So a working Yandex scraper needs two things in one request: an IP the platform reads as a real visitor, and, when the page leans on scripts, a browser that renders it. You can assemble that yourself with a headless browser plus a pool of rotating residential proxies, but keeping those healthy is most of the work. The Crawling API folds both into a single call: you send it the URL, it fetches from a trusted IP and renders when needed, and it returns finished HTML for you to parse.
Yandex's anti-bot checks lean heavily on request rate per address. A handful of fast requests from one IP is enough to trigger its verification interstitial. The Crawling API rotates through many addresses server-side, so requests spread across them and no single one trips a limit. You can start with 1,000 free requests, no credit card needed.
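For comparison, the rotation half of the do-it-yourself route can look like the sketch below. It cycles round-robin through a proxy pool and returns the mapping shape that the requests library's `proxies=` argument accepts. The proxy URLs are placeholders, and `next_proxies` is a hypothetical helper. The sketch does not render JavaScript or replace dead proxies, and that upkeep is what makes the do-it-yourself route expensive.

```python
from itertools import cycle

# Placeholder endpoints; substitute the addresses your proxy provider gives you.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

_rotation = cycle(PROXY_POOL)  # endless round-robin iterator over the pool


def next_proxies():
    """Return a proxies mapping for the next request, moving one step through the pool."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}
```

Each call moves to the next address, for example `requests.get(url, proxies=next_proxies(), timeout=30)`, so consecutive requests leave from different IPs.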
Prerequisites
You need a few things in place before writing any code. None of them take long.
Basic Python. You should be comfortable writing and running a Python script and installing packages with pip. If BeautifulSoup is new to you, our guide to using BeautifulSoup in Python covers the parsing basics this tutorial assumes.
Python 3.8 or later. Confirm your version with python --version. If you do not have it, install it from python.org or through a distribution like Anaconda.
A Crawlbase account and token. Sign up, open your dashboard, and copy your request token from the account docs page. Crawlbase issues two token types: a Normal token for static pages and a JavaScript token for script-heavy ones. Yandex's organic results come back in the initial HTML, so the Normal token is the right choice here. Your first 1,000 requests are free. Treat the token like a password: it authenticates your requests, so keep it out of version control.
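One simple way to keep the token out of version control is to read it from an environment variable at startup. This is a minimal sketch. The variable name CRAWLBASE_TOKEN and the helper load_token are assumptions, not anything Crawlbase defines.

```python
import os


def load_token(var_name="CRAWLBASE_TOKEN"):
    """Read the Crawlbase token from an environment variable (hypothetical name)."""
    token = os.environ.get(var_name)
    if not token:
        # Fail loudly rather than sending requests with an empty token.
        raise RuntimeError(f"Set the {var_name} environment variable to your Crawlbase token")
    return token
```

Export the variable in your shell (`export CRAWLBASE_TOKEN=...`), then use `API_TOKEN = load_token()` in place of the hard-coded string.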
Set up the project
Create a virtual environment so project dependencies stay isolated, then install the three libraries the scraper needs.
```bash
python --version
python -m venv yandex_env
source yandex_env/bin/activate
pip install crawlbase beautifulsoup4 pandas
```
On Windows, activate the environment with yandex_env\Scripts\activate instead of the source line. Three dependencies do the work: crawlbase is the official client that sends your request to the Crawling API, beautifulsoup4 parses the returned HTML so you can pull out fields by CSS selector, and pandas handles the export to CSV at the end.
Step 1: Fetch the page through the Crawling API
Start by getting the HTML. A Yandex search URL is the main domain plus the query in the text parameter, so https://yandex.com/search/?text=Winter%20Jackets searches for "Winter Jackets". Encode the query with urllib.parse.quote so spaces and special characters survive the trip. Write a small fetch_page_html() function that hands the URL to the Crawling API with your token, checks that the request came back with a 200 status, and returns the decoded HTML body. Checking the status before you parse keeps failures loud instead of silent.
```python
from crawlbase import CrawlingAPI
from urllib.parse import quote

API_TOKEN = "YOUR_CRAWLBASE_TOKEN"  # replace with your Normal token
crawling_api = CrawlingAPI({"token": API_TOKEN})


def fetch_page_html(url):
    response = crawling_api.get(url)
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed with Crawlbase status {response['headers']['pc_status']}")
    return None


if __name__ == "__main__":
    url = f"https://yandex.com/search/?text={quote('Winter Jackets')}"
    html = fetch_page_html(url)
    if html:
        print(html[:500])
```
The crawling_api.get(url) call returns a response whose headers["pc_status"] is Crawlbase's status code for the request and whose body is the raw page bytes. Crawlbase reports the status Yandex itself returned separately, in headers["original_status"]. Guarding on pc_status == "200" means a block or a failed fetch surfaces as a clean failure instead of feeding garbage into the parser. Decoding the body as UTF-8 keeps Cyrillic titles and descriptions readable. Save the file as yandex_scraper.py and run it with python yandex_scraper.py. You should see real results markup in the first 500 characters, which confirms the fetch works before you write a single selector.
When that pc_status check reads 200, it is because the request reached Yandex looking like a real visitor and sidestepped the "Are you a robot?" interstitial. The Crawling API fetches the page from a rotating IP, renders it when the page needs a browser, and hands you finished HTML, so you do not have to run a headless browser fleet or source a residential proxy pool yourself. Try it on a public results URL on the free tier first.
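If you plan to run more than one query, a small URL builder keeps the encoding in one place. The sketch below wraps the same quote call from Step 1. It also appends the zero-based p page index that the pagination section uses, but only from the second page onward, so page 0 produces the same plain URL as above. The helper name build_search_url is hypothetical.

```python
from urllib.parse import quote


def build_search_url(query, page=0):
    """Build a Yandex results URL for `query` (hypothetical helper).

    quote() percent-encodes spaces and non-ASCII text such as Cyrillic (as UTF-8).
    `page` is the zero-based p index and is omitted for the first page.
    """
    url = f"https://yandex.com/search/?text={quote(query)}"
    if page > 0:
        url += f"&p={page}"
    return url


print(build_search_url("Winter Jackets"))
# → https://yandex.com/search/?text=Winter%20Jackets
```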
Step 2: Parse the results with BeautifulSoup
With HTML in hand, load it into BeautifulSoup and pull each result by its selector. Yandex wraps each organic result in a .serp-item container; inside it, the headline sits in h2.organic__url-text, the destination link in a.organic__url, and the snippet in div.organic__content-wrapper. To confirm these on a live page, open the Yandex results URL in your browser, right-click a result, choose Inspect, and read the class names off the element; the selectors below match the layout at the time of writing.
```python
from bs4 import BeautifulSoup


def scrape_yandex_search(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    search_results = []
    for position, result in enumerate(soup.select(".serp-item"), start=1):
        title_element = result.select_one("h2.organic__url-text")
        url_element = result.select_one("a.organic__url")
        description_element = result.select_one("div.organic__content-wrapper")
        if not title_element or not url_element:
            continue
        search_results.append({
            "position": position,
            "title": title_element.get_text(strip=True),
            "url": url_element["href"],
            "description": description_element.get_text(strip=True) if description_element else None,
        })
    return search_results
```
Selecting .serp-item gives you one element per result, and enumerate(..., start=1) hands you the position as you loop, so rank comes from page order instead of a fragile attribute. Reading the URL from the href of a.organic__url keeps the destination separate from the title text. The if not title_element or not url_element: continue guard skips anything that is not a real organic result, which keeps ad blocks and stray markup out of your output. The position counter still advances for skipped blocks, so the numbering on a single page can have gaps. The description falls back to None when its container is absent.
Yandex revises its front-end markup periodically, and class names like organic__url-text can change with a redeploy. Treat the selectors above as a starting template, not a contract. When a field comes back empty for every result, re-inspect a live page in your browser's dev tools and update the selector. Periodic selector maintenance is normal for any production scraper, not a sign something is broken.
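A quick way to see which selector drifted is to measure how often each field comes back filled in the parsed output. The sketch below works on the list of dicts that scrape_yandex_search returns. The helper name and the fields it checks are assumptions for illustration. An empty list means nothing survived at all: either the .serp-item container or the title/URL selectors that gate each record changed, or the page was not a results page.

```python
def field_fill_rates(results, fields=("title", "url", "description")):
    """Share of parsed results where each field is non-empty (hypothetical helper).

    A rate near 0.0 for one field usually points at that field's selector.
    """
    if not results:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in results if r.get(field)) / len(results)
        for field in fields
    }
```

For example, `field_fill_rates(scrape_yandex_search(html))` returning a description rate of 0.0 while titles and URLs stay at 1.0 tells you to re-inspect only the snippet container.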
Step 3: Put it together
Now wire the fetch and the parse into one runnable script. Crawl the rendered results page, hand the HTML to the parser, and print the structured output as JSON. Setting ensure_ascii=False keeps Cyrillic characters readable in the output instead of escaping them into \u sequences.
```python
from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
from urllib.parse import quote
import json

API_TOKEN = "YOUR_CRAWLBASE_TOKEN"
crawling_api = CrawlingAPI({"token": API_TOKEN})


def fetch_page_html(url):
    response = crawling_api.get(url)
    if response["headers"]["pc_status"] == "200":
        return response["body"].decode("utf-8")
    print(f"Request failed with Crawlbase status {response['headers']['pc_status']}")
    return None


def scrape_yandex_search(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    search_results = []
    for position, result in enumerate(soup.select(".serp-item"), start=1):
        title_element = result.select_one("h2.organic__url-text")
        url_element = result.select_one("a.organic__url")
        description_element = result.select_one("div.organic__content-wrapper")
        if not title_element or not url_element:
            continue
        search_results.append({
            "position": position,
            "title": title_element.get_text(strip=True),
            "url": url_element["href"],
            "description": description_element.get_text(strip=True) if description_element else None,
        })
    return search_results


def main():
    search_query = "Winter Jackets"
    url = f"https://yandex.com/search/?text={quote(search_query)}"
    html_content = fetch_page_html(url)
    if html_content:
        search_results = scrape_yandex_search(html_content)
        print(json.dumps(search_results, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    main()
```
Run the full script with python yandex_scraper.py. It fetches the results page for "Winter Jackets", extracts a record for each organic listing, and prints the list as formatted JSON. The same two functions are all you need: swap the query in main() and the parser handles whatever comes back.
What the output looks like
You get a clean ordered list of result objects, each with its position, title, URL, and description, ready to write to JSON, CSV, or a database. Because the results for this English query mix English- and Russian-language storefronts, you can see Yandex's regional strength in the output itself.
```json
[
  {
    "position": 1,
    "title": "Best Winter Jackets of 2024 | Switchback Travel",
    "url": "https://www.switchbacktravel.com/best-winter-jackets",
    "description": "Patagonia Tres 3-in-1 parka. Category: Casual. Fill: 4.2 oz. of 700-fill-power down."
  },
  {
    "position": 2,
    "title": "Winter jacket: купить по низкой цене на Яндекс Маркете",
    "url": "https://market.yandex.ru/search?text=winter%20jacket",
    "description": "Купить winter jacket: 97 предложений, низкие цены, быстрая доставка."
  },
  {
    "position": 3,
    "title": "Amazon.com: Winter Jackets",
    "url": "https://www.amazon.com/Winter-Jackets/s?k=Winter+Jackets",
    "description": "CAMEL CROWN Men's Mountain Snow Waterproof Ski Jacket, Detachable Hood, Fleece Parka."
  }
]
```
One thing to watch in real output: the Cyrillic title in position 2 keeps its original characters thanks to the UTF-8 decode and ensure_ascii=False. If you see \u escape sequences instead, one of those two steps is missing.
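You can reproduce the difference with the standard json module alone. By default, json.dumps escapes every non-ASCII character, and ensure_ascii=False leaves the Cyrillic untouched.

```python
import json

record = {"title": "Яндекс"}

# Default: each Cyrillic letter becomes a \uXXXX escape.
print(json.dumps(record))
# → {"title": "\u042f\u043d\u0434\u0435\u043a\u0441"}

# ensure_ascii=False keeps the original characters.
print(json.dumps(record, ensure_ascii=False))
# → {"title": "Яндекс"}
```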
Scaling across pages and queries
One query on one page is a demo; a real job runs over several searches and deeper into the results. Yandex paginates with the p query parameter, which is a zero-based page index: p=0 is the first page, p=1 the second, and so on. The shape stays the same: build each URL with the next page number, fetch it through the Crawling API, and parse it with the same function. A small change keeps positions continuous across pages instead of restarting at 1 on every page, and a short sleep between requests paces the crawl.
```python
import time
from urllib.parse import quote

# Relies on fetch_page_html and scrape_yandex_search from yandex_scraper.py.


def scrape_all_pages(query, max_pages=5):
    base_url = f"https://yandex.com/search/?text={quote(query)}&p="
    all_results = []
    position = 1
    for page in range(max_pages):
        html_content = fetch_page_html(base_url + str(page))
        if not html_content:
            break
        page_results = scrape_yandex_search(html_content)
        for result in page_results:
            result["position"] = position
            position += 1
        all_results.extend(page_results)
        time.sleep(2)  # pace requests to respect the server
    return all_results
```
The loop walks five pages by default, fetches each through the Crawling API, and reassigns a running position so ranks stay continuous from page one to page five. The two-second time.sleep between requests keeps you from hammering Yandex in a tight loop. Raise max_pages only as far as you genuinely need; deeper results are usually less relevant anyway. If you would rather route your own traffic through a rotating pool instead of using the managed API, the Smart AI Proxy gives you the same IP rotation as a drop-in proxy endpoint.
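To cover several searches, not just several pages, you can wrap the paginated scraper in an outer loop and tag each record with the query that produced it. The sketch below takes the page scraper as an argument so it can be exercised offline. It expects something that behaves like scrape_all_pages(query, max_pages=...) above. The helper name scrape_queries and the five-second pause between queries are illustrative assumptions.

```python
import time


def scrape_queries(queries, scrape_pages, max_pages=2, pause=5.0, sleep=time.sleep):
    """Run a paginated scrape for each query and tag results with their query.

    `scrape_pages` should behave like scrape_all_pages(query, max_pages=...):
    it returns a list of result dicts with continuous positions.
    """
    combined = []
    for i, query in enumerate(queries):
        if i:
            sleep(pause)  # extra gap between queries, on top of per-page pacing
        for result in scrape_pages(query, max_pages=max_pages):
            combined.append({"query": query, **result})
    return combined
```

In the real script you would call `scrape_queries(["Winter Jackets", "зимние куртки"], scrape_all_pages)`. Because positions restart for each query, the query column keeps rankings for different searches distinct.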
Exporting to CSV
JSON is handy in a terminal, but a CSV opens straight in a spreadsheet, which is what most analysts actually want. pandas turns the list of result dictionaries into a CSV in two lines.
```python
import pandas as pd

# Relies on scrape_all_pages (and the functions it calls) from yandex_scraper.py.


def save_to_csv(results, filename="yandex_search_results.csv"):
    df = pd.DataFrame(results)
    df.to_csv(filename, index=False, encoding="utf-8-sig")
    print(f"Saved {len(results)} results to {filename}")


if __name__ == "__main__":
    results = scrape_all_pages("Winter Jackets", max_pages=3)
    save_to_csv(results)
```
Building the DataFrame from the list of dicts gives you one column per field (position, title, url, description) and one row per result. Passing index=False drops pandas' own row index from the file, and encoding="utf-8-sig" writes a byte-order mark so Cyrillic descriptions open correctly in Excel rather than turning into mojibake. From here you can extend the same pattern to write into a database such as SQLite if you prefer a query-friendly store.
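A SQLite version needs only the standard library's sqlite3 module. In the sketch below, named placeholders map each result dict onto the columns, and extra keys in a dict are simply not used. None is stored as NULL, and Cyrillic text is stored as-is. The function and table names are hypothetical.

```python
import sqlite3


def save_to_sqlite(results, conn, table="yandex_results"):
    """Insert result dicts into a SQLite table, creating it if needed (hypothetical helper).

    `table` is interpolated into the SQL string, so pass only fixed, trusted names.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "position INTEGER, title TEXT, url TEXT, description TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {table} (position, title, url, description) "
        "VALUES (:position, :title, :url, :description)",
        results,
    )
    conn.commit()
    return len(results)
```

Usage: `conn = sqlite3.connect("yandex_results.db")`, then `save_to_sqlite(results, conn)` and `conn.close()`.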
Staying unblocked
Even with a trusted IP handled for you, Yandex watches for scraper-shaped traffic, and its anti-bot checks are stricter than many Western targets. A few habits keep a run healthy.
- Pace your requests. Hammering results pages in a tight loop is the fastest way to land the verification interstitial. Spread requests out and vary your queries instead of paging one term at full speed.
- Lean on rotation. A pool of IP addresses spreads requests so no single one trips Yandex's per-address rate checks. The Crawling API handles this for you; if you roll your own stack, this is the part to get right.
- Read the status codes. A run that starts returning challenges or non-200 statuses is telling you the current rate or IP tier is no longer enough. Treat that as signal to back off, not noise to ignore.
- Re-inspect when fields go empty. Yandex changes its markup periodically. If results stop parsing, open a live page in dev tools and update the selectors.
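Backing off when the status codes turn bad can be as simple as retrying with a growing delay. The sketch below wraps any fetch function that behaves like fetch_page_html, meaning it returns HTML on success and None on a non-200 status. It doubles the wait after each failure (2 s, 4 s, 8 s, ...) and adds up to a second of random jitter so retries do not fall into a fixed rhythm. The helper name and the default numbers are assumptions to tune for your own run.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Retry fetch(url) with exponential backoff plus jitter (hypothetical helper).

    Returns the HTML from the first successful attempt, or None if every attempt fails.
    """
    for attempt in range(max_attempts):
        html = fetch(url)
        if html is not None:
            return html
        if attempt < max_attempts - 1:
            # base_delay * 2**attempt, plus 0 to 1 second of jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

Calling `fetch_with_backoff(fetch_page_html, url)` in place of `fetch_page_html(url)` turns a burst of challenges into a slower crawl, not a wall of failed requests.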
For the broader playbook, see how to scrape websites without getting blocked and the deeper dive on how to bypass captchas while web scraping. The same approach carries to other engines: our guides on scraping Bing search results and scraping Google search pages use the same fetch-then-parse structure with different selectors.
Is it legal to scrape Yandex?
Whether scraping Yandex is allowed depends on Yandex's terms of service, your jurisdiction, and what you do with the data. Like most search engines, Yandex places limits on automated access in its terms, so scraping can run against those terms regardless of how careful your tooling is. None of the code here changes that; it just makes the technical part work. Read Yandex's terms and its robots.txt, and treat both as the boundary for what you collect.
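If you want a programmatic check to go with reading robots.txt, Python's standard urllib.robotparser can evaluate a URL against the rules. The sketch below parses robots.txt text you supply, so it runs offline. SAMPLE_ROBOTS is made up for illustration and is not Yandex's actual file. In practice you would fetch the live file, for example with `RobotFileParser("https://yandex.com/robots.txt")` followed by `.read()`. A robots.txt check does not replace reading the terms of service.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots.txt rules (given as text) allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules only; these are NOT Yandex's real robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```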
A few lines worth holding to. Collect only public search-results data: the titles, links, descriptions, and positions that anyone can see on a results page without an account. Keep your request volume low enough that you are not straining Yandex's servers, and pace your crawl rather than running it flat out. Yandex does publish official products for some structured data, such as Yandex Maps and Yandex Market APIs, so where a sanctioned endpoint exists for what you need, that is the better path than scraping the SERP.
This guide is deliberately scoped to public search-results pages because that is the line that keeps the work defensible. It does not cover anything behind a login, account or personal data, or copyrighted media pulled from the linked destinations. Public SERP data only. If your project needs more than that, an official data agreement is the correct path, not a cleverer scraper.
Key takeaways
- Yandex is strong in its region. It leads Russian-language and regional search, so its public results are a useful signal where Google does not dominate.
- The Crawling API fetches behind a trusted IP. Send it the URL, it rotates addresses server-side and renders when needed, and returns finished HTML, sidestepping the "Are you a robot?" interstitial.
- BeautifulSoup does the extraction. Select each .serp-item, then read title, URL, description, and position from it, and expect the class names to drift over time.
- Paginate with the p index. Increment p by one per page to walk deeper into results, keep positions continuous, and pace requests with a sleep between pages.
- Stay on public data. Respect Yandex's ToS and robots.txt, keep volume low, prefer official APIs like Yandex Market where they fit, and never touch accounts or personal data.
Frequently Asked Questions (FAQs)
What is Yandex and why scrape it?
Yandex is the leading search engine in Russia, often called the "Google of Russia," and it also offers maps, mail, market, and cloud services. People scrape its public results to track Russian-language and regional rankings, study search trends, monitor how brands surface in that market, and gather data for research that Google-centric tools miss.
Can I scrape Yandex search results with Python?
Yes. With the crawlbase client and BeautifulSoup you can fetch a results page and pull out titles, URLs, descriptions, and positions. The Crawling API acts as the bridge that gets your request to Yandex from a trusted IP, so requests are processed smoothly instead of hitting the verification interstitial. For a broader Python primer, see our guide on scraping websites with Python.
Why does a plain request get blocked on Yandex?
Yandex flags traffic that does not look like a real browser, and it watches request rate per IP closely. A few fast requests from one address trigger its "Are you a robot?" verification page or an outright block. Fetching through the Crawling API, which rotates IPs and renders when needed, makes each request look like an ordinary visitor so you get the real results page.
How do I paginate through more Yandex results?
Use the p query parameter, which is a zero-based page index: p=0 is the first page, p=1 the second, and so on. Build each page URL with the next index, fetch it through the Crawling API, parse it with the same function, reassign a running position so ranks stay continuous, and pause a couple of seconds between requests so you are pacing the crawl rather than hammering it.
How do I handle Russian-language results and Cyrillic text?
Decode the response body as UTF-8 when you read it, and when you export, write JSON with ensure_ascii=False or CSV with encoding="utf-8-sig". Those two steps keep Cyrillic titles and descriptions readable instead of turning into \u escapes or mojibake. Yandex's regional strength means many results come back in Russian, so this matters more here than on a Google scrape.
My selectors return nothing. What changed?
Almost certainly Yandex's markup. Class names like organic__url-text and organic__content-wrapper can change when Yandex redeploys its front end, so selectors that worked last month can break. Re-inspect a live results page in your browser's dev tools and update the selectors. Periodic selector maintenance is normal for any production scraper.
