You can clear every cookie, switch to a fresh IP, and open an incognito window, and a site can still recognize you on the next request. That is browser fingerprinting at work. Instead of storing an ID on your machine, the site derives one from the way your machine answers a hundred small questions: what browser you run, how your GPU draws a curve, how your audio stack rounds a number, how your TLS handshake is shaped. Combine enough of those answers and you get a value that is stable across sessions and close to unique.
For engineers who scrape, this is the defense that does not care about your proxy. Rotating IPs solves one problem and leaves fingerprinting untouched, which is why a "clean" residential IP can still hand you a block page. This article explains what a fingerprint actually is, the signals that compose it, why the combination identifies you, and the part that matters most for scraping: how to keep your fingerprint coherent so the IP and the browser agree with each other.
What a browser fingerprint is
A browser fingerprint is a derived identifier built from the configuration and behavior your browser exposes to a page. A script reads dozens of properties, hashes the combination, and gets a value that tends to stay the same every time you visit. Nothing is written to your device. The site does not need your permission and does not need to store anything on your end, because the ID is recomputed from your environment on each visit.
The mental model that matters: a fingerprint is stateless and derived, where a cookie is stored. A cookie is a token the site planted on your machine, so deleting it removes the link. A fingerprint is calculated fresh from your hardware and software each time, so there is nothing on your side to delete. That single difference is why fingerprinting survives the three things most people reach for to stay anonymous:
- Clearing cookies. There is no stored token to clear; the ID is recomputed.
- Switching IPs. The IP is just one signal among many, and most signals come from the browser, not the network.
- Incognito or private mode. Private windows discard history and cookies when you close them, but they still expose the same screen, fonts, GPU, and TLS stack, so the fingerprint barely changes.
Cookies are something a site gives you and you can throw away. A fingerprint is something a site measures about you, recomputed on every visit. You cannot delete a measurement, you can only change what is being measured, which is far harder than clearing a cache.
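To make "derived, not stored" concrete, here is a minimal Node.js sketch. The signal names and values are made up for illustration, and hashing sorted key/value pairs with SHA-256 is just one reasonable way to get a stable digest. Real fingerprinting scripts collect far more signals and may fuzzy-match them rather than hash them exactly.

```javascript
const { createHash } = require("node:crypto");

// Derive an ID from whatever the environment reports. Nothing is stored:
// the same inputs always produce the same output.
function deriveFingerprint(signals) {
  // Sort keys so the digest does not depend on collection order.
  const canonical = Object.keys(signals)
    .sort()
    .map((key) => `${key}=${JSON.stringify(signals[key])}`)
    .join("|");
  return createHash("sha256").update(canonical).digest("hex");
}

// Hypothetical readings from two visits by the same machine.
const firstVisit = {
  userAgent: "ExampleBrowser/1.0",
  screen: "1920x1080",
  timezone: "Europe/Berlin",
};
const afterClearingCookies = {
  timezone: "Europe/Berlin",
  screen: "1920x1080",
  userAgent: "ExampleBrowser/1.0",
};

console.log(
  deriveFingerprint(firstVisit) === deriveFingerprint(afterClearingCookies)
); // true
```

Clearing cookies changes none of the inputs, so the derived value comes out the same on the second visit.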
The signals that compose a fingerprint
A fingerprint is not one value, it is a stack of signals collected at different layers. Some arrive in the request itself, some are read by JavaScript after the page loads, and one is set before any of your code runs at all. Knowing which layer each lives in is what lets you reason about consistency later.
Request-level signals
The cheapest signals come straight off the HTTP request, or from trivial property reads that need no rendering work:
- User-Agent and headers. The browser and version you claim, plus the exact set and order of headers a request carries. Real browsers send a consistent, predictable header set; a bare HTTP client usually sends fewer headers, in a different order, which is an easy tell.
- Accept-Language and timezone. The languages your browser advertises in the request and, via JavaScript, the timezone your system reports. A US datacenter IP paired with a Moscow timezone and a Vietnamese language header is an obvious mismatch.
- Screen and color depth. Resolution, available screen area, device pixel ratio, and color depth, read with a few lines of JavaScript rather than from the request itself. Common values are shared by millions; unusual ones narrow you down fast.
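As a rough illustration of the header-order tell, the sketch below reduces a request to the ordered list of header names it carried. It assumes the flat `[name, value, name, value, ...]` layout that Node's `http.IncomingMessage.rawHeaders` exposes; the function name and both sample requests are hypothetical, not captured from real clients.

```javascript
// rawHeaders uses the flat [name, value, name, value, ...] layout that
// Node's http.IncomingMessage.rawHeaders exposes, which preserves wire order.
function headerOrderSignature(rawHeaders) {
  const names = [];
  for (let i = 0; i < rawHeaders.length; i += 2) {
    names.push(rawHeaders[i].toLowerCase());
  }
  return names.join(",");
}

// Illustrative inputs only; not captured from real clients.
const browserLike = [
  "Host", "example.com",
  "User-Agent", "ExampleBrowser/1.0",
  "Accept", "text/html",
  "Accept-Language", "en-US,en;q=0.9",
  "Accept-Encoding", "gzip, br",
];
const bareClient = [
  "Host", "example.com",
  "Accept", "*/*",
  "User-Agent", "ExampleBrowser/1.0",
];

console.log(headerOrderSignature(browserLike));
// host,user-agent,accept,accept-language,accept-encoding
console.log(headerOrderSignature(bareClient));
// host,accept,user-agent
```

Both requests claim the same User-Agent, yet the signatures differ in length and order, which is the kind of gap a defender can compare against what that browser normally sends.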
JavaScript-rendered signals
These require a real rendering engine to produce. A plain HTTP client cannot generate them at all, and their absence is itself a signal.
- Fonts. The set of fonts installed on your system, probed by measuring how test strings render. The combination is surprisingly distinctive.
- Canvas fingerprinting. The page draws text and shapes to an invisible HTML5 canvas, then reads the pixels back. Your specific mix of GPU, drivers, anti-aliasing, and font rasterization produces tiny per-device differences, and hashing the pixel data yields a stable signature.
- Audio fingerprinting. The Web Audio API generates a tone through an oscillator and compressor, then reads the resulting buffer. The exact floating-point output depends on your audio stack and hardware, so the derived value is consistent on your machine and different across devices.
- WebGL and GPU. The WebGL API reports your renderer and vendor strings and how it draws 3D scenes, exposing the GPU and driver behind the browser.
The canvas read is small enough to show. The page never displays this canvas; it draws to it off-screen and serializes the result:
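```javascript
// Draw off-screen; the canvas is never attached to the page.
const canvas = document.createElement("canvas");
const ctx = canvas.getContext("2d");
ctx.textBaseline = "top";
ctx.font = "14px Arial";
ctx.fillText("Crawlbase fingerprint \u{1F4A1}", 2, 2);

// Same code, different pixels per GPU/driver/font stack.
const signature = canvas.toDataURL();
// hash() stands in for any stable digest (e.g. SHA-256); it is not a built-in.
const fp = hash(signature);
```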
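The WebGL read is similarly short. The sketch below assumes the `WEBGL_debug_renderer_info` extension is available, which some browsers restrict or mask; `readWebGLInfo` is a hypothetical helper name, and the browser-only part is guarded so the function itself can be exercised elsewhere with a stand-in context.

```javascript
// Reads the vendor and renderer strings from a WebGL context.
// Returns null when there is no context or the debug extension is blocked.
function readWebGLInfo(gl) {
  if (!gl) return null;
  const ext = gl.getExtension("WEBGL_debug_renderer_info");
  if (!ext) return null;
  return {
    vendor: gl.getParameter(ext.UNMASKED_VENDOR_WEBGL),
    renderer: gl.getParameter(ext.UNMASKED_RENDERER_WEBGL),
  };
}

// In a browser, pass a real context; outside one there is no WebGL to read.
if (typeof document !== "undefined") {
  const gl = document.createElement("canvas").getContext("webgl");
  console.log(readWebGLInfo(gl));
}
```

A `null` result is informative too: a client that claims to be a desktop browser but exposes no WebGL context at all is already unusual.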
The signal most clients get wrong: TLS / JA3
The handshake happens before HTTP, before JavaScript, before anything you control in application code. When your client opens a TLS connection, it sends a Client Hello that lists the cipher suites, extensions, and elliptic curves it supports, in a specific order. That shape is characteristic of a given client library and version, and hashing it produces a JA3 fingerprint. Recent Chrome versions shuffle the extension order on each connection, which is why newer schemes such as JA4 sort extensions before hashing, but the overall shape still gives the client away. Chrome's handshake looks like Chrome; Python's requests looks like Python.
This is the layer that trips most scrapers and that no User-Agent string can fix. You can set every header to claim you are Safari on an iPhone, but if your TLS handshake matches a Python HTTP library on Linux, the two layers disagree, and a defender comparing them sees the lie immediately. The handshake is set by your networking stack, not your headers, so spoofing it means changing the client itself.
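To show why the handshake is so hard to disguise, here is a sketch of how a JA3 value is assembled once the Client Hello has been parsed. It follows the published JA3 layout (TLS version, ciphers, extensions, curves, and point formats, with GREASE values dropped, then MD5). The `hello` object shape and the sample values are assumptions for illustration, and parsing the raw handshake bytes is out of scope here.

```javascript
const { createHash } = require("node:crypto");

// GREASE values (0x0a0a, 0x1a1a, ... 0xfafa) are placeholder IDs that clients
// insert deliberately; JA3 ignores them.
function isGrease(value) {
  return (value & 0x0f0f) === 0x0a0a && (value >> 8) === (value & 0xff);
}

// Builds the JA3 string: five comma-separated fields, each a dash-joined
// list of decimal values in the order they appeared in the Client Hello.
function ja3String(hello) {
  const list = (values) => values.filter((v) => !isGrease(v)).join("-");
  return [
    hello.version,
    list(hello.ciphers),
    list(hello.extensions),
    list(hello.curves),
    list(hello.pointFormats),
  ].join(",");
}

function ja3Hash(hello) {
  return createHash("md5").update(ja3String(hello)).digest("hex");
}

// Hypothetical Client Hello values, not taken from any real client.
const clientA = {
  version: 771,
  ciphers: [0x1a1a, 4865, 4866],
  extensions: [0, 10, 11],
  curves: [29, 23],
  pointFormats: [0],
};
// Same cipher suites, offered in a different order.
const clientB = { ...clientA, ciphers: [4866, 4865] };

console.log(ja3String(clientA)); // 771,4865-4866,0-10-11,29-23,0
console.log(ja3Hash(clientA) === ja3Hash(clientB)); // false
```

Nothing in this input comes from HTTP headers, which is why changing the User-Agent leaves the hash untouched.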
Why the combination identifies you
No single signal is unique. Millions of people run the same browser version at 1920x1080. The power is in the combination: stack your browser, OS, fonts, timezone, canvas hash, audio hash, WebGL renderer, and TLS shape together and the joint value becomes rare enough to single you out. Studies of real traffic put device identification somewhere in the 90-99% range, and you should treat those as observed figures from specific datasets, not a guarantee that every browser is uniquely identifiable.
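A back-of-the-envelope calculation shows how quickly common values add up. The shares below are invented for illustration, not measurements, and the arithmetic assumes the signals are independent. Real signals are correlated, so the true figure is lower.

```javascript
// Bits of identifying information carried by a value seen in a given
// fraction of visitors: rarer values carry more bits.
function surprisalBits(share) {
  return -Math.log2(share);
}

// Made-up shares for illustration; they are not measurements.
const shares = {
  browserVersion: 1 / 4,
  screenSize: 1 / 8,
  fontSet: 1 / 64,
  canvasHash: 1 / 256,
};

// Assuming the signals are independent, bits add and shares multiply.
const totalBits = Object.values(shares)
  .map(surprisalBits)
  .reduce((sum, bits) => sum + bits, 0);

console.log(totalBits.toFixed(1)); // 19.0
console.log(Math.round(2 ** totalBits)); // 524288
```

Under these assumptions, four individually common values narrow the crowd to about one visitor in half a million, and a real fingerprint stacks many more signals than four.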
The same property that makes fingerprinting useful for fraud prevention and bot detection makes it a privacy concern: it tracks across sessions without consent and without anything stored on your device. For scrapers, the takeaway is narrower and sharper. Your tooling emits a combination of signals, and if that combination looks like automation, or looks internally contradictory, you are flagged regardless of how good your IP is.
How fingerprinting blocks scrapers
Anti-bot systems collect the same stack of signals from your scraper that they collect from a real visitor, then ask two questions. First, does this look like a real browser at all? Second, do the layers agree with each other? Most scrapers fail the second question even when they pass the first, and that is the insight that changes how you build.
Here is the trap. You buy rotating residential IPs, you set a convincing User-Agent, and you still get blocked. The IP rotation did its job, but fingerprinting never looked at the IP in isolation. It looked at the whole picture, and the picture was inconsistent:
- A constant fingerprint behind rotating IPs. If every request shares one canvas hash and one TLS signature while the IP changes each time, you have one "device" teleporting around the world. That pattern is itself a flag.
- Internally inconsistent layers. A Linux datacenter TLS signature claiming, via its User-Agent, to be Safari on an iPhone. iOS Safari does not produce that handshake, and it does not run on that hardware. The contradiction is the tell.
- Missing signals a real browser always has. No canvas, no WebGL, no audio context, a thin header set. A real browser produces all of these; a bare HTTP client produces none, and the gap is glaring.
So the rule is not "rotate harder." The rule is consistency across layers. The IP origin, the headers, the TLS handshake, and the JavaScript-rendered signals all have to describe the same plausible person on the same plausible device. This is the same coherence problem behind bypassing Cloudflare and avoiding bot detection, and it is a big part of why scrapers run into CAPTCHAs while scraping: the challenge is often triggered by an incoherent fingerprint, not by the IP alone.
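The three failure patterns can be sketched as a toy set of defender-side rules. Every field name, label, and threshold here is an assumption made for the example. Production systems score many more signals and rarely rely on hard cutoffs.

```javascript
// Each rule adds a reason string when the observation looks incoherent.
// Field names and the IP threshold are assumptions made for this sketch.
function findContradictions(obs) {
  const flags = [];
  if (obs.claimedBrowser !== obs.tlsLooksLike) {
    flags.push(
      `User-Agent claims ${obs.claimedBrowser} but TLS looks like ${obs.tlsLooksLike}`
    );
  }
  const missing = ["canvasHash", "webglRenderer", "audioHash"].filter(
    (key) => !obs[key]
  );
  if (missing.length > 0) {
    flags.push(`missing rendered signals: ${missing.join(", ")}`);
  }
  if (obs.distinctIpsForThisFingerprint > 20) {
    flags.push("one fingerprint seen behind many IPs");
  }
  return flags;
}

const bareClient = {
  claimedBrowser: "safari-ios",
  tlsLooksLike: "python-requests",
  distinctIpsForThisFingerprint: 300,
};
const realBrowser = {
  claimedBrowser: "chrome",
  tlsLooksLike: "chrome",
  canvasHash: "a1",
  webglRenderer: "Example GPU",
  audioHash: "b2",
  distinctIpsForThisFingerprint: 2,
};

console.log(findContradictions(bareClient).length); // 3
console.log(findContradictions(realBrowser).length); // 0
```

None of the rules inspects the quality of the IP itself, which is why a clean residential address does not rescue the first client.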
What to actually do about it
You have three realistic paths, and they trade off effort against control.
- Run a real or headless browser that renders. A genuine browser engine produces real canvas, WebGL, audio, and font signals, and its TLS handshake matches its User-Agent because it is the same software. This closes the "missing signals" gap and the "TLS does not match the browser" gap in one move. The cost is speed and resources: rendering is far heavier than a raw fetch.
- Use an anti-detect browser. These tools give each session a deliberately crafted, internally consistent profile, varying signals together so the layers stay plausible. Useful for smaller, session-heavy work; managing many coherent profiles at scale is its own job.
- Use a managed solution. A service that keeps the entire fingerprint coherent for you, presenting a believable browser and pairing it with a matching IP, so you point at one endpoint instead of maintaining the stack yourself.
Be honest with yourself about the second-most-tempting option: hand-rolling a consistent fingerprint with raw HTTP requests. It is doable in principle, but it is hard and fragile. You have to match the TLS handshake to the claimed browser, send the exact header set and order a real browser sends, and keep all of it in sync as browser versions move and detection evolves. One stale value and the layers disagree again. For most teams the maintenance burden outweighs the savings, which is why rendering or a managed layer usually wins. If you want the broader playbook, see how to scrape websites without getting blocked.
Whatever path you pick, the IP still has to be coherent with the rest. A residential origin reads as a real person where a datacenter range does not, which is the core of the datacenter vs residential proxies tradeoff, and rotation has to spread load without making one fingerprint hop between continents. Rotating residential proxies handle the IP side, but only a coherent fingerprint on top makes the whole request believable.
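One way to keep a fingerprint from hopping between continents is to pin each browser profile to an exit in the country it claims to be in. The sketch below does that with a stable hash. The profile and exit shapes, IDs, and country codes are hypothetical, and a real pool would also need health checks and session expiry.

```javascript
const { createHash } = require("node:crypto");

// Picks one proxy exit per profile: always in the profile's claimed country,
// and always the same exit for the same profile ID.
function pickExit(profile, exits) {
  const candidates = exits.filter((exit) => exit.country === profile.country);
  if (candidates.length === 0) {
    throw new Error(`no exit available for ${profile.country}`);
  }
  // Hash the profile ID so the choice is stable across requests.
  const digest = createHash("sha256").update(profile.id).digest();
  return candidates[digest.readUInt32BE(0) % candidates.length];
}

// Hypothetical pool and profile.
const exits = [
  { id: "us-1", country: "US" },
  { id: "us-2", country: "US" },
  { id: "de-1", country: "DE" },
  { id: "de-2", country: "DE" },
];
const profile = { id: "profile-42", country: "DE", timezone: "Europe/Berlin" };

console.log(pickExit(profile, exits).country); // DE
console.log(pickExit(profile, exits) === pickExit(profile, exits)); // true
```

Load still spreads across exits because different profile IDs hash to different slots, but any single fingerprint keeps a believable, stable origin.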
The hard part is keeping the IP and the browser fingerprint in agreement at scale. Smart Proxy is one backconnect endpoint that rotates real-user residential IPs and presents a coherent browser fingerprint together, so the handshake, the headers, and the rendered signals describe the same plausible visitor instead of contradicting each other. Point your client at a single host and try it on the free tier.
Key takeaways
- A fingerprint is derived, not stored. It is recomputed from your environment each visit, so it survives clearing cookies, switching IPs, and incognito.
- It is a stack of signals. Headers, screen, fonts, canvas, audio, WebGL, and the TLS/JA3 handshake combine into one near-unique value, with studies of specific datasets reporting identification rates in the 90-99% range.
- TLS is the signal scrapers get wrong. The handshake is set by your networking stack, not your User-Agent, so a Python client claiming to be Safari is exposed at the TLS layer.
- Rotating IPs alone is not enough. A constant or internally inconsistent fingerprint gets flagged no matter how clean the IP is.
- Consistency across layers wins. The IP, headers, TLS, and rendered signals must describe the same plausible device; rendering or a managed layer is the practical way to keep them coherent.
Frequently Asked Questions (FAQs)
What is browser fingerprinting?
Browser fingerprinting is a technique that builds a near-unique identifier for your device from the configuration and behavior your browser exposes, such as your User-Agent, screen, installed fonts, GPU, audio stack, and TLS handshake. The combination of these signals is hashed into a value that tends to stay the same across visits. Nothing is stored on your device, because the ID is recomputed from your environment each time.
How is a fingerprint different from a cookie?
A cookie is a token a site stores on your machine, so deleting it removes the link. A fingerprint is derived, measured fresh from your hardware and software on every visit, so there is nothing on your side to delete. That is why fingerprinting survives clearing cookies, switching IPs, and using private browsing mode.
Can I avoid fingerprinting by clearing cookies or using incognito?
No. Those actions remove stored data, but a fingerprint is not stored; it is recomputed from signals like your screen, fonts, GPU, and TLS stack, which incognito mode does not change. Private windows hide history and cookies, but they expose the same device characteristics, so the fingerprint barely shifts.
Why does my scraper get blocked even with rotating proxies?
Because fingerprinting does not look at the IP in isolation. If every request carries the same canvas hash and TLS signature while the IP rotates, you look like one device teleporting worldwide. If your TLS handshake says Python on Linux while your User-Agent claims Safari on iPhone, the layers contradict each other. Either pattern gets flagged regardless of how clean the IP is.
What is TLS or JA3 fingerprinting?
When your client opens an HTTPS connection, the TLS Client Hello lists supported cipher suites, extensions, and curves in a specific order. Hashing that shape produces a JA3 fingerprint that is characteristic of the client library and version. It is set by your networking stack rather than your headers, so it often exposes a scraper whose User-Agent claims to be a browser it is not.
What is the best way to scrape past fingerprinting?
Keep your fingerprint coherent across every layer. Run a real or headless browser so the rendered signals and TLS handshake actually match the browser you claim to be, or use a managed layer that pairs a believable fingerprint with a matching real-user IP. Hand-rolling a consistent fingerprint with raw HTTP requests is possible but hard to maintain as browsers and detection evolve.
