Every request your machine makes to the internet carries your IP address. A proxy server sits in the middle and makes the request for you, so the destination sees the proxy's address instead of yours. That one layer of indirection is the whole idea. Everything else (rotation, anonymity tiers, the dozen product names) is a variation on it.
This post covers what actually happens when you route traffic through a proxy, the handful of types worth understanding, and how to pick one for a real workload. We will organize the sixteen "types" you see elsewhere by the decisions they actually represent, because most of them describe the same two or three choices under different names.
What a proxy server is
A proxy server is a machine that makes network requests on behalf of another machine. Your client talks to the proxy, the proxy talks to the destination, and the response comes back the same way. To the destination, the traffic looks like it came from the proxy.
That indirection buys you a few things at once: the origin sees a different IP, you can place that IP in a different network or country, and the hop in the middle can inspect, cache, filter, or rewrite traffic as it passes. Which of those you care about determines which kind of proxy you reach for.
How a proxy actually works
When you use an HTTP proxy, your client opens a connection to the proxy instead of the target site and puts the full target URL in the request line. The proxy reads that, opens its own connection to the origin, forwards your request, and streams the response back. Because the proxy originates that second connection, the origin sees the proxy's IP and whatever location that IP maps to. And because the proxy can see the plaintext request, it can add, strip, or rewrite headers on the way through, which is how it enforces a header policy.
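To make the request line concrete, here is a minimal Python sketch of the bytes a client might send to a plain HTTP proxy, plus the kind of header rewrite a proxy could apply before forwarding. The function names `build_proxy_request` and `rewrite_headers` are hypothetical, chosen for illustration; real clients and proxies handle many more cases (methods, bodies, hop-by-hop headers).

```python
from typing import Dict, Iterable, Optional


def build_proxy_request(url: str, host: str,
                        headers: Optional[Dict[str, str]] = None) -> bytes:
    """Build a GET request as it would be sent to a plain HTTP proxy.

    The request line carries the full URL (absolute form) instead of
    just the path, which is how the proxy learns where to forward it.
    """
    lines = [f"GET {url} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def rewrite_headers(headers: Dict[str, str], strip: Iterable[str],
                    add: Dict[str, str]) -> Dict[str, str]:
    """Illustrative proxy-side policy: drop some headers, then add others."""
    blocked = {name.lower() for name in strip}
    kept = {n: v for n, v in headers.items() if n.lower() not in blocked}
    kept.update(add)
    return kept
```

The rewrite step is only possible because the request is plaintext at the proxy; it has no equivalent once the traffic is an encrypted tunnel.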
HTTPS is different. The proxy cannot read an encrypted stream, so your client sends a CONNECT request asking the proxy to open a raw tunnel to the origin, then performs its TLS handshake straight through that tunnel. The proxy moves bytes back and forth without seeing inside them.
    # Client asks the proxy to tunnel to the origin
    CONNECT example.com:443 HTTP/1.1
    Host: example.com:443

    # Proxy confirms, then relays the encrypted bytes blindly
    HTTP/1.1 200 Connection Established
This is why an HTTP proxy can filter or modify content but an HTTPS tunnel mostly just relays it. "Transparent" inspection of HTTPS requires installing the proxy's CA certificate on the client so the proxy can terminate TLS itself, which is something you should be skeptical of on any network you do not control.
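The CONNECT exchange is small enough to sketch in Python. The helpers below (hypothetical names) build the request a client sends and check whether the proxy's reply indicates an open tunnel. This is an illustration of the message shapes only: it does not open sockets or perform the TLS handshake that follows.

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """Build the CONNECT request asking a proxy to open a raw tunnel."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """Return True if the proxy's status line reports a 2xx result."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        return False
    # Any 2xx status means the tunnel is open; everything after the
    # response headers is opaque bytes relayed in both directions.
    return parts[1].isdigit() and 200 <= int(parts[1]) < 300
```

Once `tunnel_established` is true, the client would start TLS over the same connection, and the proxy would only see ciphertext from that point on.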
The proxy taxonomy: four decisions, not sixteen types
Most "types of proxy" lists mix four unrelated questions into one flat menu. Separate them and the picture gets simple. A given proxy is a point in four dimensions: where its IP comes from, which direction it faces, which protocol it speaks, and how much it reveals about you.
By origin: datacenter, residential, mobile
This is the dimension that matters most for scraping and the one people actually pay for.
Datacenter proxies are IPs hosted in cloud or hosting providers. They are fast and cheap, and they come in large contiguous blocks, which also makes them the easiest for a site to recognize and block. Use them where the target does not scrutinize IP reputation.
Residential proxies route through real consumer ISP connections, so their IPs look like ordinary home users. They are slower and more expensive, but far harder to block. See datacenter vs residential proxies for the full tradeoff, and ISP vs residential proxies for the static-residential middle ground.
Mobile proxies route through cellular networks (4G and 5G). Because carriers share a small pool of IPs across many subscribers via carrier-grade NAT, blocking one mobile IP risks blocking thousands of real users, so sites tolerate them the most. They are also the priciest.
By direction: forward vs reverse
A forward proxy sits in front of clients and faces out to the internet. It is what people usually mean by "proxy": you send your traffic through it. A reverse proxy sits in front of servers and faces in toward them, taking requests from the public and distributing them to backends for load balancing, caching, or TLS termination. Nginx in front of an app is a reverse proxy. The mechanics overlap, but the intent is opposite. We compare them in detail in forward vs reverse proxy.
By protocol: HTTP(S) vs SOCKS5
An HTTP proxy understands HTTP, which lets it act on requests at the application layer (the header rewriting above). An SSL/HTTPS proxy is just an HTTP proxy that also handles the CONNECT tunnel for encrypted traffic; see HTTP vs HTTPS proxies.
A SOCKS5 proxy works one layer lower. It forwards raw TCP and UDP without understanding the protocol on top, so it carries anything (HTTP, email, torrents, game traffic) but cannot rewrite headers because it never reads them. Reach for SOCKS5 when you need protocol-agnostic forwarding; reach for HTTP when you want request-level control. Details in what is a SOCKS5 proxy.
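A short Python sketch shows why SOCKS5 cannot rewrite headers: the messages a client sends (as laid out in RFC 1928) contain only a version, a command, and a destination address. The function names are hypothetical, and the sketch covers just the no-authentication greeting and a CONNECT to a domain name.

```python
import struct

SOCKS_VERSION = 0x05
METHOD_NO_AUTH = 0x00
CMD_CONNECT = 0x01
ATYP_DOMAIN = 0x03


def socks5_greeting() -> bytes:
    """Client greeting: version, number of methods, then the methods."""
    return bytes([SOCKS_VERSION, 1, METHOD_NO_AUTH])


def socks5_connect_request(host: str, port: int) -> bytes:
    """CONNECT request addressed by domain name.

    Layout: VER, CMD, RSV, ATYP, name length, name, port (big-endian).
    Nothing here describes the protocol that will run over the tunnel.
    """
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("domain name too long for a SOCKS5 request")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name)])
    return header + name + struct.pack("!H", port)
```

After the proxy accepts the request, whatever the client writes next (HTTP, SMTP, a game protocol) is forwarded as raw bytes.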
By anonymity: transparent, anonymous, elite
Proxies differ in how much they disclose. A transparent proxy forwards your real IP in headers like X-Forwarded-For and announces itself; it is used for caching and content filtering, not anonymity. An anonymous proxy hides your IP but still identifies itself as a proxy. A distorting proxy is a variant of that: it identifies itself as a proxy but reports a fake client IP. An elite or high-anonymity proxy hides both your IP and the fact that a proxy is involved, presenting as an ordinary client. For scraping you generally want elite behavior, where the target cannot tell a proxy is involved.
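One rough way to see which tier a proxy falls into is to send a request through it to a server you control and look at the headers that arrive. The Python sketch below applies that idea as a simple heuristic; `classify_anonymity` is a hypothetical helper, it checks only the X-Forwarded-For and Via headers, and real detection by target sites relies on many more signals.

```python
from typing import Dict


def classify_anonymity(seen_headers: Dict[str, str], real_ip: str) -> str:
    """Classify a proxy from the headers the destination received.

    seen_headers: request headers as observed by the destination.
    real_ip: the client's true public IP, known to the person testing.
    """
    headers = {name.lower(): value for name, value in seen_headers.items()}
    forwarded_ips = [
        ip.strip()
        for ip in headers.get("x-forwarded-for", "").split(",")
        if ip.strip()
    ]
    if real_ip in forwarded_ips:
        return "transparent"   # real IP disclosed
    if forwarded_ips:
        return "distorting"    # a client IP is reported, but not the real one
    if "via" in headers:
        return "anonymous"     # announces itself, hides the client IP
    return "elite"             # no proxy-revealing headers seen
```

For example, `classify_anonymity({"Via": "1.1 proxy"}, "198.51.100.7")` returns `"anonymous"` because the proxy announced itself without passing any client address along.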
Specialized and privacy proxies
The remaining names you will see are narrower tools rather than separate categories:
- Web proxy: a proxy you drive from a browser page, no client config required. Convenient, limited, often logged.
- CGI proxy: the same idea implemented as a server-side script that fetches pages for you.
- Suffix proxy: appends its name to the target URL; trivial to use, easy to block, common for bypassing simple filters.
- DNS proxy: forwards DNS queries rather than HTTP, used for filtering and geographic routing at the name-resolution layer.
- Tor (onion) and I2P: anonymity networks, not commercial proxies. They route through multiple volunteer relays for strong anonymity at a heavy speed cost. Useful for privacy, wrong for high-throughput scraping.
What you actually use a proxy for
Cut through the feature lists and proxies earn their place in four jobs:
Collecting data at scale. The dominant real use. Rotating IPs let you gather public data (prices, listings, search results) without a single address getting rate-limited or blocked. This is why scraping infrastructure is mostly proxy infrastructure.
Controlling where you appear. A proxy in another country lets you see geo-specific pricing, content, and search results, and lets you test how your own site behaves from elsewhere.
Performance through caching. A forward or reverse proxy that caches responses serves repeat requests locally, cutting latency and origin load. This is the original reason corporate networks ran proxies.
Security, filtering, and monitoring. As a chokepoint for traffic, a proxy can block categories of sites, inspect requests for policy violations, and log usage. This is content filtering and traffic monitoring, and it is why proxies are common on managed networks.
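For the data-collection job, the core of rotation can be sketched in a few lines of Python: cycle through a pool and move to the next proxy when a response looks like a block. Everything here is an assumption for illustration, including the `fetch_with_rotation` name, the injected `fetch` callable, and the choice of 403 and 429 as "blocked" statuses; a production setup would also handle timeouts, backoff, and per-proxy health.

```python
from itertools import cycle
from typing import Callable, Sequence, Tuple

# Assumed signal that a proxy was rate-limited or blocked.
BLOCKED_STATUSES = {403, 429}


def fetch_with_rotation(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], Tuple[int, str]],
    max_attempts: int = 3,
) -> Tuple[str, int, str]:
    """Try the URL through successive proxies until one is not blocked.

    fetch(url, proxy) performs the request and returns (status, body);
    it is injected so the rotation logic stays independent of any
    particular HTTP library.
    """
    if not proxies:
        raise ValueError("proxy pool is empty")
    pool = cycle(proxies)
    last_status = None
    for _ in range(max_attempts):
        proxy = next(pool)
        status, body = fetch(url, proxy)
        if status not in BLOCKED_STATUSES:
            return proxy, status, body
        last_status = status
    raise RuntimeError(
        f"all {max_attempts} attempts were blocked (last status {last_status})"
    )
```

Injecting `fetch` keeps the sketch testable without network access and makes it easy to swap in whichever HTTP client you already use.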
The risks, honestly
A proxy is a machine you route your traffic through, which means you are trusting whoever runs it.
A plain HTTP proxy can read everything you send over unencrypted connections, and a malicious operator can log credentials or inject content. Free public proxy lists are the worst offenders: you have no idea who runs them or what they keep. If a proxy is free and anonymous and you did not set it up, assume it is logging.
Two specific risks are worth naming. First, no encryption on the proxy hop: with a plain HTTP proxy, the link between you and the proxy is itself unencrypted. An HTTPS destination still protects the page content end to end, but your proxy credentials and the CONNECT target hostname cross that hop in the clear. Second, unauthorized data storage: any proxy can record the requests passing through it, so logging policy matters as much as IP quality. For production work, use a provider with a clear policy rather than scraped free lists.
How to route traffic through a proxy
At its simplest, a proxy is a host, a port, and optional credentials. Most tools accept it directly. Here is a single request through an HTTP proxy with curl:
    # Send the request through a proxy and check the exit IP
    curl -x "http://user:pass@proxy.example.com:8080" "https://httpbin.org/ip"

    # Response shows the proxy's IP, not yours
    { "origin": "203.0.113.42" }
For a whole shell session you can set the http_proxy and https_proxy environment variables, which most command-line tools respect. Many tools also accept the uppercase HTTP_PROXY and HTTPS_PROXY, but curl reads http_proxy only in lowercase. Browsers and operating systems expose the same host/port settings in their network preferences. The mechanics are identical everywhere; only the configuration field changes.
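The same two configuration styles exist in code. As a rough Python sketch using only the standard library: `opener_for_proxy` (a hypothetical helper) configures one client explicitly, while `export_proxy` (also hypothetical) sets the lowercase environment variables so that `urllib` and many other tools pick the proxy up on their own. The proxy URL in the usage note is a placeholder.

```python
import os
import urllib.request
from typing import MutableMapping


def opener_for_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends http and https requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def export_proxy(proxy_url: str,
                 env: MutableMapping[str, str] = os.environ) -> None:
    """Set the proxy for this process and any child processes it starts."""
    env["http_proxy"] = proxy_url
    env["https_proxy"] = proxy_url
```

After `export_proxy("http://proxy.example.com:8080")`, calling `urllib.request.getproxies_environment()` returns a mapping that includes that URL for both schemes, which is what `urllib` consults when no explicit handler is given.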
Using Crawlbase Smart AI Proxy
Managing pools, rotation, and retries yourself is the part that gets expensive. A managed endpoint hides it: you point your client at one host, and it selects an IP, rotates on failure, and returns the response. With a managed provider the same curl call works, minus the part where you maintain the pool. That is the model Crawlbase Smart AI Proxy uses, and you can wire it up against the API docs.
How to find and verify your proxy IP
To see the IP a proxy is presenting, request an echo service like httpbin.org/ip or ifconfig.me through it and read back the address, exactly as in the curl example above. If it shows the proxy's IP rather than your own, routing works.
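If you want to automate that check, a small Python sketch can compare the echo service's answer with and without the proxy. It assumes the response body is JSON with an `origin` field, as in the httpbin.org/ip output shown earlier; `exit_ip` and `routing_works` are hypothetical helper names, and fetching the two bodies is left to whatever client you use.

```python
import json


def exit_ip(echo_body: str) -> str:
    """Extract the address an echo service reported for this request."""
    origin = json.loads(echo_body)["origin"]
    # Some setups report a comma-separated chain; keep the first entry.
    return origin.split(",")[0].strip()


def routing_works(direct_body: str, proxied_body: str) -> bool:
    """True when the proxied request exits from a different IP."""
    return exit_ip(direct_body) != exit_ip(proxied_body)
```

A `False` result usually means the client ignored the proxy setting and connected directly.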
An IP can be spoofed only in a limited sense: headers like X-Forwarded-For are trivially forged, which is why servers should not trust them for security decisions unless a known proxy set them. The TCP source address itself cannot be faked for a normal two-way connection, because the handshake requires receiving packets back. A proxy does not fake your IP so much as legitimately originate the connection from its own.
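On the server side, the usual defensive pattern is to believe X-Forwarded-For only when the TCP peer is one of your own proxies, and then to read the header from right to left. The Python sketch below illustrates that pattern under stated assumptions: `client_ip` is a hypothetical helper, and `trusted_proxies` is a list of networks you operate.

```python
import ipaddress
from typing import Sequence


def client_ip(peer_ip: str, x_forwarded_for: str,
              trusted_proxies: Sequence[str]) -> str:
    """Pick the client address without trusting forged header entries."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(address: str) -> bool:
        addr = ipaddress.ip_address(address)
        return any(addr in net for net in trusted)

    # The TCP peer address is the only value the client cannot forge.
    if not is_trusted(peer_ip):
        return peer_ip

    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    # Walk from the entry added last (by our own proxy) toward the
    # entries the client could have supplied itself.
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            # Malformed entry: stop relying on the header.
            return peer_ip
    return peer_ip
```

With `trusted_proxies=["10.0.0.0/8"]`, a direct connection from 203.0.113.9 that claims `X-Forwarded-For: 10.0.0.1` is still attributed to 203.0.113.9, because the header is ignored when the peer is not a trusted proxy.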
Key takeaways
- A proxy is one layer of indirection. It makes the request for you, so the origin sees its IP, not yours.
- HTTP proxies see your requests; HTTPS tunnels do not. That single fact explains what each can and cannot do.
- Pick along four axes, not sixteen types: origin, direction, protocol, anonymity. Most named "types" are just one of these choices.
- For scraping, origin is what you pay for. Datacenter for speed, residential and mobile for evading blocks.
- You are trusting whoever runs the proxy. Free anonymous proxies are a logging risk; use a provider with a clear policy.
Frequently Asked Questions (FAQs)
How does a proxy server differ from a packet-filtering firewall?
A proxy operates at the application layer and makes requests on your behalf, so it can understand and rewrite the traffic it carries. A packet-filtering firewall works lower down, allowing or dropping packets based on addresses and ports without acting as an endpoint. A firewall decides whether traffic may pass; a proxy participates in it.
What is a proxy server address?
It is the host and port your client connects to, for example proxy.example.com:8080, sometimes with a username and password. That pair is all a tool needs to route traffic through the proxy.
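As a small illustration, Python's standard library can split such an address into its parts. `parse_proxy_address` is a hypothetical helper, and defaulting to the `http` scheme when none is given is an assumption of this sketch.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlsplit


def parse_proxy_address(address: str) -> Dict[str, Optional[Union[str, int]]]:
    """Split a proxy address into scheme, host, port, and credentials."""
    # urlsplit needs a scheme to recognise the host part; assume http.
    if "://" not in address:
        address = f"http://{address}"
    parts = urlsplit(address)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,
        "password": parts.password,
    }
```

Calling it on `proxy.example.com:8080` yields the host `proxy.example.com` and the port `8080`, with both credential fields set to `None`.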
How do you find a proxy server address?
For a proxy you control or subscribe to, the provider gives you the host, port, and credentials. On a managed network, the address may be set in your operating system or browser network settings, or pushed automatically by an admin. To see the IP a proxy presents, request an echo service through it.
What is a common function of a proxy server?
The most common production use is collecting public web data at scale by rotating IPs to avoid rate limits and blocks. On corporate networks, the most common function is caching and content filtering at a single controlled chokepoint.
