Proxies & Reverse Proxies
[!note] nearly every request on the internet passes through at least one middleman. this is about the middlemen.
- 1. what even is a proxy
- 2. forward proxies — the client’s agent
- 3. reverse proxies — the server’s agent
- 4. the full picture
- 5. headers — how proxies talk to each other
- 6. proxy protocols — the plumbing
- 7. real-world architectures
- 8. performance notes
- 9. security stuff to know
- 10. quick config reference
- sources
1. what even is a proxy
a proxy is just a middleman. sits between two parties, forwards traffic. neither side is talking directly to the other.
CLIENT ──► PROXY ──► SERVER
◄──────────◄──
why bother? few reasons:
- hide who you are (client anonymity) or hide where your servers are
- filter or log traffic
- cache stuff so the backend doesn’t get hammered
- spread load across multiple servers
- terminate TLS in one place
the entire internet runs on proxies btw. CDNs, load balancers, your corporate firewall, API gateways — all proxies. once you see it you can’t unsee it.
the fundamental split is whose side the proxy is on:
- forward proxy → works for the client. client knows about it.
- reverse proxy → works for the server. client usually has no idea.
2. forward proxies — the client’s agent
2.1 what it does
acts on behalf of clients. the client explicitly sends requests through it. the server sees the proxy’s IP, not the client’s.
ALICE ──►
BOB ──► FORWARD PROXY ──► google.com
CAROL──► (google sees proxy IP, not alice/bob/carol)
the client side configures this — browser settings, HTTP_PROXY env var, etc.
2.2 how HTTPS works through it (the CONNECT method)
for plain HTTP the proxy sees everything. for HTTPS, the browser sends a special CONNECT request:
CONNECT api.github.com:443 HTTP/1.1
Host: api.github.com:443
the proxy opens a TCP connection to github, replies 200 Connection Established, and from then on just relays bytes blindly. it can't read the encrypted traffic, but it does see the target hostname: it's right there in the CONNECT line (and again in the TLS SNI).
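a minimal client-side sketch of the CONNECT handshake, just to make the byte-level exchange concrete. the function names are made up for illustration; in real code the standard library already does this via http.client's HTTPSConnection.set_tunnel().

```python
import socket
import ssl


def build_connect_request(host: str, port: int) -> bytes:
    """The request a client sends to an HTTP forward proxy to open a tunnel."""
    authority = f"{host}:{port}"
    return (f"CONNECT {authority} HTTP/1.1\r\n"
            f"Host: {authority}\r\n"
            "\r\n").encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """True if the proxy answered with a 2xx status (e.g. 200 Connection Established)."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(" ", 2)
    return len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1].startswith("2")


def open_tls_via_proxy(proxy_host: str, proxy_port: int, host: str, port: int = 443) -> ssl.SSLSocket:
    sock = socket.create_connection((proxy_host, proxy_port))
    sock.sendall(build_connect_request(host, port))
    head = b""
    # The target sends nothing until our TLS ClientHello, so reading up to the
    # blank line only consumes the proxy's own response.
    while b"\r\n\r\n" not in head:
        chunk = sock.recv(4096)
        if not chunk:
            sock.close()
            raise ConnectionError("proxy closed the connection")
        head += chunk
    if not tunnel_established(head):
        sock.close()
        raise ConnectionError(head.split(b"\r\n", 1)[0].decode("ascii", "replace"))
    # From here on the proxy only relays bytes; TLS runs end to end with the target.
    return ssl.create_default_context().wrap_socket(sock, server_hostname=host)
```

usage would look like open_tls_via_proxy("proxy.host", 3128, "api.github.com"), where the proxy address is a placeholder. note the certificate is verified against api.github.com, not the proxy: the proxy never takes part in the TLS session.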
2.3 the subtypes
transparent proxy
client has no idea. network silently redirects traffic at the router/iptables level. used by ISPs, corporate networks, parental controls, captive portals.
[!note] “transparent” here means transparent to the client — invisible. not transparent as in “you can see through it.” confusing name.
anonymous vs elite proxies
this is a spectrum based on what headers the proxy sends:
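| type | sends proxy headers (Via / X-Forwarded-For)? | server sees real IP? |
|---|---|---|
| transparent | yes, X-Forwarded-For carries your real IP | yes, via the header |
| anonymous | yes, but without your real IP | no, though it can tell a proxy is involved |
| elite (high-anonymity) | no | no |
| distorting | yes, with a fake IP in X-Forwarded-For | no |
elite proxy = nothing in the headers tells the server you're behind a proxy (IP reputation lists of known proxies can still give it away).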
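proxy-checker sites do roughly this: the tester knows its own real IP, sends a request through the proxy, and looks at which headers arrive. a sketch of that classification, with loud assumptions: IPv4 only, crude regex extraction of addresses, a hypothetical function name, and only four headers checked (real checkers look at more).

```python
import re

PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded", "x-real-ip")


def _ipv4s_in(value: str) -> set[str]:
    # crude: pull IPv4-looking tokens out of any header value
    return set(re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", value))


def classify_proxy(headers: dict[str, str], client_ip: str, proxy_ip: str) -> str:
    """Anonymity level as seen by the server, given the tester's real IP and the proxy's IP."""
    h = {name.lower(): value for name, value in headers.items()}
    present = [name for name in PROXY_HEADERS if name in h]
    if not present:
        return "elite"            # no trace of a proxy in the headers
    claimed = set().union(*(_ipv4s_in(h[name]) for name in present))
    if client_ip in claimed:
        return "transparent"      # real IP leaked in a header
    if claimed - {proxy_ip}:
        return "distorting"       # some other (fake) IP is claimed
    return "anonymous"            # proxy admits it exists, real IP hidden
```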
SOCKS5
operates at layer 5 — doesn’t care about the protocol at all. forwards raw TCP and UDP. useful for:
- SSH tunnels
- VPN-like setups
- anything that’s not HTTP
curl --socks5-hostname proxy.host:1080 https://target.com
the key differences from HTTP proxies: SOCKS5 is protocol-agnostic and can relay UDP (so DNS, VoIP etc. work). it also supports IPv6 addresses, which SOCKS4 did not.
[!warning] if you use `--socks5` instead of `--socks5-hostname`, DNS resolution happens on your machine, not through the proxy, so your real location leaks via DNS. always use `--socks5-hostname`.
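to see why `--socks5-hostname` matters, here's a sketch of the two SOCKS5 messages a client sends first (byte layout from RFC 1928). when the name goes out as a domain (ATYP 0x03), the proxy does the DNS lookup; with plain `--socks5`, curl resolves locally and sends an IP-address request instead, which is the leak. function names are hypothetical and auth is limited to "no authentication".

```python
import ipaddress
import struct

SOCKS_VERSION = 5
CMD_CONNECT = 0x01
ATYP_IPV4, ATYP_DOMAIN, ATYP_IPV6 = 0x01, 0x03, 0x04


def greeting() -> bytes:
    # VER=5, NMETHODS=1, METHOD=0x00 ("no authentication required")
    return bytes([SOCKS_VERSION, 1, 0x00])


def connect_request(host: str, port: int) -> bytes:
    """SOCKS5 CONNECT: literal IPs go out as addresses, names go out as ATYP=DOMAIN
    so the proxy resolves them (the --socks5-hostname behaviour)."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        name = host.encode("idna")
        if len(name) > 255:
            raise ValueError("hostname too long for SOCKS5")
        addr = bytes([ATYP_DOMAIN, len(name)]) + name
    else:
        atyp = ATYP_IPV4 if ip.version == 4 else ATYP_IPV6
        addr = bytes([atyp]) + ip.packed
    # VER, CMD, RSV, then address, then port as big-endian uint16
    return bytes([SOCKS_VERSION, CMD_CONNECT, 0x00]) + addr + struct.pack("!H", port)
```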
web proxy / HTTP proxy
the classic. Squid is the main one. understands HTTP, can cache responses, filter content, log everything, modify headers. enterprises and schools love these.
DNS proxy
intercepts DNS queries before they reach upstream. Pi-hole does this — you send a query for ads.doubleclick.net, Pi-hole returns 0.0.0.0, ad never loads. also used for split-horizon DNS (internal vs external resolution).
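a toy sketch of the blocking decision only (no DNS wire format). `upstream` stands in for a real resolver and is hypothetical, and treating a listed domain as covering all its subdomains is this sketch's choice; actual blockers have their own matching rules.

```python
from typing import Callable

SINKHOLE = "0.0.0.0"


def is_blocked(qname: str, blocklist: set[str]) -> bool:
    """True if qname or any parent domain is on the blocklist."""
    labels = qname.lower().rstrip(".").split(".")
    # ads.doubleclick.net -> checks ads.doubleclick.net, doubleclick.net, net
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve(qname: str, blocklist: set[str], upstream: Callable[[str], str]) -> str:
    """Answer blocked names locally with the sinkhole; forward everything else."""
    if is_blocked(qname, blocklist):
        return SINKHOLE
    return upstream(qname)
```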
3. reverse proxies — the server’s agent
3.1 what it does
acts on behalf of servers. clients think they’re talking directly to the server — the reverse proxy’s address is what DNS points to. actual backends are hidden.
CLIENT ──► REVERSE PROXY ──► server-a
(only sees proxy) ├─► server-b
└─► server-c
the server side configures this. clients usually have no clue it’s there.
what’s hidden from the client: number of backends, their IPs, their software, backend failures (proxy can retry transparently).
3.2 load balancer
distributes requests across multiple backends. the core problem: which backend gets this request?
algorithms:
| algorithm | how it picks | good for |
|---|---|---|
| round robin | next in rotation | uniform requests |
| weighted round robin | next, weighted by server capacity | mixed hardware |
| least connections | backend with fewest active connections | long-lived connections |
| IP hash | hash(client_ip) % N | session stickiness |
| random | random | surprisingly works fine |
| least response time | fastest backend wins | latency-sensitive APIs |
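two of those pickers as a sketch. the weighted one is the "smooth" weighted round robin variant nginx uses, which interleaves picks (a a b a c a a) instead of sending a burst of 5 to the heavy server first. class and function names are hypothetical.

```python
class SmoothWeightedRoundRobin:
    """Weighted round robin that spreads a heavy server's turns out evenly."""

    def __init__(self, weights: dict[str, int]):
        self.weights = dict(weights)
        self.current = {server: 0 for server in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # every server gains its weight; the leader is picked and pays back the total
        for server, weight in self.weights.items():
            self.current[server] += weight
        best = max(self.current, key=self.current.get)  # ties go to the first server
        self.current[best] -= self.total
        return best


def least_connections(active: dict[str, int]) -> str:
    """Backend with the fewest in-flight connections (ties go to the first listed)."""
    return min(active, key=active.get)
```

with weights 5/1/1, seven picks come out as a, a, b, a, c, a, a, and the counters return to zero, so the pattern repeats.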
L4 vs L7 — this confused me for a while:
- L4: routes by TCP/UDP info only (IP + port). fast, can’t inspect HTTP. AWS NLB, HAProxy in TCP mode.
- L7: routes by HTTP content (URL path, headers, cookies). smarter — can do A/B routing, canary deploys. nginx, AWS ALB.
one thing that tripped me up: IP hash stickiness breaks when you add/remove servers because hash(ip) % N changes for nearly every client. the fix is consistent hashing — put servers on a ring, hash the client IP, go clockwise to find the server. adding/removing a server only migrates ~1/N of clients instead of almost all of them.
3.3 API gateway
a reverse proxy specifically for API traffic. adds:
- JWT verification / API key validation
- rate limiting (token bucket, leaky bucket)
- routing by path/version (`/v1` → old service, `/v2` → new service)
- request/response transformation
- logging and metrics per endpoint
CLIENT ──► API GATEWAY
├── verify JWT
├── check rate limit (100 req/min via redis)
├── route /v2/users → users-service:8080
└── strip internal headers, inject X-Request-ID
Kong, AWS API Gateway, Traefik, Envoy are the common ones.
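a sketch of the token bucket behind "100 req/min": refill at 100/60 tokens per second, allow a burst of up to 100 (the burst size is an assumption). it's in-memory and single-process; a gateway with several instances would keep this state in a shared store like the redis in the diagram. `TokenBucket` is a hypothetical name, and the clock is injectable so it can be tested.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float, now=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity   # start full: a fresh client gets its whole burst
        self.now = now
        self.last = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # refill for the time elapsed since the last call, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # the gateway would answer 429 Too Many Requests here
```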
3.4 TLS termination proxy
handles the TLS handshake so backends don’t have to. client→proxy is encrypted, proxy→backend is plain HTTP (on a private network).
why do this?
- backends don’t need certs
- crypto is CPU-intensive — dedicated hardware/proxy handles it
- one place to enforce cipher suites, TLS version minimums
- centralized cert management (Let’s Encrypt automation)
CLIENT ──[HTTPS]──► NGINX (terminates TLS) ──[HTTP]──► backend:8080
3.5 caching reverse proxy
stores responses, serves them without hitting the backend. Varnish is the classic. nginx has proxy_cache. Cloudflare does this at scale.
cache key = usually host + path + query string (nginx's default is $scheme$proxy_host$request_uri), plus any request headers the response lists in Vary.
the hard part is cache invalidation. main strategies:
- TTL-based: cache for N seconds, re-fetch after
- surrogate keys / cache tags: tag responses, invalidate by tag
- purge API: explicit `PURGE /path` request
thundering herd problem — if a cache entry expires under high load, all requests simultaneously miss and hammer the backend. fixes:
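- lock / request coalescing: only one request fetches, others wait (Varnish does this by default; nginx has proxy_cache_lock)
- stale-while-revalidate: serve the stale version while refreshing in background (Varnish grace mode; nginx proxy_cache_use_stale updating)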
- TTL jitter: randomize expiry so everything doesn’t expire at once
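a sketch of two of those fixes together: per-key locking (the first miss calls the backend, concurrent misses wait and then reuse its result) and TTL jitter. stale-while-revalidate is left out to keep it short. `CoalescingCache` and the `fetch` callback are hypothetical names; it's an in-process cache, not a shared one.

```python
import random
import threading
import time


class CoalescingCache:
    """TTL cache where concurrent misses for the same key share one backend fetch."""

    def __init__(self, ttl: float, jitter: float = 0.1):
        self.ttl, self.jitter = ttl, jitter
        self._data = {}                # key -> (value, expires_at)
        self._locks = {}               # key -> per-key lock
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._data.get(key)
        return entry if entry and entry[1] > time.monotonic() else None

    def get(self, key, fetch):
        entry = self._fresh(key)
        if entry:
            return entry[0]
        with self._lock_for(key):       # only one thread per key reaches fetch()
            entry = self._fresh(key)    # re-check: the lock holder may have filled it
            if entry:
                return entry[0]
            value = fetch(key)
            # TTL jitter: spread expiries over ttl * (1 +/- jitter)
            ttl = self.ttl * (1 + random.uniform(-self.jitter, self.jitter))
            self._data[key] = (value, time.monotonic() + ttl)
            return value
```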
3.6 ingress controller (kubernetes)
just a programmable reverse proxy that watches the k8s API and reconfigures itself when you apply Ingress resources.
- host: api.myapp.com
  http:
    paths:
    - path: /v2
      pathType: Prefix
      backend:
        service:
          name: service-v2
          port:
            number: 80
# nginx ingress reads this and updates its upstream config automatically
nginx ingress, Traefik, Envoy/Istio are the main ones.
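the reconfiguration half is a watch loop against the k8s API (not shown); the routing half boils down to host match + longest path prefix. a sketch of that half, assuming the rules are already parsed into dicts shaped like the networking.k8s.io/v1 Ingress spec and every path uses pathType: Prefix, which matches whole path segments. `build_routes` and `route` are hypothetical names.

```python
def build_routes(rules: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Flatten Ingress-style rules into {host: [(prefix, "service:port"), ...]}, longest prefix first."""
    table: dict[str, list[tuple[str, str]]] = {}
    for rule in rules:
        for p in rule["http"]["paths"]:
            svc = p["backend"]["service"]
            backend = f'{svc["name"]}:{svc["port"]["number"]}'
            table.setdefault(rule["host"], []).append((p["path"], backend))
    for routes in table.values():
        routes.sort(key=lambda r: len(r[0]), reverse=True)
    return table


def route(table: dict[str, list[tuple[str, str]]], host: str, path: str):
    for prefix, backend in table.get(host, []):
        p = prefix.rstrip("/")
        # segment-wise prefix: /v2 matches /v2 and /v2/users, but not /v2x
        if path == p or path.startswith(p + "/"):
            return backend
    return None  # the controller would send this to its default backend
```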
3.7 service mesh sidecar
the extreme version. every pod gets an Envoy sidecar injected into it. all traffic in and out flows through the sidecar. control plane (istiod in Istio) configures all sidecars centrally.
every service is simultaneously a reverse proxy for incoming traffic and a forward proxy for outgoing traffic. mTLS everywhere, zero-trust networking, distributed tracing — all handled by the mesh, not the application code.
honestly overkill unless you’re running a lot of microservices and really need zero-trust between them.
4. the full picture
| type | layer | client knows? | hides | use case |
|---|---|---|---|---|
| transparent proxy | 7 | no | nothing | ISP filtering, captive portals |
| anonymous proxy | 7 | yes | client IP | privacy browsing |
| elite proxy | 7 | yes | client IP + proxy presence | scraping, security research |
| SOCKS5 | 5 | yes | client IP | tunneling, VPNs |
| DNS proxy | 7 (DNS) | no | N/A | ad blocking, split DNS |
| load balancer | 4/7 | no | backend pool | scalability |
| API gateway | 7 | no | backend services | auth, rate limits, routing |
| TLS termination | 7 | no | backend topology | cert management |
| caching proxy | 7 | no | backend load | CDN, performance |
| ingress controller | 7 | no | k8s internal routing | kubernetes |
| service mesh | 7 | no | everything | microservices zero-trust |
5. headers — how proxies talk to each other
X-Forwarded-For
X-Forwarded-For: client, proxy1, proxy2
each proxy appends the IP it received from. leftmost is supposedly the original client — but clients can spoof this. REMOTE_ADDR (TCP source IP) is the only IP you actually verified. never trust leftmost X-FF for security without validating the whole chain.
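a sketch of "validating the whole chain": walk X-Forwarded-For from the right, skip addresses that belong to proxies you trust, and the first untrusted one is the client as far as you can verify. this is roughly what nginx's realip module does with set_real_ip_from plus real_ip_recursive on. the function name and the 10.0.0.0/8 trusted range are assumptions.

```python
import ipaddress


def client_ip(remote_addr: str, xff: str, trusted: list[str]) -> str:
    """Rightmost address in X-Forwarded-For + REMOTE_ADDR that isn't a trusted proxy."""
    nets = [ipaddress.ip_network(n) for n in trusted]

    def is_trusted(ip: str) -> bool:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False  # garbage in the header is never trusted
        return any(addr in net for net in nets)

    chain = [p.strip() for p in (xff or "").split(",") if p.strip()] + [remote_addr]
    for ip in reversed(chain):
        if not is_trusted(ip):
            return ip
    return chain[0]  # everything was a trusted proxy: leftmost is the best we have
```

anything to the left of the first untrusted address was written by someone you don't control, so a spoofed leftmost entry is simply ignored.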
X-Real-IP
simpler single-value alternative. nginx’s realip module can use this to set $remote_addr if it trusts the upstream.
Forwarded (RFC 7239)
the standardized structured version:
Forwarded: for=192.0.2.60;proto=http;by=203.0.113.43;host=example.com
more precise than X-Forwarded-* but less commonly seen in practice.
Via
Via: 1.1 vegur, 1.1 varnish, 1.1 nginx
each proxy adds itself. useful for debugging — you can see exactly who touched your request.
hop-by-hop headers
Connection, Keep-Alive, Transfer-Encoding, TE, Upgrade, Proxy-Authorization etc. (plus any header named in the Connection header) apply only to the immediate connection, not end-to-end. each proxy strips them before forwarding.
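a sketch of the stripping step, assuming headers arrive as a list of (name, value) pairs (a hypothetical representation). the set is the classic RFC 2616 §13.5.1 list, with "Trailers" written as the actual header name Trailer, plus the nonstandard Proxy-Connection. this is also why WebSockets need explicit proxy config: Upgrade gets dropped here unless the proxy deliberately re-adds it.

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade", "proxy-connection",
}


def end_to_end_headers(headers: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop hop-by-hop headers, including any the sender listed in Connection."""
    named = {
        token.strip().lower()
        for name, value in headers if name.lower() == "connection"
        for token in value.split(",")
    }
    drop = HOP_BY_HOP | named
    return [(name, value) for name, value in headers if name.lower() not in drop]
```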
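[!note] HTTP Request Smuggling exploits disagreements between how a front-end proxy and back-end server parse `Content-Length` vs `Transfer-Encoding: chunked`. if they disagree on where one request ends, an attacker can “smuggle” a second request inside the first. whole class of high-severity CVEs. fix: have the front-end reject (or normalize) ambiguous requests, e.g. ones carrying both headers or conflicting lengths, keep proxy and backend parsing rules in sync, and use HTTP/2 end-to-end where you can. PortSwigger has good material on this.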
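a minimal front-end check that refuses ambiguous body framing instead of guessing. rejecting requests that carry both headers is the conservative option RFC 9112 allows; the function name and the list-of-pairs header format are assumptions. it only covers header-level ambiguity, not the parser-level tricks (odd whitespace, obfuscated header names) that PortSwigger's research digs into.

```python
import re
from typing import Optional


def framing_problem(headers: list[tuple[str, str]]) -> Optional[str]:
    """Return a reason to reject (400) if the body framing is ambiguous, else None."""
    cl = [v.strip() for n, v in headers if n.lower() == "content-length"]
    te = [v.strip().lower() for n, v in headers if n.lower() == "transfer-encoding"]
    if cl and te:
        return "both Content-Length and Transfer-Encoding"
    if len(set(cl)) > 1:
        return "conflicting Content-Length values"
    if cl and not re.fullmatch(r"[0-9]+", cl[0]):
        return "malformed Content-Length"
    if te:
        codings = [c.strip() for v in te for c in v.split(",")]
        # in a request, chunked must be the final coding or the length is unknowable
        if codings[-1] != "chunked":
            return "Transfer-Encoding does not end in chunked"
    return None
```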
6. proxy protocols — the plumbing
HTTP CONNECT tunneling
already covered above. also used by SSH-over-HTTP and WebSockets through proxies.
PROXY protocol (HAProxy protocol)
not HTTP proxying — this is a tiny header prepended to a TCP connection by a load balancer so the backend knows the real client IP even though TCP comes from the LB.
PROXY TCP4 192.168.1.5 10.0.0.1 56324 443\r\n
[then: actual HTTP/TLS bytes]
v1 is plaintext, v2 is binary. backend must explicitly parse it — if you enable it on the LB but forget the backend, the backend sees garbage at the start of every connection (learned this the hard way).
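a sketch of what "the backend must explicitly parse it" means for v1: peel the header line off the first bytes of the connection before handing the rest to the HTTP/TLS code. the names are hypothetical, it assumes the bytes were already read from the socket, and v2 is only detected, not parsed. also only accept this header from your LB's addresses; otherwise any client can claim any source IP.

```python
import ipaddress
from typing import NamedTuple, Optional

V2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


class ProxyInfo(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def parse_proxy_v1(data: bytes) -> tuple[Optional[ProxyInfo], bytes]:
    """Strip a PROXY v1 line from the start of a connection.
    Returns (info, remaining_bytes); info is None for 'PROXY UNKNOWN'."""
    if data.startswith(V2_SIGNATURE):
        raise ValueError("PROXY v2 (binary) header; this sketch only handles v1")
    end = data.find(b"\r\n", 0, 107)  # v1 line is at most 107 bytes including CRLF
    if not data.startswith(b"PROXY ") or end == -1:
        raise ValueError("missing PROXY v1 header")
    parts = data[:end].decode("ascii").split(" ")
    rest = data[end + 2:]
    if parts[1] == "UNKNOWN":
        return None, rest
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY v1 header")
    _, proto, src, dst, sport, dport = parts
    family = 4 if proto == "TCP4" else 6
    for ip in (src, dst):
        if ipaddress.ip_address(ip).version != family:
            raise ValueError("address family mismatch")
    return ProxyInfo(src, dst, int(sport), int(dport)), rest
```

the "garbage at the start of every connection" failure is exactly what happens when this step is missing: the backend's HTTP parser receives `PROXY TCP4 ...` as if it were a request line.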
WebSocket proxying
HTTP upgrades to WebSocket via the Upgrade header — which is hop-by-hop. a naive HTTP proxy drops it. nginx needs:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
gRPC proxying
gRPC runs over HTTP/2. an HTTP/1.1 proxy cannot forward gRPC at all — the frames don’t map. nginx needs grpc_pass not proxy_pass. also: gRPC uses HTTP/2 trailers for status codes, so the proxy needs to support those too.
7. real-world architectures
classic three-tier
internet → [CDN/edge cache] → [load balancer] → [app servers] → [DB]
everything between the internet and the app servers (CDN, then load balancer) is a reverse proxy of some kind.
microservices with API gateway
mobile app → API gateway → /auth → auth-service
→ /users → user-service
→ /posts → post-service
gateway handles JWT, rate limiting, routing. services are completely hidden.
VPN as forward proxy
your laptop ──[encrypted tunnel]──► VPN server ──► internet
your ISP sees encrypted traffic to the VPN server. websites see the VPN IP. but: the VPN provider now sees what your ISP used to see (every destination you connect to, plus anything not encrypted end-to-end). you're just moving who you trust, not eliminating the trust requirement.
nginx as everything simultaneously
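nginx can be a plain-HTTP forward proxy, TLS terminator, caching proxy, load balancer, and static file server all in one process (stock nginx doesn't support CONNECT, so HTTPS forward proxying needs a third-party module). it's ridiculous how much you can do with nginx.conf.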
8. performance notes
what proxies cost
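- latency: every proxy hop adds a network leg (its own RTT) plus processing time. a chain of 3 proxies = 3 extra legs, and 3 rounds of TCP/TLS handshakes if upstream connections aren't reused, before the first byte hits the backend.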
- connection overhead: TCP + TLS handshakes per hop. mitigated by connection pooling (keep-alive to upstream).
- CPU: TLS termination, header parsing, logging. mitigated by hardware acceleration.
- memory: buffering large request bodies can OOM a proxy if the backend is slow.
what proxies save
- caching: hot content served without touching the backend at all
- connection multiplexing: proxy can multiplex many HTTP/1.1 client connections onto a few HTTP/2 connections to backends
- compression: gzip/brotli at the proxy edge
latency math:
total  = RTT(client↔proxy) + proxy processing + RTT(proxy↔backend) + backend processing
cached = RTT(client↔proxy) + proxy processing (cache lookup)
(each RTT already covers both directions; on a cache hit the proxy↔backend RTT and backend processing drop out)
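plugging hypothetical numbers into that formula: client↔proxy RTT 40 ms, proxy↔backend RTT 2 ms, 1 ms of proxy processing, 30 ms of backend work. the handshake term is an addition to the formula above: on a fresh upstream connection TCP costs 1 RTT and TLS 1.3 another 1 (TLS 1.2: 2), which is what keep-alive pooling avoids. the function name is made up.

```python
def response_time_ms(rtt_client_proxy: float, rtt_proxy_backend: float,
                     proxy_ms: float, backend_ms: float, *, cached: bool = False,
                     fresh_upstream: bool = False, upstream_tls_rtts: int = 0) -> float:
    """Time to first response byte through one reverse proxy, ignoring transfer time
    and assuming the client's connection to the proxy is already open."""
    total = rtt_client_proxy + proxy_ms
    if cached:
        return total  # cache hit: the backend leg drops out entirely
    total += rtt_proxy_backend + backend_ms
    if fresh_upstream:
        # no pooled connection: TCP handshake (1 RTT) plus TLS handshake
        # (1 RTT for TLS 1.3, 2 for TLS 1.2, 0 if proxy->backend is plain HTTP)
        total += rtt_proxy_backend * (1 + upstream_tls_rtts)
    return total
```

with those made-up numbers: 73 ms uncached, 41 ms on a cache hit, and an extra 2 ms (plain HTTP) or 4 ms (TLS 1.3) per request when the proxy doesn't reuse upstream connections. the upstream handshake cost scales with the proxy↔backend RTT, so it matters much more when the backend is in another region.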
9. security stuff to know
SSRF — Server-Side Request Forgery
if an app fetches user-supplied URLs (webhooks, image URLs), the app server itself is acting as a forward proxy. attackers use this to reach internal services:
POST /fetch-url
{"url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}
↑ AWS metadata endpoint — internal only, but the server can reach it
fix: validate URLs against an allowlist, use a dedicated egress proxy with strict controls.
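a sketch of the allowlist check, with one step the fix above doesn't spell out: resolve the hostname and reject non-public addresses too, then have the fetcher connect to exactly the IPs that passed (otherwise DNS rebinding can swap in 169.254.169.254 between check and fetch). `ALLOWED_HOSTS` and `check_outbound_url` are hypothetical, and redirects would need the same check on every hop.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"hooks.example.com", "images.example.com"}  # hypothetical allowlist


def check_outbound_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> list[str]:
    """Return the IPs a fetch may connect to, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only absolute http(s) URLs")
    if parts.hostname not in allowed_hosts:
        raise ValueError(f"host not allowlisted: {parts.hostname}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    ips = {info[4][0] for info in infos}
    for ip in ips:
        # blocks loopback, private ranges, and link-local (where cloud metadata lives)
        if not ipaddress.ip_address(ip).is_global:
            raise ValueError(f"{parts.hostname} resolves to non-public address {ip}")
    return sorted(ips)  # connect to these, not to a fresh lookup of the hostname
```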
open proxy
a forward proxy that accepts requests from anyone on the internet and forwards them anywhere. accidentally running one is bad — attackers use your server as a launchpad. check: curl -x yourserver:3128 https://ifconfig.me should fail from outside your network.
header injection
if a proxy blindly forwards X-Forwarded-For from clients without sanitizing it, an attacker can set X-Forwarded-For: 127.0.0.1 and potentially bypass IP-based access controls. always strip client-supplied X-Forwarded-For at the first trusted proxy.
10. quick config reference
nginx reverse proxy + load balancer
upstream backend {
least_conn;
server 10.0.0.1:8080 weight=3;
server 10.0.0.2:8080 weight=1;
keepalive 32;
}
server {
listen 443 ssl;
ssl_certificate /etc/nginx/cert.pem;
ssl_certificate_key /etc/nginx/key.pem;
location / {
proxy_pass http://backend;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr; # edge proxy: overwrite client-supplied XFF instead of appending to it
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Connection ""; # keepalive to upstream
}
}
nginx caching proxy
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m
max_size=1g inactive=60m use_temp_path=off;
location /api/ {
proxy_cache my_cache;
proxy_cache_valid 200 1m;
proxy_cache_use_stale error timeout updating;
proxy_cache_lock on; # coalesces thundering herd
add_header X-Cache-Status $upstream_cache_status;
proxy_pass http://backend;
}
HAProxy L4 load balancer
frontend ft_https
bind *:443
mode tcp
default_backend bk_https
backend bk_https
mode tcp
balance roundrobin
option tcp-check
server s1 10.0.0.1:443 check
server s2 10.0.0.2:443 check