Proxies & Reverse Proxies
[!note] every packet on the internet touches a middleman. this is about the middlemen.
- 1. what even is a proxy
- 2. forward proxies — the client’s agent
- 3. reverse proxies — the server’s agent
- 4. the full taxonomy
- 5. headers — how proxies talk to each other
- 6. proxy protocols — the plumbing
- 7. real-world architectures
- 8. performance: what proxies cost and save
- 9. security considerations
- 10. quick reference configs
- 11. sources & further reading
1. what even is a proxy
a proxy is any machine that sits between two parties and forwards traffic on their behalf. it speaks to one side, turns around, and speaks to the other. neither party is talking directly to the other — the proxy is always in the middle.
          request              request
CLIENT ──────────► PROXY ──────────► SERVER
       ◄──────────       ◄──────────
          response             response
why put something in the middle at all?
- identity: hide who you are (client anonymity) or hide where servers are (server anonymity)
- control: filter, block, log, or transform traffic
- performance: cache, compress, or batch requests
- scalability: distribute load across many backends
- security: terminate TLS, inspect packets, enforce auth
the two fundamental types split on whose side the proxy is on:
2. forward proxies — the client’s agent
2.1 definition
a forward proxy acts on behalf of clients. the client knows about the proxy and explicitly sends its requests through it. the server sees the proxy’s IP, not the client’s.
ALICE ──►
BOB   ──► FORWARD PROXY ──► google.com
CAROL ──►   (google sees proxy IP, not alice/bob)
who configures it: the client side. browser settings, HTTP_PROXY env var, PAC files, OS network settings.
2.2 what the proxy sees
the HTTP CONNECT method was invented specifically for forward proxies. when a browser wants to tunnel HTTPS through a proxy, it sends CONNECT api.github.com:443 HTTP/1.1.
the proxy opens a TCP connection to api.github.com:443, then sends 200 Connection Established back. from that point on it’s just a dumb pipe — it can’t read the encrypted HTTPS. but it does see the hostname.
for plain HTTP (no TLS), the proxy sees everything — the full URL, all headers, the body.
2.3 subtypes
transparent proxy (intercepting proxy)
clients don’t know they’re going through a proxy. traffic is redirected at the network level (iptables rules, router config). used by:
- ISPs injecting ads
- corporate networks monitoring employees
- parental controls
- public WiFi captive portals
CLIENT ──► [thinks it's talking to server] ──► TRANSPARENT PROXY ──► SERVER
           (OS sends to default gateway)       (quietly intercepts)
the proxy issues a Via header or injects itself silently. legally questionable in many contexts.
anonymous proxy
hides the client’s IP from the server. server sees proxy IP. the proxy may still add X-Forwarded-For: client_ip which reveals the original IP — so “anonymous” is a spectrum:
| Type | X-Forwarded-For sent? | Server sees client IP? |
|---|---|---|
| Transparent | Yes, with the real client IP | Effectively yes |
| Anonymous | No (or with the proxy’s own IP) | No, but Via reveals a proxy is in use |
| Distorting | Yes, but with a fake IP | No |
| Elite (high-anonymity) | No | No, and nothing reveals a proxy at all |
SOCKS proxy (SOCKS4 / SOCKS5)
operates at layer 5 — doesn’t care about the protocol. it just forwards raw TCP (SOCKS4) or TCP+UDP (SOCKS5) connections. curl --socks5 proxy:1080 https://example.com.
why SOCKS5 over an HTTP proxy: SOCKS5 is protocol-agnostic and more capable — it supports UDP (needed for DNS, VoIP), authentication, and IPv6.
SOCKS5 handshake:
CLIENT → {version:5, nmethods:1, methods:[0x00 no-auth]}
PROXY → {version:5, method:0x00}
CLIENT → {ver:5, cmd:CONNECT, addr:example.com, port:443}
PROXY → {ver:5, rep:0x00 success, ...}
[raw TCP tunnel established]
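the handshake above is small enough to build by hand. a minimal sketch in python that constructs the greeting and CONNECT request bytes for a domain-name target (RFC 1928 framing; the surrounding socket send/recv is assumed, not shown):

```python
import struct

def socks5_greeting() -> bytes:
    # version 5, offering 1 auth method, 0x00 = no authentication
    return bytes([0x05, 0x01, 0x00])

def socks5_connect(host: str, port: int) -> bytes:
    # version 5, cmd 0x01 = CONNECT, reserved 0x00,
    # atyp 0x03 = domain name (length-prefixed), then big-endian port
    name = host.encode("idna")
    return bytes([0x05, 0x01, 0x00, 0x03, len(name)]) + name + struct.pack(">H", port)

# what goes on the wire for CONNECT example.com:443
req = socks5_connect("example.com", 443)
```

using the domain-name address type (0x03) means the proxy resolves DNS, which is also how you avoid leaking lookups locally (curl's --socks5-hostname flag).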
web proxy / HTTP proxy
the classic forward proxy. understands HTTP, can:
- cache responses (Squid)
- filter content (block NSFW, malware domains)
- log everything
- modify headers
used by enterprises, schools, ISPs. Squid is the canonical implementation.
DNS proxy
intercepts DNS queries before they reach upstream resolvers. used for:
- ad blocking (Pi-hole)
- split-horizon DNS (internal vs external resolution)
- DNS-based filtering
- caching
3. reverse proxies — the server’s agent
3.1 definition
a reverse proxy acts on behalf of servers. clients think they’re talking to the server — the reverse proxy’s address is what’s published in DNS. the actual backend servers are hidden.
                                   ┌── server-a (192.168.1.10)
CLIENT ──► REVERSE PROXY ──► LB ───┤
(sees proxy IP only)               └── server-b (192.168.1.11)
who configures it: the server side. clients have no idea it’s there (usually).
3.2 what the proxy hides
from the client: the number of backend servers, their IPs, their software (nginx masquerades as whatever you want), which server handles which request, backend failures (proxy can retry transparently).
3.3 subtypes
load balancer
distributes requests across multiple backend servers. the core question: which backend gets this request?
algorithms:
| Algorithm | How it picks | Good for |
|---|---|---|
| Round Robin | next in rotation | uniform requests |
| Weighted Round Robin | next, weighted by capacity | heterogeneous backends |
| Least Connections | backend with fewest active connections | long-lived connections |
| IP Hash | hash(client_ip) % N | session stickiness |
| Random | random | simple, works surprisingly well |
| Least Response Time | fastest backend wins | latency-sensitive APIs |
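three of the picks above, sketched in python (backend addresses are placeholders; a real balancer also tracks health checks and removes dead backends from the pool):

```python
import itertools
from hashlib import sha1

backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# round robin: just cycle through the pool
rr = itertools.cycle(backends)
def pick_round_robin() -> str:
    return next(rr)

# least connections: needs a live counter per backend,
# incremented on dispatch and decremented on completion
active = {b: 0 for b in backends}
def pick_least_connections() -> str:
    return min(backends, key=lambda b: active[b])

# ip hash: the same client always lands on the same backend
def pick_ip_hash(client_ip: str) -> str:
    h = int(sha1(client_ip.encode()).hexdigest(), 16)
    return backends[h % len(backends)]
```

note the ip-hash weakness: resizing the pool changes N, so almost every client gets remapped — consistent hashing fixes that at the cost of complexity.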
layer 4 vs layer 7:
- L4 (transport): routes by TCP/UDP info only (IP, port). fast. can’t inspect HTTP. examples: AWS NLB, HAProxy in TCP mode.
- L7 (application): routes by HTTP content (URL path, headers, cookies). smarter. can do A/B routing, canary deploys. examples: nginx, HAProxy in HTTP mode, AWS ALB.
API gateway
a reverse proxy that specifically handles API traffic, adding:
- authentication / authorization (JWT verification, API key validation)
- rate limiting (leaky bucket, token bucket)
- request/response transformation (strip/add headers, translate formats)
- routing (v1 → service-a, v2 → service-b)
- observability (metrics, tracing, logging per endpoint)
examples: Kong, AWS API Gateway, nginx + lua, Envoy, Traefik.
CLIENT ──► API GATEWAY
              │
              ├── verify JWT
              ├── check rate limit (redis: 100 req/min)
              ├── route /v2/users → users-service:8080
              ├── strip internal headers
              └── forward + inject X-Request-ID
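of the gateway duties listed, rate limiting is the most self-contained. a token-bucket sketch in python (per-process and in-memory here as a simplification; a real gateway keeps the buckets in redis, as in the diagram, so all gateway instances share state):

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# ~100 requests/minute with a burst of 10
bucket = TokenBucket(rate=100 / 60, capacity=10)
```

token bucket allows bursts up to capacity; the leaky bucket variant smooths output to a fixed rate instead.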
TLS termination proxy
handles the SSL/TLS handshake so backend servers don’t have to. the connection from client to proxy is encrypted; from proxy to backend can be plain HTTP (on a private network) or re-encrypted (TLS re-origination).
why offload TLS?
- backends don’t need certificates
- crypto is CPU-intensive — dedicated proxy can use hardware acceleration
- centralized certificate management (Let’s Encrypt automation in nginx/caddy)
- one place to enforce cipher suites, TLS version (no TLS 1.0 please)
CLIENT ────[TLS]────► NGINX (terminates TLS) ────[HTTP]────► backend:8080
caching reverse proxy
stores responses and serves them without hitting the backend. the proxy is the cache. examples: Varnish, nginx proxy_cache, Cloudflare CDN.
cache key: usually method + host + path + (selected headers). must match on the way in.
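a cache key like the one described, sketched in python (the "selected headers" part is simplified to a fixed allowlist; a real proxy derives it from the response's Vary header):

```python
from hashlib import sha256

# headers that participate in the key (stand-in for honoring Vary)
KEYED_HEADERS = ("accept-encoding",)

def cache_key(method: str, host: str, path: str, headers: dict) -> str:
    # normalize so equivalent requests collide on the same entry
    parts = [method.upper(), host.lower(), path]
    for h in KEYED_HEADERS:
        parts.append(f"{h}={headers.get(h, '')}")
    return sha256("|".join(parts).encode()).hexdigest()
```

the normalization matters: if GET and get hash differently, you halve your hit rate for no reason.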
cache invalidation: the hardest problem in CS. strategies:
- TTL-based: cache for N seconds, then re-fetch
- surrogate keys / cache tags: tag responses, invalidate by tag
- purge API: send an explicit PURGE /path request to the proxy
ingress controller (kubernetes)
a reverse proxy that reads Kubernetes Ingress resources and routes traffic to Services. nginx ingress, Traefik, Envoy/Istio are common. it’s just a programmable reverse proxy that watches the k8s API and reconfigures itself.
# this Ingress rule...
- host: api.myapp.com
  http:
    paths:
      - path: /v2
        backend: service-v2:80   # shorthand for the service name + port
# ...becomes an nginx upstream block automatically
service mesh sidecar
each microservice gets a sidecar proxy (Envoy, Linkerd proxy) injected into its pod. all traffic in and out flows through the sidecar. the mesh control plane configures all sidecars centrally.
this is the extreme end: every service is simultaneously a reverse proxy for incoming traffic and a forward proxy for outgoing traffic. mTLS everywhere, zero-trust networking, distributed tracing — all handled by the mesh, not the application code.
4. the full taxonomy
| Type | Layer | Client knows? | Hides | Use case |
|---|---|---|---|---|
| Transparent proxy | 7 | No | Nothing | ISP filtering, captive portals |
| Anonymous proxy | 7 | Yes | Client IP | Privacy browsing |
| Elite proxy | 7 | Yes | Client IP + proxy presence | Scraping, security research |
| SOCKS5 proxy | 5 | Yes | Client IP | Tunneling, VPNs |
| DNS proxy | DNS | No | N/A | Ad blocking, split DNS |
| Load balancer | 4/7 | No | Backend pool | Scalability |
| API gateway | 7 | No | Backend services | Auth, rate limits, routing |
| TLS termination | 7 | No | Backend topology | Cert management |
| Caching proxy | 7 | No | Backend load | CDN, performance |
| Ingress controller | 7 | No | k8s internal routing | Kubernetes |
| Service mesh | 7 | No | Everything | Microservices zero-trust |
5. headers — how proxies talk to each other
HTTP headers carry the metadata that survives proxy hops.
X-Forwarded-For
X-Forwarded-For: client, proxy1, proxy2
each proxy appends the IP it received the connection from. the leftmost IP is supposedly the original client — but can be spoofed by the client. never trust the leftmost IP for security without validating the chain.
REMOTE_ADDR (TCP source IP) is always the immediate upstream. that’s the only IP you actually verified.
X-Real-IP
a simpler single-value alternative. nginx’s ngx_http_realip_module sets $remote_addr to this value if it trusts the upstream. one value — the proxy decided what the “real” IP is.
Forwarded (RFC 7239)
the standardized version. structured:
Forwarded: for=192.0.2.60;proto=http;by=203.0.113.43;host=example.com
supports for (client), by (proxy receiving), host (original Host header), proto (original protocol). more precise than the X-Forwarded-* family.
Via
Via: 1.1 vegur, 1.1 varnish, 1.1 nginx
added by each proxy in the chain. shows the proxy software and HTTP version used. useful for debugging — you can see who touched your request.
Connection and hop-by-hop headers
Connection: keep-alive and Connection: close are hop-by-hop — they apply to the immediate connection, not end-to-end. each proxy strips them before forwarding. other hop-by-hop headers: TE, Trailers, Transfer-Encoding, Upgrade, Keep-Alive, Proxy-Authorization, Proxy-Authenticate.
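a forwarding proxy has to drop these before passing a request on — including any extra headers the Connection header itself nominates by name (per RFC 7230 §6.1). a sketch in python:

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    lower = {k.lower(): v for k, v in headers.items()}
    # Connection may nominate additional hop-by-hop headers by name
    nominated = {
        h.strip().lower()
        for h in lower.get("connection", "").split(",") if h.strip()
    }
    drop = HOP_BY_HOP | nominated
    return {k: v for k, v in headers.items() if k.lower() not in drop}
```

the nomination mechanism is the subtle part: a header that is normally end-to-end becomes hop-by-hop the moment Connection names it.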
request smuggling: Content-Length vs Transfer-Encoding: chunked. if the proxy uses Content-Length to find the end of a request body but the backend honors Transfer-Encoding: chunked (or vice versa), the two disagree about where one request ends, and an attacker can "smuggle" a second request inside the first. this is an entire class of high-severity CVEs. fix: keep proxy and backend HTTP parsers in sync, and reject ambiguous requests at the edge.
6. proxy protocols — the plumbing
6.1 HTTP CONNECT tunneling
already covered. used for HTTPS through forward proxies, also used by SSH-over-HTTP, websockets through proxies.
CONNECT target.host:port HTTP/1.1
Host: target.host:port
[optional: Proxy-Authorization: Basic ...]
response: 200 Connection Established — then raw bytes flow.
6.2 PROXY protocol (HAProxy protocol)
not to be confused with HTTP proxying. this is a small header prepended to a TCP connection by a load balancer so the backend knows the original client IP, even though the TCP connection comes from the LB.
PROXY TCP4 192.168.1.5 10.0.0.1 56324 443\r\n
[then: the actual HTTP/TLS bytes start]
v1 is plaintext. v2 is binary. nginx, HAProxy, AWS NLB support it. backend must explicitly opt in to parse it — if you enable it on the LB but not the backend, the backend sees garbage at the start of every connection.
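parsing the v1 header is one line's worth of splitting. a sketch in python (TCP4 only; v2 is binary and not covered here):

```python
def parse_proxy_v1(line: bytes) -> dict:
    # e.g. b"PROXY TCP4 192.168.1.5 10.0.0.1 56324 443\r\n"
    parts = line.rstrip(b"\r\n").decode("ascii").split(" ")
    if parts[0] != "PROXY" or len(parts) != 6:
        raise ValueError("not a PROXY protocol v1 header")
    _, proto, src_ip, dst_ip, src_port, dst_port = parts
    return {
        "proto": proto,
        "src": (src_ip, int(src_port)),   # the original client
        "dst": (dst_ip, int(dst_port)),   # what the client connected to
    }
```

the ValueError case is exactly the "garbage at the start of every connection" failure mode: an HTTP request line shows up where the PROXY header was expected.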
6.3 WebSocket proxying
HTTP upgrades to WebSocket:
GET /socket HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
a naive HTTP proxy drops the Upgrade header (hop-by-hop). a WebSocket-aware proxy must forward it and then switch to bidirectional byte streaming. nginx needs proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade";.
6.4 gRPC proxying
gRPC runs over HTTP/2. HTTP/2 is multiplexed — multiple requests share one TCP connection with stream IDs. a proxy must be HTTP/2-aware. an HTTP/1.1 proxy cannot forward gRPC at all (frames don’t map to HTTP/1.1 messages).
nginx needs grpc_pass grpc://backend:50051; (not proxy_pass). also: gRPC uses trailers (headers after the body) for status codes — the proxy must support HTTP/2 trailers.
7. real-world architectures
7.1 the classic three-tier
Internet → [CDN / edge cache] → [Load Balancer] → [App Servers] → [DB]
each arrow is a reverse proxy of some kind.
7.2 microservices with API gateway
Mobile App ─► API Gateway (Kong/AWS) ─► /auth → auth-service
─► /users → user-service
─► /posts → post-service
the gateway handles JWT validation, rate limiting, routing. services are fully hidden.
7.3 VPN as forward proxy
YOUR LAPTOP ──[encrypted tunnel]──► VPN SERVER ──► internet
the VPN server is a forward proxy. your ISP sees encrypted traffic to the VPN server. websites see the VPN server’s IP. note: the VPN provider sees everything — you’re just moving who you trust.
7.4 nginx as everything simultaneously
nginx can be a forward proxy (with a resolver configured), TLS terminator, caching proxy, load balancer, and static file server — all in one process, all configured via nginx.conf. it’s the swiss-army-knife of proxies.
8. performance: what proxies cost and save
8.1 costs
- latency: every hop adds processing time plus a network leg to the next machine. a chain of 3 proxies means 3 extra legs before the first byte reaches the backend.
- connection overhead: TCP handshakes, TLS handshakes per hop. mitigated by connection pooling (keep-alive).
- CPU: TLS termination, header parsing, logging, encryption. mitigated by hardware TLS offload (AWS, Cloudflare use custom ASICs).
- memory: buffering. a proxy receiving a large request body must buffer it if the backend is slow (can cause OOM).
8.2 savings
- caching: a well-configured caching proxy turns O(N) backend requests into O(1) for hot content.
- connection multiplexing: HTTP/2 allows one TCP connection to carry many requests in parallel. a proxy can multiplex many HTTP/1.1 connections from clients onto a few HTTP/2 connections to backends.
- compression: gzip/brotli at the proxy, serving pre-compressed content or compressing on the fly.
8.3 the thundering herd problem
a caching proxy under high load: if a cache entry expires, many requests arrive simultaneously for the same resource. all of them miss the cache and hit the backend simultaneously — stampede.
mitigations:
- lock / coalesce: only one request goes to the backend; others wait for it to populate the cache (Varnish grace mode)
- stale-while-revalidate: serve stale content while refreshing in the background
- jitter on TTL: don’t let all entries expire at the same time
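the stale-while-revalidate mitigation can be sketched with a small in-memory cache (clock injected for determinism; the background refresh is reported as a flag rather than run on a real thread):

```python
import time

class SwrCache:
    """serve stale entries while flagging them for a background refresh."""

    def __init__(self, ttl: float, stale_window: float, clock=time.monotonic):
        self.ttl = ttl                    # seconds an entry is fresh
        self.stale_window = stale_window  # seconds stale content is still servable
        self.clock = clock
        self.store = {}                   # key -> (value, stored_at)

    def put(self, key, value):
        self.store[key] = (value, self.clock())

    def get(self, key):
        """returns (value, needs_refresh); (None, True) is a full miss."""
        if key not in self.store:
            return None, True
        value, stored = self.store[key]
        age = self.clock() - stored
        if age <= self.ttl:
            return value, False           # fresh: serve directly
        if age <= self.ttl + self.stale_window:
            return value, True            # stale: serve, refresh in background
        return None, True                 # expired outright: caller must fetch
```

during the stale window only the background refresher hits the backend, so a TTL expiry under load no longer turns into a stampede.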
9. security considerations
9.1 SSRF — Server-Side Request Forgery
if an application makes outbound requests to user-supplied URLs (e.g., webhook URLs, image fetching), and those requests go through the application server — the application is a de facto forward proxy. attackers use it to reach internal services:
POST /fetch-url
{"url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}
↑ AWS metadata endpoint — internal only, but server can reach it
fix: validate URLs against an allowlist, use a dedicated egress proxy with strict controls.
9.2 proxy authentication bypass
407 Proxy Authentication Required is the proxy equivalent of 401. if the proxy only validates on the first request and then trusts the connection, an attacker who can inject a Connection: close mid-stream can force re-authentication on a new connection.
9.3 open proxy
a forward proxy that accepts requests from anyone on the internet and forwards them anywhere. useful for attackers: they can use your server as a launchpad, and the target sees your IP. never run an open proxy accidentally. check: curl -x yourserver:3128 https://ifconfig.me should fail from outside your network.
9.4 X-Forwarded-For spoofing
if a proxy forwards the X-Forwarded-For header from clients without sanitizing it, an attacker can set X-Forwarded-For: 127.0.0.1 and potentially bypass IP-based access controls. always strip or overwrite client-supplied X-Forwarded-For at the first trusted proxy.
10. quick reference configs
nginx as reverse proxy + load balancer
upstream backend {
    least_conn;
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=1;
    keepalive 32;
}

server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/cert.pem;
    ssl_certificate_key /etc/nginx/key.pem;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";  # enable keepalive to upstream
    }
}
nginx as caching proxy
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m
                 max_size=1g inactive=60m use_temp_path=off;

location /api/ {
    proxy_cache my_cache;
    proxy_cache_valid 200 1m;
    proxy_cache_use_stale error timeout updating;
    proxy_cache_lock on;  # coalesce thundering herd
    add_header X-Cache-Status $upstream_cache_status;
    proxy_pass http://backend;
}
HAProxy TCP load balancer (L4)
frontend ft_https
    bind *:443
    mode tcp
    default_backend bk_https

backend bk_https
    mode tcp
    balance roundrobin
    option tcp-check
    server s1 10.0.0.1:443 check
    server s2 10.0.0.2:443 check
curl through SOCKS5
curl --socks5-hostname proxy.host:1080 https://target.com