Proxies & Reverse Proxies

[!note] every packet on the internet touches a middleman. this is about the middlemen.


1. what even is a proxy

a proxy is any machine that sits between two parties and forwards traffic on their behalf. it speaks to one side, turns around, and speaks to the other. neither party is talking directly to the other — the proxy is always in the middle.

CLIENT ─────────► PROXY ─────────► SERVER
       request           request
              ◄──────────       ◄──────────
               response           response

why put something in the middle at all?

  • identity: hide who you are (client anonymity) or hide where servers are (server anonymity)
  • control: filter, block, log, or transform traffic
  • performance: cache, compress, or batch requests
  • scalability: distribute load across many backends
  • security: terminate TLS, inspect packets, enforce auth

Key insight: the entire internet runs on proxies. CDNs are proxies. load balancers are proxies. your corporate firewall is a proxy. API gateways are proxies. NAT in your home router is a (very dumb) proxy. knowing proxy types = knowing how the internet actually works.

the two fundamental types split on whose side the proxy is on:

FORWARD vs REVERSE — the fundamental split


2. forward proxies — the client’s agent

2.1 definition

a forward proxy acts on behalf of clients. the client knows about the proxy and explicitly sends its requests through it. the server sees the proxy’s IP, not the client’s.

ALICE ──► FORWARD PROXY ──► google.com
BOB  ──►                        (google sees proxy IP, not alice/bob)
CAROL──►

who configures it: the client side. browser settings, HTTP_PROXY env var, PAC files, OS network settings.

2.2 what the proxy sees

the HTTP CONNECT method was invented specifically for forward proxies. when a browser wants to tunnel HTTPS through a proxy:

CLIENT → PROXY (plaintext):

  CONNECT api.github.com:443 HTTP/1.1
  Host: api.github.com:443

the proxy opens a TCP connection to api.github.com:443, then sends 200 Connection Established back. from that point on it’s just a dumb pipe — it can’t read the encrypted HTTPS. but it does see the hostname.

for plain HTTP (no TLS), the proxy sees everything — the full URL, all headers, the body.
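
a minimal sketch of both halves of that exchange, assuming nothing beyond the text above (the hostname is just the example from the request):

```python
def build_connect(host: str, port: int) -> bytes:
    """Assemble the plaintext CONNECT request a client sends to a forward proxy."""
    return (
        f"CONNECT {host}:{port} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "\r\n"
    ).encode("ascii")

def tunnel_established(reply: bytes) -> bool:
    """Any 2xx status line means the proxy opened the TCP tunnel."""
    status_line = reply.split(b"\r\n", 1)[0]   # e.g. b"HTTP/1.1 200 Connection Established"
    parts = status_line.split()
    return len(parts) >= 2 and parts[1].startswith(b"2")
```

after `tunnel_established` returns true, the client just starts its TLS handshake over the same socket and the proxy shovels bytes.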

2.3 subtypes

transparent proxy (intercepting proxy)

clients don’t know they’re going through a proxy. traffic is redirected at the network level (iptables rules, router config). used by:

  • ISPs injecting ads
  • corporate networks monitoring employees
  • parental controls
  • public WiFi captive portals

CLIENT ──► [thinks it's talking to server] ──► TRANSPARENT PROXY ──► SERVER
           (OS sends to default gateway)        (quietly intercepts)

the proxy issues a Via header or injects itself silently. legally questionable in many contexts.

anonymous proxy

hides the client’s IP from the server. server sees proxy IP. the proxy may still add X-Forwarded-For: client_ip which reveals the original IP — so “anonymous” is a spectrum:

Type                   | X-Forwarded-For sent?    | Server sees client IP?
Transparent            | Yes                      | Effectively yes
Anonymous              | Yes, but can be ignored  | No (unless server reads header)
Elite (high-anonymity) | No                       | No
Distorting             | Yes, but fake IP         | No

SOCKS proxy (SOCKS4 / SOCKS5)

operates at layer 5 — doesn’t care about the protocol. it just forwards raw TCP (SOCKS4) or TCP+UDP (SOCKS5) connections. curl --socks5 proxy:1080 https://example.com.

SOCKS5 vs SOCKS4: SOCKS5 is more powerful. it adds UDP support (needed for DNS, VoIP), authentication, and IPv6.

SOCKS5 handshake:
  CLIENT → {version:5, nmethods:1, methods:[0x00 no-auth]}
  PROXY  → {version:5, method:0x00}
  CLIENT → {ver:5, cmd:CONNECT, addr:example.com, port:443}
  PROXY  → {ver:5, rep:0x00 success, ...}
  [raw TCP tunnel established]
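
the two client-side messages above can be built byte-for-byte per RFC 1928; a sketch of the no-auth flow (not a full client, no socket I/O):

```python
import struct

def greeting() -> bytes:
    # ver=5, nmethods=1, methods=[0x00 no-auth]
    return bytes([0x05, 0x01, 0x00])

def connect_request(host: str, port: int) -> bytes:
    # ver=5, cmd=1 (CONNECT), rsv=0, atyp=3 (domain name),
    # then a length-prefixed hostname and a big-endian port
    name = host.encode("idna")
    return bytes([0x05, 0x01, 0x00, 0x03, len(name)]) + name + struct.pack(">H", port)
```

note that atyp=3 lets the proxy do the DNS resolution, which is why `curl --socks5-hostname` exists (plain `--socks5` resolves locally and leaks the lookup).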

web proxy / HTTP proxy

the classic forward proxy. understands HTTP, can:

  • cache responses (Squid)
  • filter content (block NSFW, malware domains)
  • log everything
  • modify headers

used by enterprises, schools, ISPs. Squid is the canonical implementation.

DNS proxy

intercepts DNS queries before they reach upstream resolvers. used for:

  • ad blocking (Pi-hole)
  • split-horizon DNS (internal vs external resolution)
  • DNS-based filtering
  • caching


3. reverse proxies — the server’s agent

3.1 definition

a reverse proxy acts on behalf of servers. clients think they’re talking to the server — the reverse proxy’s address is what’s published in DNS. the actual backend servers are hidden.

                                    ┌── server-a (192.168.1.10)
CLIENT ──► REVERSE PROXY ──► LB ───┤
(sees proxy IP only)                └── server-b (192.168.1.11)

who configures it: the server side. clients have no idea it’s there (usually).

3.2 what the proxy hides

from the client: the number of backend servers, their IPs, their software (nginx masquerades as whatever you want), which server handles which request, backend failures (proxy can retry transparently).

3.3 subtypes

load balancer

distributes requests across multiple backend servers. the core question: which backend gets this request?

algorithms:

Algorithm            | How it picks                           | Good for
Round Robin          | next in rotation                       | uniform requests
Weighted Round Robin | next, weighted by capacity             | heterogeneous backends
Least Connections    | backend with fewest active connections | long-lived connections
IP Hash              | hash(client_ip) % N                    | session stickiness
Random               | random                                 | simple, works surprisingly well
Least Response Time  | fastest backend wins                   | latency-sensitive APIs
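
two of the simpler strategies above, sketched as pickers (the backend names and connection counts are made up for illustration):

```python
import itertools

def round_robin(backends):
    """Endless iterator: each next() yields the next backend in rotation."""
    return itertools.cycle(backends)

def least_connections(active: dict) -> str:
    """Pick the backend with the fewest active connections right now."""
    return min(active, key=active.get)
```

usage: `rr = round_robin(["a", "b"])`, then `next(rr)` per request; `least_connections({"a": 3, "b": 1})` picks `"b"`.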

layer 4 vs layer 7:

  • L4 (transport): routes by TCP/UDP info only (IP, port). fast. can’t inspect HTTP. examples: AWS NLB, HAProxy in TCP mode.
  • L7 (application): routes by HTTP content (URL path, headers, cookies). smarter. can do A/B routing, canary deploys. examples: nginx, HAProxy in HTTP mode, AWS ALB.

IP HASH STICKINESS

  given N backends and client IP c:
    backend_index = CRC32(c) mod N

  problem: adding/removing a server rehashes everything → most sessions break

  fix: consistent hashing (ring)
    • place N servers on a [0, 2³²) ring
    • hash(c) → find next server clockwise
    • add/remove a server → only ~1/N sessions migrate
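
a toy ring to make the "~1/N sessions migrate" claim concrete; CRC32 stands in for a real hash, one point per server (production rings use many virtual nodes per server for smoother balance):

```python
import bisect
import zlib

class Ring:
    def __init__(self, servers):
        # each server gets one point on the [0, 2^32) ring
        self._points = sorted((zlib.crc32(s.encode()), s) for s in servers)

    def lookup(self, client_ip: str) -> str:
        h = zlib.crc32(client_ip.encode())
        keys = [p for p, _ in self._points]
        i = bisect.bisect_right(keys, h) % len(self._points)  # next server clockwise
        return self._points[i][1]
```

adding a fourth server to a three-server ring only re-homes the clients whose hash falls in the arc the new server takes over; everyone else keeps their backend.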

API gateway

a reverse proxy that specifically handles API traffic, adding:

  • authentication / authorization (JWT verification, API key validation)
  • rate limiting (leaky bucket, token bucket)
  • request/response transformation (strip/add headers, translate formats)
  • routing (v1 → service-a, v2 → service-b)
  • observability (metrics, tracing, logging per endpoint)

examples: Kong, AWS API Gateway, nginx + lua, Envoy, Traefik.

CLIENT ──► API GATEWAY
             │
             ├── verify JWT
             ├── check rate limit (redis: 100 req/min)
             ├── route /v2/users → users-service:8080
             ├── strip internal headers
             └── forward + inject X-Request-ID
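
the rate-limit step above is typically a token bucket; a minimal in-process sketch (a real gateway keeps these counters in redis, and the capacity/refill numbers here are placeholders):

```python
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity          # start full
        self.rate = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token per request
            return True
        return False                    # caller should return 429
```

"100 req/min" maps to `TokenBucket(capacity=100, refill_per_sec=100 / 60)`; capacity controls burst size, refill controls sustained rate.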

TLS termination proxy

handles the SSL/TLS handshake so backend servers don’t have to. the connection from client to proxy is encrypted; from proxy to backend can be plain HTTP (on a private network) or re-encrypted (TLS re-origination).

why offload TLS?

  • backends don’t need certificates
  • crypto is CPU-intensive — dedicated proxy can use hardware acceleration
  • centralized certificate management (Let’s Encrypt automation in nginx/caddy)
  • one place to enforce cipher suites, TLS version (no TLS 1.0 please)

CLIENT ────[TLS]────► NGINX (terminates TLS) ────[HTTP]────► backend:8080

caching reverse proxy

stores responses and serves them without hitting the backend. the proxy is the cache. examples: Varnish, nginx proxy_cache, Cloudflare CDN.

cache key: usually method + host + path + (selected headers). must match on the way in.

cache invalidation: the hardest problem in CS. strategies:

  • TTL-based: cache for N seconds, then re-fetch
  • surrogate keys / cache tags: tag responses, invalidate by tag
  • purge API: explicit PURGE /path to the proxy
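
the first two strategies fit in a few lines; a sketch combining TTL expiry with surrogate-key purging (the cache keys and tags are invented examples):

```python
import time

class Cache:
    def __init__(self):
        self._store = {}   # key -> (value, expires_at, tags)

    def put(self, key, value, ttl, tags=()):
        self._store[key] = (value, time.monotonic() + ttl, frozenset(tags))

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at, _ = entry
        if time.monotonic() >= expires_at:
            del self._store[key]        # TTL-based: expired, caller re-fetches
            return None
        return value

    def purge_tag(self, tag):
        """Surrogate keys: invalidate every response carrying this tag."""
        self._store = {k: v for k, v in self._store.items() if tag not in v[2]}
```

tagging lets you purge "everything about user 1" across many URLs with one call, which TTLs alone can't express.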

ingress controller (kubernetes)

a reverse proxy that reads Kubernetes Ingress resources and routes traffic to Services. nginx ingress, Traefik, Envoy/Istio are common. it’s just a programmable reverse proxy that watches the k8s API and reconfigures itself.

# this Ingress rule...
- host: api.myapp.com
  http:
    paths:
    - path: /v2
      pathType: Prefix
      backend:
        service:
          name: service-v2
          port:
            number: 80

# ...becomes an nginx upstream block automatically

service mesh sidecar

each microservice gets a sidecar proxy (Envoy, Linkerd proxy) injected into its pod. all traffic in and out flows through the sidecar. the mesh control plane configures all sidecars centrally.

this is the extreme end: every service is simultaneously a reverse proxy for incoming traffic and a forward proxy for outgoing traffic. mTLS everywhere, zero-trust networking, distributed tracing — all handled by the mesh, not the application code.


4. the full taxonomy

PROXY TAXONOMY TREE

Type               | Layer | Client knows? | Hides                      | Use case
Transparent proxy  | 7     | No            | Nothing                    | ISP filtering, captive portals
Anonymous proxy    | 7     | Yes           | Client IP                  | Privacy browsing
Elite proxy        | 7     | Yes           | Client IP + proxy presence | Scraping, security research
SOCKS5 proxy       | 5     | Yes           | Client IP                  | Tunneling, VPNs
DNS proxy          | DNS   | No            | N/A                        | Ad blocking, split DNS
Load balancer      | 4/7   | No            | Backend pool               | Scalability
API gateway        | 7     | No            | Backend services           | Auth, rate limits, routing
TLS termination    | 7     | No            | Backend topology           | Cert management
Caching proxy      | 7     | No            | Backend load               | CDN, performance
Ingress controller | 7     | No            | k8s internal routing       | Kubernetes
Service mesh       | 7     | No            | Everything                 | Microservices zero-trust

5. headers — how proxies talk to each other

HTTP headers carry the metadata that survives proxy hops.

X-Forwarded-For

X-Forwarded-For: client, proxy1, proxy2

each proxy appends the IP it received the connection from. the leftmost IP is supposedly the original client — but can be spoofed by the client. never trust the leftmost IP for security without validating the chain.

REMOTE_ADDR (TCP source IP) is always the immediate upstream. that’s the only IP you actually verified.
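
one common way to apply that rule: walk the chain right-to-left, skipping IPs that belong to proxies you operate; the first unfamiliar IP is the best guess at the real client. the trusted ranges below are illustrative, not a recommendation:

```python
import ipaddress

# ranges owned by our own proxy fleet (made-up example networks)
TRUSTED = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("192.0.2.128/25")]

def client_ip(remote_addr, xff=None):
    """remote_addr is the verified TCP peer; xff is the raw X-Forwarded-For value."""
    chain = [remote_addr]
    if xff:
        chain = [p.strip() for p in xff.split(",")] + chain
    for ip in reversed(chain):      # rightmost entries were appended by proxies
        if not any(ipaddress.ip_address(ip) in net for net in TRUSTED):
            return ip               # first hop we didn't add ourselves
    return chain[0]                 # whole chain trusted: fall back to leftmost
```

the key property: a client-supplied fake entry sits at the far left, behind at least one untrusted hop, so it never wins.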

X-Real-IP

a simpler single-value alternative. nginx’s ngx_http_realip_module sets $remote_addr to this value if it trusts the upstream. one value — the proxy decided what the “real” IP is.

Forwarded (RFC 7239)

the standardized version. structured:

Forwarded: for=192.0.2.60;proto=http;by=203.0.113.43;host=example.com

supports for (client), by (proxy receiving), host (original Host header), proto (original protocol). more precise than the X-Forwarded-* family.

Via

Via: 1.1 vegur, 1.1 varnish, 1.1 nginx

added by each proxy in the chain. shows the proxy software and HTTP version used. useful for debugging — you can see who touched your request.

Connection and hop-by-hop headers

Connection: keep-alive and Connection: close are hop-by-hop — they apply to the immediate connection, not end-to-end. each proxy strips them before forwarding. other hop-by-hop headers: TE, Trailers, Transfer-Encoding, Upgrade, Keep-Alive, Proxy-Authorization, Proxy-Authenticate.

HTTP Request Smuggling exploits disagreements between how a front-end proxy and back-end server parse Content-Length vs Transfer-Encoding: chunked. if the proxy strips Transfer-Encoding but the backend processes Content-Length differently, an attacker can "smuggle" a second request inside the first. entire class of high-severity CVEs. fix: keep proxy and backend HTTP parsers in sync.


6. proxy protocols — the plumbing

6.1 HTTP CONNECT tunneling

already covered. used for HTTPS through forward proxies, also used by SSH-over-HTTP, websockets through proxies.

CONNECT target.host:port HTTP/1.1
Host: target.host:port
[optional: Proxy-Authorization: Basic ...]

response: 200 Connection Established — then raw bytes flow.

6.2 PROXY protocol (HAProxy protocol)

not to be confused with HTTP proxying. this is a small header prepended to a TCP connection by a load balancer so the backend knows the original client IP, even though the TCP connection comes from the LB.

PROXY TCP4 192.168.1.5 10.0.0.1 56324 443\r\n
[then: the actual HTTP/TLS bytes start]

v1 is plaintext. v2 is binary. nginx, HAProxy, AWS NLB support it. backend must explicitly opt in to parse it — if you enable it on the LB but not the backend, the backend sees garbage at the start of every connection.
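
parsing the v1 line on the backend side is straightforward; a sketch that handles only the plaintext variant shown above:

```python
def parse_proxy_v1(line: bytes):
    """Parse a PROXY protocol v1 header line (the bytes up to and including CRLF)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("v1 header must end in CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    if parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6", "UNKNOWN"):
        raise ValueError("not a PROXY v1 header")
    if parts[1] == "UNKNOWN":
        return {"proto": "UNKNOWN"}         # LB couldn't determine the original addresses
    src_ip, dst_ip, src_port, dst_port = parts[2:6]
    return {
        "proto": parts[1],
        "src": (src_ip, int(src_port)),     # the original client
        "dst": (dst_ip, int(dst_port)),     # the address the client connected to
    }
```

everything after the CRLF is the untouched application stream, which is exactly the "garbage at the start of every connection" a non-opted-in backend would choke on.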

6.3 WebSocket proxying

HTTP upgrades to WebSocket:

GET /socket HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

a naive HTTP proxy drops the Upgrade header (hop-by-hop). a WebSocket-aware proxy must forward it and then switch to bidirectional byte streaming. nginx needs proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade";.
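
the handshake completes when the server echoes back a Sec-WebSocket-Accept value derived from the client key by the RFC 6455 recipe: SHA-1 of the key concatenated with a fixed GUID, base64-encoded. a proxy doesn't compute this itself, but it must forward it intact:

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"   # fixed by RFC 6455

def accept_key(sec_websocket_key: str) -> str:
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

for the sample key above, the expected response header is Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo= (the worked example from RFC 6455 itself).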

6.4 gRPC proxying

gRPC runs over HTTP/2. HTTP/2 is multiplexed — multiple requests share one TCP connection with stream IDs. a proxy must be HTTP/2-aware. an HTTP/1.1 proxy cannot forward gRPC at all (frames don’t map to HTTP/1.1 messages).

nginx needs grpc_pass grpc://backend:50051; (not proxy_pass). also: gRPC uses trailers (headers after the body) for status codes — the proxy must support HTTP/2 trailers.


7. real-world architectures

7.1 the classic three-tier

Internet → [CDN / edge cache] → [Load Balancer] → [App Servers] → [DB]

each arrow is a reverse proxy of some kind.

7.2 microservices with API gateway

Mobile App ─► API Gateway (Kong/AWS) ─► /auth  → auth-service
                                      ─► /users → user-service
                                      ─► /posts → post-service

the gateway handles JWT validation, rate limiting, routing. services are fully hidden.

7.3 VPN as forward proxy

YOUR LAPTOP ──[encrypted tunnel]──► VPN SERVER ──► internet

the VPN server is a forward proxy. your ISP sees encrypted traffic to the VPN server. websites see the VPN server’s IP. note: the VPN provider sees everything — you’re just moving who you trust.

7.4 nginx as everything simultaneously

nginx can be a forward proxy (with resolver configured), TLS terminator, caching proxy, load balancer, and static file server — all in one process, all configured via nginx.conf. it’s proxy swiss-army-knife.


8. performance: what proxies cost and save

8.1 costs

  • latency: every hop adds network transit plus proxy processing time; a chain of 3 proxies puts 3 extra hops of delay in front of the backend before the first byte arrives.
  • connection overhead: TCP handshakes, TLS handshakes per hop. mitigated by connection pooling (keep-alive).
  • CPU: TLS termination, header parsing, logging, encryption. mitigated by hardware TLS offload (AWS, Cloudflare use custom ASICs).
  • memory: buffering. a proxy receiving a large request body must buffer it if the backend is slow (can cause OOM).

8.2 savings

  • caching: a well-configured caching proxy turns O(N) backend requests into O(1) for hot content.
  • connection multiplexing: HTTP/2 allows one TCP connection to carry many requests in parallel. a proxy can multiplex many HTTP/1.1 connections from clients onto a few HTTP/2 connections to backends.
  • compression: gzip/brotli at the proxy, serving pre-compressed content or compressing on the fly.

LATENCY BUDGET

  total latency ≈ client↔proxy RTT
                + proxy processing time
                + proxy↔backend RTT
                + backend processing time

  for a cached response:
  total ≈ client↔proxy RTT + cache lookup   (the backend terms drop to ~0)

8.3 the thundering herd problem

a caching proxy under high load: if a cache entry expires, many requests arrive simultaneously for the same resource. all of them miss the cache and hit the backend simultaneously — stampede.

mitigations:

  • lock / coalesce: only one request goes to the backend; others wait for it to populate the cache (Varnish grace mode)
  • stale-while-revalidate: serve stale content while refreshing in the background
  • jitter on TTL: don’t let all entries expire at the same time
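
the lock/coalesce idea (often called "singleflight") fits in a small class: concurrent misses for the same key share one backend call instead of stampeding. an in-process sketch; a proxy like Varnish does the equivalent internally:

```python
import threading

class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> (done event, shared result box)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True               # first caller for this key
            else:
                leader = False              # someone is already fetching
        event, box = entry
        if leader:
            box["value"] = fetch()          # only the leader hits the backend
            with self._lock:
                del self._inflight[key]
            event.set()
        else:
            event.wait()                    # followers block until the leader finishes
        return box["value"]
```

combined with stale-while-revalidate, the followers don't even need to block; they serve the stale copy while the leader refreshes.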

9. security considerations

9.1 SSRF — Server-Side Request Forgery

if an application makes outbound requests to user-supplied URLs (e.g., webhook URLs, image fetching), and those requests go through the application server — the application is a de facto forward proxy. attackers use it to reach internal services:

POST /fetch-url
{"url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}
          ↑ AWS metadata endpoint — internal only, but server can reach it

fix: validate URLs against an allowlist, use a dedicated egress proxy with strict controls.
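
a sketch of that allowlist check; the allowed hostnames are placeholders, and resolving before judging also catches DNS names that point at internal ranges (though a hardened version must re-pin the resolved IP when actually connecting, or DNS rebinding defeats it):

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"hooks.example.com", "img.example.com"}   # hypothetical allowlist

def is_safe_url(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or parts.hostname not in ALLOWED_HOSTS:
        return False                         # wrong scheme or host not allowlisted
    try:
        addrs = {ai[4][0] for ai in socket.getaddrinfo(parts.hostname, None)}
    except OSError:
        return False
    # reject anything resolving to loopback, link-local (169.254.x.x!), or RFC 1918 space
    return all(ipaddress.ip_address(a).is_global for a in addrs)
```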

9.2 proxy authentication bypass

407 Proxy Authentication Required is the proxy equivalent of 401. if the proxy only validates on the first request and then trusts the connection, an attacker who can inject a Connection: close mid-stream can force re-authentication on a new connection.

9.3 open proxy

a forward proxy that accepts requests from anyone on the internet and forwards them anywhere. useful for attackers: they can use your server as a launchpad, and the target sees your IP. never run an open proxy accidentally. check: curl -x yourserver:3128 https://ifconfig.me should fail from outside your network.

Header injection: if a proxy blindly forwards any X-Forwarded-For header from clients without sanitizing it, an attacker can set X-Forwarded-For: 127.0.0.1 and potentially bypass IP-based access controls. always strip client-supplied X-Forwarded-For at the first trusted proxy.

10. quick reference configs

nginx as reverse proxy + load balancer

upstream backend {
    least_conn;
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=1;
    keepalive 32;
}

server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/cert.pem;
    ssl_certificate_key /etc/nginx/key.pem;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection        "";   # enable keepalive to upstream
    }
}

nginx as caching proxy

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m
                 max_size=1g inactive=60m use_temp_path=off;

location /api/ {
    proxy_cache            my_cache;
    proxy_cache_valid      200 1m;
    proxy_cache_use_stale  error timeout updating;
    proxy_cache_lock       on;           # coalesce thundering herd
    add_header X-Cache-Status $upstream_cache_status;
    proxy_pass http://backend;
}

HAProxy TCP load balancer (L4)

frontend ft_https
    bind *:443
    mode tcp
    default_backend bk_https

backend bk_https
    mode tcp
    balance roundrobin
    option tcp-check
    server s1 10.0.0.1:443 check
    server s2 10.0.0.2:443 check

curl through SOCKS5

curl --socks5-hostname proxy.host:1080 https://target.com
