Proxies & Reverse Proxies

[!note] every packet on the internet touches a middleman. this is about the middlemen.


1. what even is a proxy

a proxy is just a middleman. sits between two parties, forwards traffic. neither side is talking directly to the other.

CLIENT ──► PROXY ──► SERVER
       ◄──────────◄──

why bother? few reasons:

  • hide who you are (client anonymity) or hide where your servers are
  • filter or log traffic
  • cache stuff so the backend doesn’t get hammered
  • spread load across multiple servers
  • terminate TLS in one place

the entire internet runs on proxies btw. CDNs, load balancers, your corporate firewall, API gateways — all proxies. once you see it you can’t unsee it.

the fundamental split is whose side the proxy is on:

  • forward proxy → works for the client. client knows about it.
  • reverse proxy → works for the server. client usually has no idea.

2. forward proxies — the client’s agent

2.1 what it does

acts on behalf of clients. the client explicitly sends requests through it. the server sees the proxy’s IP, not the client’s.

ALICE ──►
BOB  ──► FORWARD PROXY ──► google.com
CAROL──►                   (google sees proxy IP, not alice/bob/carol)

the client side configures this — browser settings, HTTP_PROXY env var, etc.

2.2 how HTTPS works through it (the CONNECT method)

for plain HTTP the proxy sees everything. for HTTPS, the browser sends a special CONNECT request:

CONNECT api.github.com:443 HTTP/1.1
Host: api.github.com:443

proxy opens a TCP connection to github, replies 200 Connection Established, and then just passes bytes through blindly. it can’t read the encrypted traffic, but it does see the hostname: once in the CONNECT line itself, and again in the TLS ClientHello (SNI).
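a minimal sketch of the only parsing a proxy has to do for CONNECT, assuming the request head has already been read off the socket. parse_connect and TUNNEL_READY are names made up for this note, not from any library.

```python
def parse_connect(request_head: bytes) -> tuple[str, int]:
    """Return (host, port) from the first line of a CONNECT request."""
    request_line = request_head.split(b"\r\n", 1)[0].decode("ascii")
    method, target, _version = request_line.split(" ")
    if method != "CONNECT":
        raise ValueError(f"not a CONNECT request: {method}")
    # The target is always host:port; split on the LAST colon so IPv6 works.
    host, _, port = target.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"bad CONNECT target: {target}")
    return host.strip("[]"), int(port)  # drop brackets around IPv6 literals


# Sent back once the proxy's own TCP connection to the target is open.
# After this the proxy only copies bytes in both directions.
TUNNEL_READY = b"HTTP/1.1 200 Connection Established\r\n\r\n"

head = b"CONNECT api.github.com:443 HTTP/1.1\r\nHost: api.github.com:443\r\n\r\n"
print(parse_connect(head))  # ('api.github.com', 443)
```

everything after the 200 is opaque TLS, which is why the hostname and port are the only things the proxy can filter on.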

2.3 the subtypes

transparent proxy

client has no idea. network silently redirects traffic at the router/iptables level. used by ISPs, corporate networks, parental controls, captive portals.

[!note] “transparent” here means transparent to the client — invisible. not transparent as in “you can see through it.” confusing name.

anonymous vs elite proxies

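this is a spectrum based on what headers the proxy sends:

type                   | sends X-Forwarded-For?                        | server sees real IP?
-----------------------|-----------------------------------------------|---------------------
transparent            | yes, with the real IP                         | yes, in the header
anonymous              | no (but sends Via, so it’s visibly a proxy)   | no
elite (high-anonymity) | no                                            | no
distorting             | yes, but fake IP                              | no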

elite proxy = server literally cannot tell you’re behind a proxy.

SOCKS5

operates at layer 5 — doesn’t care about the protocol at all. forwards raw TCP and UDP. useful for:

  • SSH tunnels
  • VPN-like setups
  • anything that’s not HTTP

curl --socks5-hostname proxy.host:1080 https://target.com

the key difference from HTTP proxies: SOCKS5 doesn’t parse the application protocol and supports UDP (so DNS, VoIP etc. work). compared with SOCKS4 it also adds IPv6 and authentication.

[!warning] if you use --socks5 instead of --socks5-hostname, DNS resolution happens on your machine, not through the proxy — your real location leaks via DNS. always use --socks5-hostname.

web proxy / HTTP proxy

the classic. Squid is the main one. understands HTTP, can cache responses, filter content, log everything, modify headers. enterprises and schools love these.

DNS proxy

intercepts DNS queries before they reach upstream. Pi-hole does this — you send a query for ads.doubleclick.net, Pi-hole returns 0.0.0.0, ad never loads. also used for split-horizon DNS (internal vs external resolution).


3. reverse proxies — the server’s agent

3.1 what it does

acts on behalf of servers. clients think they’re talking directly to the server — the reverse proxy’s address is what DNS points to. actual backends are hidden.

CLIENT ──► REVERSE PROXY ──► server-a
(only sees proxy)           ├─► server-b
                            └─► server-c

the server side configures this. clients usually have no clue it’s there.

what’s hidden from the client: number of backends, their IPs, their software, backend failures (proxy can retry transparently).

3.2 load balancer

distributes requests across multiple backends. the core problem: which backend gets this request?

algorithms:

algorithm            | how it picks                            | good for
---------------------|-----------------------------------------|------------------------
round robin          | next in rotation                        | uniform requests
weighted round robin | next, weighted by server capacity       | mixed hardware
least connections    | backend with fewest active connections  | long-lived connections
IP hash              | hash(client_ip) % N                     | session stickiness
random               | random                                  | surprisingly works fine
least response time  | fastest backend wins                    | latency-sensitive APIs
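a rough sketch of three of these pickers, just to show how little state each one needs. the class names and expand_weights are invented for illustration, and the weighted version is the naive "repeat the backend N times" approach, not the smooth weighted round robin that real balancers tend to use.

```python
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


def expand_weights(weights):
    """Naive weighting: a backend with weight 3 appears 3 times in the rotation."""
    return [backend for backend, w in weights.items() for _ in range(w)]


class LeastConnections:
    """Pick the backend with the fewest requests currently in flight."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        backend = min(self.active, key=self.active.get)  # ties go to the first listed
        self.active[backend] += 1
        return backend

    def release(self, backend):
        """Call when the request finishes so the count drops again."""
        self.active[backend] -= 1


rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']
```

round robin needs one cursor, least connections needs one counter per backend plus a hook for when requests finish. that release hook is the part people forget.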

L4 vs L7 — this confused me for a while:

  • L4: routes by TCP/UDP info only (IP + port). fast, can’t inspect HTTP. AWS NLB, HAProxy in TCP mode.
  • L7: routes by HTTP content (URL path, headers, cookies). smarter — can do A/B routing, canary deploys. nginx, AWS ALB.

one thing that tripped me up: IP hash stickiness breaks when you add/remove servers because hash(ip) % N changes for nearly every client. the fix is consistent hashing — put servers on a ring, hash the client IP, go clockwise to find the server. adding/removing a server only migrates ~1/N of clients instead of almost all of them.
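to convince myself, here’s a small experiment comparing the two when a fifth server joins a pool of four. everything here (the h helper, the Ring class, 100 virtual nodes per server, the fake client IPs) is an assumption for the sketch, not how any particular load balancer does it.

```python
import bisect
import hashlib


def h(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def pick_modulo(client_ip, servers):
    return servers[h(client_ip) % len(servers)]


class Ring:
    """Consistent-hash ring with virtual nodes to even out the split."""

    def __init__(self, servers, vnodes=100):
        self._points = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._hashes = [point for point, _ in self._points]

    def pick(self, client_ip):
        # First ring point clockwise from the client's hash, wrapping around.
        i = bisect.bisect_right(self._hashes, h(client_ip)) % len(self._points)
        return self._points[i][1]


clients = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
before = ["s1", "s2", "s3", "s4"]
after = before + ["s5"]

moved_mod = sum(pick_modulo(c, before) != pick_modulo(c, after) for c in clients)
ring_before, ring_after = Ring(before), Ring(after)
moved_ring = sum(ring_before.pick(c) != ring_after.pick(c) for c in clients)

print(f"modulo: {moved_mod / len(clients):.0%} of clients moved")
print(f"ring:   {moved_ring / len(clients):.0%} of clients moved")
```

going from 4 to 5 servers, modulo should move roughly 80% of clients and the ring roughly a fifth. every client the ring does move lands on the new server, which is the whole point.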

3.3 API gateway

a reverse proxy specifically for API traffic. adds:

  • JWT verification / API key validation
  • rate limiting (token bucket, leaky bucket)
  • routing by path/version (/v1 → old service, /v2 → new service)
  • request/response transformation
  • logging and metrics per endpoint

CLIENT ──► API GATEWAY
             ├── verify JWT
             ├── check rate limit (100 req/min via redis)
             ├── route /v2/users → users-service:8080
             └── strip internal headers, inject X-Request-ID

Kong, AWS API Gateway, Traefik, Envoy are the common ones.
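the token bucket from the rate limiting bullet is small enough to sketch. this is an in-process version under my own naming (TokenBucket, allow); a real gateway would keep the counters somewhere shared like redis so every gateway instance sees the same bucket.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.clock = clock             # injectable so tests can fake time
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the time that passed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would answer 429 Too Many Requests


# 100 requests per minute with bursts of up to 100
limiter = TokenBucket(rate_per_sec=100 / 60, capacity=100)
print(limiter.allow())  # True: the bucket starts full
```

a leaky bucket is the same idea turned around: requests queue up and drain at a fixed rate, so it smooths bursts instead of allowing them.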

3.4 TLS termination proxy

handles the TLS handshake so backends don’t have to. client→proxy is encrypted, proxy→backend is plain HTTP (on a private network).

why do this?

  • backends don’t need certs
  • crypto is CPU-intensive — dedicated hardware/proxy handles it
  • one place to enforce cipher suites, TLS version minimums
  • centralized cert management (Let’s Encrypt automation)

CLIENT ──[HTTPS]──► NGINX (terminates TLS) ──[HTTP]──► backend:8080

3.5 caching reverse proxy

stores responses, serves them without hitting the backend. Varnish is the classic. nginx has proxy_cache. Cloudflare does this at scale.

cache key = usually method + host + path + some headers.

the hard part is cache invalidation. main strategies:

  • TTL-based: cache for N seconds, re-fetch after
  • surrogate keys / cache tags: tag responses, invalidate by tag
  • purge API: explicit PURGE /path

thundering herd problem — if a cache entry expires under high load, all requests simultaneously miss and hammer the backend. fixes:

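  • lock / request coalescing: only one request fetches, others wait for its result (Varnish does this by default, nginx has proxy_cache_lock)
  • stale-while-revalidate: serve the stale version while refreshing in background (this is what Varnish grace mode does)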
  • TTL jitter: randomize expiry so everything doesn’t expire at once
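two of those fixes (the lock and the TTL jitter) fit in one small class. this is a toy in-memory sketch with invented names (CoalescingCache, fetch), using threads to stand in for concurrent requests. it never evicts anything and keeps one lock per key forever, so don’t read it as production code.

```python
import random
import threading
import time


class CoalescingCache:
    def __init__(self, fetch, ttl=60.0, jitter=0.1):
        self._fetch = fetch      # function(key) -> value, i.e. the backend call
        self._ttl = ttl
        self._jitter = jitter    # 0.1 means each entry lives ttl +/- 10%
        self._entries = {}       # key -> (value, expires_at)
        self._locks = {}         # key -> lock held by whoever is refetching
        self._guard = threading.Lock()

    def _fresh(self, key):
        entry = self._entries.get(key)
        if entry and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry:
            return entry[0]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:  # only one caller per key gets past here at a time
            entry = self._fresh(key)
            if entry:
                return entry[0]  # someone else refilled it while we waited
            value = self._fetch(key)
            # Jittered expiry so entries cached together don't expire together.
            ttl = self._ttl * random.uniform(1 - self._jitter, 1 + self._jitter)
            self._entries[key] = (value, time.monotonic() + ttl)
            return value
```

with 20 threads asking for the same cold key at once, the backend function should run once and the other 19 callers get the cached value when the lock frees up.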

3.6 ingress controller (kubernetes)

just a programmable reverse proxy that watches the k8s API and reconfigures itself when you apply Ingress resources.

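# the rules: section of an Ingress spec (networking.k8s.io/v1)
- host: api.myapp.com
  http:
    paths:
    - path: /v2
      pathType: Prefix
      backend:
        service:
          name: service-v2
          port:
            number: 80
# nginx ingress reads this and updates its upstream config automatically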

nginx ingress, Traefik, Envoy/Istio are the main ones.

3.7 service mesh sidecar

the extreme version. every pod gets an Envoy sidecar injected into it. all traffic in and out flows through the sidecar. control plane (istiod in Istio) configures all sidecars centrally.

every sidecar is simultaneously a reverse proxy for its pod’s incoming traffic and a forward proxy for its outgoing traffic. mTLS everywhere, zero-trust networking, distributed tracing: all handled by the mesh, not the application code.

honestly overkill unless you’re running a lot of microservices and really need zero-trust between them.


4. the full picture

type               | layer | client knows? | hides                      | use case
-------------------|-------|---------------|----------------------------|------------------------------
transparent proxy  | 7     | no            | nothing                    | ISP filtering, captive portals
anonymous proxy    | 7     | yes           | client IP                  | privacy browsing
elite proxy        | 7     | yes           | client IP + proxy presence | scraping, security research
SOCKS5             | 5     | yes           | client IP                  | tunneling, VPNs
DNS proxy          | DNS   | no            | N/A                        | ad blocking, split DNS
load balancer      | 4/7   | no            | backend pool               | scalability
API gateway        | 7     | no            | backend services           | auth, rate limits, routing
TLS termination    | 7     | no            | backend topology           | cert management
caching proxy      | 7     | no            | backend load               | CDN, performance
ingress controller | 7     | no            | k8s internal routing       | kubernetes
service mesh       | 7     | no            | everything                 | microservices zero-trust

5. headers — how proxies talk to each other

X-Forwarded-For

X-Forwarded-For: client, proxy1, proxy2

each proxy appends the IP it received the request from. leftmost is supposedly the original client, but clients can send their own X-Forwarded-For, so it’s spoofable. REMOTE_ADDR (TCP source IP) is the only IP you actually verified. never trust leftmost X-FF for security: walk the chain from the right and stop at the first IP that isn’t one of your own proxies.
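a sketch of validating the chain, assuming you know the address ranges of your own proxies. TRUSTED_PROXIES and client_ip are illustrative names, and 10.0.0.0/8 is a made-up range.

```python
import ipaddress

# Assumption for this sketch: all of our own proxies live in this range.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)  # raises ValueError on garbage
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, xff_header: str) -> str:
    """Best-effort real client IP given the TCP peer and the X-Forwarded-For value."""
    if not is_trusted(remote_addr):
        return remote_addr  # direct connection: ignore the header entirely
    hops = [hop.strip() for hop in xff_header.split(",") if hop.strip()]
    # Entries on the right were appended by our own proxies. Walk right to left
    # and stop at the first address that is not ours.
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            return remote_addr  # malformed entry: fall back to what TCP told us
    return hops[0] if hops else remote_addr


# Attacker sent "X-Forwarded-For: 6.6.6.6"; our edge proxy appended the rest.
print(client_ip("10.0.0.5", "6.6.6.6, 203.0.113.7, 10.0.0.9"))  # 203.0.113.7
```

the spoofed 6.6.6.6 on the left is never looked at, because the walk stops at 203.0.113.7, the first address our own proxy vouched for.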

X-Real-IP

simpler single-value alternative. nginx’s realip module can use this to set $remote_addr if it trusts the upstream.

Forwarded (RFC 7239)

the standardized structured version:

Forwarded: for=192.0.2.60;proto=http;by=203.0.113.43;host=example.com

more precise than X-Forwarded-* but less commonly seen in practice.

Via

Via: 1.1 vegur, 1.1 varnish, 1.1 nginx

each proxy adds itself. useful for debugging — you can see exactly who touched your request.

hop-by-hop headers

Connection: keep-alive, Transfer-Encoding, Upgrade etc. apply only to the immediate connection, not end-to-end. each proxy strips them before forwarding.
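a sketch of that stripping step, assuming headers arrive as a plain dict (real proxies deal with repeated headers too). the fixed set is the standard hop-by-hop list. the detail that’s easy to miss is that anything named inside the Connection header is also hop-by-hop. strip_hop_by_hop is a made-up name.

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    """Return only the headers that should be forwarded to the next hop."""
    # Headers listed in Connection are declared hop-by-hop by the sender.
    named = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            named |= {tok.strip().lower() for tok in value.split(",") if tok.strip()}
    drop = HOP_BY_HOP | named
    return {n: v for n, v in headers.items() if n.lower() not in drop}


incoming = {
    "Host": "example.com",
    "Connection": "keep-alive, X-Debug",
    "Keep-Alive": "timeout=5",
    "X-Debug": "1",
    "Accept": "*/*",
}
print(strip_hop_by_hop(incoming))  # {'Host': 'example.com', 'Accept': '*/*'}
```

this is also why WebSockets need special handling: Upgrade is on the list, so a proxy that follows the rules drops it unless told otherwise.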

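[!note] HTTP Request Smuggling exploits disagreements between how a front-end proxy and back-end server parse Content-Length vs Transfer-Encoding: chunked. if they disagree on where a request ends, an attacker can “smuggle” a second request inside the first. whole class of high-severity CVEs. fix: have the front-end reject or normalize ambiguous requests (both headers present, malformed Transfer-Encoding), keep proxy and backend HTTP parsers patched and strict, and speak HTTP/2 to the backend where you can. PortSwigger has good material on this.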


6. proxy protocols — the plumbing

HTTP CONNECT tunneling

already covered above. also used by SSH-over-HTTP and WebSockets through proxies.

PROXY protocol (HAProxy protocol)

not HTTP proxying — this is a tiny header prepended to a TCP connection by a load balancer so the backend knows the real client IP even though TCP comes from the LB.

PROXY TCP4 192.168.1.5 10.0.0.1 56324 443\r\n
[then: actual HTTP/TLS bytes]

v1 is plaintext, v2 is binary. backend must explicitly parse it — if you enable it on the LB but forget the backend, the backend sees garbage at the start of every connection (learned this the hard way).
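the v1 header is easy enough to parse by hand. a sketch of what the backend side has to do before it touches the real payload, assuming the first bytes of the connection are already in a buffer (parse_proxy_v1 is my name for it):

```python
def parse_proxy_v1(data: bytes):
    """Split a PROXY protocol v1 header off the front of `data`.

    Returns (info, rest) where info is a dict, or None for "PROXY UNKNOWN".
    """
    if not data.startswith(b"PROXY "):
        raise ValueError("missing PROXY v1 signature")
    # The spec caps the whole v1 line at 107 bytes including the CRLF.
    end = data.find(b"\r\n", 0, 107)
    if end == -1:
        raise ValueError("PROXY header not terminated within 107 bytes")
    parts = data[:end].decode("ascii").split(" ")
    rest = data[end + 2:]
    if parts[1] == "UNKNOWN":
        return None, rest  # LB could not determine the addresses
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    if family not in ("TCP4", "TCP6"):
        raise ValueError(f"unsupported family: {family}")
    info = {
        "family": family,
        "src_ip": src_ip, "src_port": int(src_port),
        "dst_ip": dst_ip, "dst_port": int(dst_port),
    }
    return info, rest


raw = b"PROXY TCP4 192.168.1.5 10.0.0.1 56324 443\r\nGET / HTTP/1.1\r\n"
info, rest = parse_proxy_v1(raw)
print(info["src_ip"], info["src_port"])  # 192.168.1.5 56324
```

only accept this header from the load balancer’s own addresses. if random clients can send it, they can claim any source IP they like.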

WebSocket proxying

HTTP upgrades to WebSocket via the Upgrade header — which is hop-by-hop. a naive HTTP proxy drops it. nginx needs:

proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

gRPC proxying

gRPC runs over HTTP/2. an HTTP/1.1 proxy cannot forward gRPC at all — the frames don’t map. nginx needs grpc_pass not proxy_pass. also: gRPC uses HTTP/2 trailers for status codes, so the proxy needs to support those too.


7. real-world architectures

classic three-tier

internet → [CDN/edge cache] → [load balancer] → [app servers] → [DB]

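the CDN and the load balancer are both reverse proxies. (the app → DB hop isn’t, unless you put a connection pooler like PgBouncer in front of the database.)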

microservices with API gateway

mobile app → API gateway → /auth  → auth-service
                         → /users → user-service
                         → /posts → post-service

gateway handles JWT, rate limiting, routing. services are completely hidden.

VPN as forward proxy

your laptop ──[encrypted tunnel]──► VPN server ──► internet

your ISP sees encrypted traffic to the VPN server. websites see the VPN IP. but: the VPN provider sees everything your ISP used to see (every destination, plus the contents of anything that isn’t HTTPS). you’re just moving who you trust, not eliminating the trust requirement.

nginx as everything simultaneously

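nginx can be a reverse proxy, TLS terminator, caching proxy, load balancer, and static file server all in one process. (forward proxying is the one job it isn’t built for: stock nginx has no CONNECT support, you need a third-party module.) it’s ridiculous how much you can do with nginx.conf.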


8. performance notes

what proxies cost

  • latency: every proxy hop adds the RTT of its own link plus processing time, and on a cold connection a TCP (+ TLS) handshake on top. a chain of 3 proxies = 3 extra links to cross before the first byte hits the backend.
  • connection overhead: TCP + TLS handshakes per hop. mitigated by connection pooling (keep-alive to upstream).
  • CPU: TLS termination, header parsing, logging. mitigated by hardware acceleration.
  • memory: buffering large request bodies can OOM a proxy if the backend is slow.

what proxies save

  • caching: hot content served without touching the backend at all
  • connection multiplexing: proxy can funnel many client connections onto a few pooled keep-alive (or HTTP/2) connections to backends
  • compression: gzip/brotli at the proxy edge

latency math:

total = RTT(client↔proxy) + proxy processing
      + RTT(proxy↔backend) + backend processing

cached response = RTT(client↔proxy) + cache lookup
                  (backend terms → 0)

9. security stuff to know

SSRF — Server-Side Request Forgery

if an app fetches user-supplied URLs (webhooks, image URLs), the app server itself is acting as a forward proxy. attackers use this to reach internal services:

POST /fetch-url
{"url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}
          ↑ AWS metadata endpoint — internal only, but the server can reach it

fix: validate URLs against an allowlist, use a dedicated egress proxy with strict controls.
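a sketch of the validation half, assuming a hostname allowlist plus a check on what the name actually resolves to (an allowlisted name can still point at an internal IP). check_url, is_internal and the resolver parameter are names invented here.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal(ip: str) -> bool:
    """True for addresses a user-supplied URL should never be allowed to reach."""
    addr = ipaddress.ip_address(ip)
    return (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_multicast or addr.is_reserved or addr.is_unspecified)


def resolve(host: str) -> list[str]:
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def check_url(url: str, allowed_hosts: set[str], resolver=resolve) -> list[str]:
    """Return the vetted IPs for `url`, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only absolute http(s) URLs are accepted")
    if parts.hostname not in allowed_hosts:
        raise ValueError(f"host not on the allowlist: {parts.hostname}")
    ips = resolver(parts.hostname)
    if not ips or any(is_internal(ip) for ip in ips):
        raise ValueError("host resolves to an internal address")
    # Connect to one of THESE IPs; resolving again later reopens the hole.
    return ips


print(is_internal("169.254.169.254"))  # True: link-local, the metadata endpoint
```

the check only helps if the fetch then connects to the returned IPs and refuses redirects (or re-checks each one). otherwise the attacker just bounces through a URL that passes.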

open proxy

a forward proxy that accepts requests from anyone on the internet and forwards them anywhere. accidentally running one is bad — attackers use your server as a launchpad. check: curl -x yourserver:3128 https://ifconfig.me should fail from outside your network.

header injection

if a proxy blindly forwards X-Forwarded-For from clients without sanitizing it, an attacker can set X-Forwarded-For: 127.0.0.1 and potentially bypass IP-based access controls. always strip client-supplied X-Forwarded-For at the first trusted proxy.


10. quick config reference

nginx reverse proxy + load balancer

upstream backend {
    least_conn;
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=1;
    keepalive 32;
}

server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/cert.pem;
    ssl_certificate_key /etc/nginx/key.pem;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;  # appends to whatever the client sent; use $remote_addr if this is your edge proxy
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection        "";   # keepalive to upstream
    }
}

nginx caching proxy

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m
                 max_size=1g inactive=60m use_temp_path=off;

location /api/ {
    proxy_cache            my_cache;
    proxy_cache_valid      200 1m;
    proxy_cache_use_stale  error timeout updating;
    proxy_cache_lock       on;       # coalesces thundering herd
    add_header X-Cache-Status $upstream_cache_status;
    proxy_pass http://backend;
}

HAProxy L4 load balancer

frontend ft_https
    bind *:443
    mode tcp
    default_backend bk_https

backend bk_https
    mode tcp
    balance roundrobin
    option tcp-check
    server s1 10.0.0.1:443 check
    server s2 10.0.0.2:443 check
