Google Maps Internals

how does google maps find a route in under a second across a graph with 500 million nodes?

the question that started it all
1: Dijkstra, 1956
2: A*, 1968
3: the problem with A* on real road maps
4: Bidirectional Dijkstra
5: Preprocessing enters the picture
6: Contraction hierarchies, 2008
7: ALT — a* with landmarks and triangle inequality
8: Google enters, 2004
9: the routing stack
10: traffic — the hardest part
11: turn restrictions and real-world complexity
12: multi-modal routing
13: map rendering pipeline
14: what’s next / current research
my takeaways
references / further reading

the question that started it all

okay so i was thinking about this the other day — when you type “from NYC to LA” in maps and hit enter, it doesn’t take 10 minutes. it’s basically instant. but the road network of just the USA alone has like 24 million intersections and 58 million road segments. that’s a massive graph.

naively applying Dijkstra on that would be absurdly slow. so how does it actually work?

turns out the answer is a 70-year journey of graph theory, and Google didn’t invent most of it — they just had the engineering muscle to implement ideas from academic papers and run them at planet scale.

1: Dijkstra, 1956

Edsger Dijkstra was 26, sitting in a café in Amsterdam, and in 20 minutes designed the algorithm that would eventually power every navigation app.

the problem is simple: given a weighted graph $G = (V, E, w)$ where $w: E \to \mathbb{R}^+$, find the shortest path from source $s$ to target $t$.

Dijkstra’s insight: maintain a priority queue of “tentative distances” and always expand the closest unvisited node.

dist[s] = 0
dist[v] = ∞  for all v ≠ s

PQ = min-heap of (dist, node)
push (0, s) into PQ

while PQ not empty:
    (d, u) = PQ.pop()
    if d > dist[u]: continue   # stale entry
    for each neighbor v of u:
        if dist[u] + w(u,v) < dist[v]:
            dist[v] = dist[u] + w(u,v)
            PQ.push(dist[v], v)

complexity: $O((V + E) \log V)$ with a binary heap, $O(V \log V + E)$ with a Fibonacci heap.

but here’s the thing — even $O((V + E) \log V)$ on 500M nodes is way too slow for a real-time response. Dijkstra explores in all directions uniformly, like a ripple spreading outward from the source. most of that exploration is useless.

Dijkstra published this in 1959. he later said he designed it “without pencil and paper” and the 20-minute café session produced it nearly complete. wild.

2: A*, 1968

Peter Hart, nils nilsson, and bertram raphael at stanford research institute had a smarter idea: what if instead of exploring uniformly, we guide the search toward the destination?

the key insight: if we have a heuristic $h(v)$ that estimates the remaining distance from node $v$ to target $t$, we can prioritize nodes that look promising.

\[f(v) = g(v) + h(v)\]

where:

$g(v)$ = actual known distance from source to $v$
$h(v)$ = estimated distance from $v$ to target
$f(v)$ = estimated total path cost through $v$

A* is just Dijkstra but instead of sorting the priority queue by $g(v)$, you sort by $f(v) = g(v) + h(v)$.

the critical requirement: $h$ must be admissible — it must never overestimate the true distance.

\[h(v) \leq d(v, t) \quad \text{for all } v\]

for road networks: Euclidean distance (straight-line distance) is always admissible because roads can’t be shorter than straight lines.

\[h(v) = \sqrt{(x_v - x_t)^2 + (y_v - y_t)^2}\]

if $h$ is also consistent (monotone), meaning:

\[h(u) \leq w(u, v) + h(v) \quad \text{for all edges } (u,v)\]

then A* is optimal and never reopens nodes. Euclidean distance satisfies this too (triangle inequality).

why it’s faster: A* expands nodes that are both close to the source and close to the target. it naturally avoids going the wrong direction. in practice on real road maps it can be 10x faster than Dijkstra.

but even A* wasn’t fast enough for Google-scale routing. it still explores too much.

3: the problem with A* on real road maps

here’s the thing — A* with Euclidean heuristic works great on a uniform grid. but real road maps are not uniform. there are highways that let you cover 100km in an hour, and tiny residential streets that are 1km/hr-equivalent for detour purposes.

the heuristic quality matters enormously. a “tight” heuristic (one that’s close to the true distance) leads to almost no unnecessary exploration. a “loose” heuristic is basically Dijkstra.

on a real road network, the Euclidean heuristic is often pretty loose because highways exist — you might drive away from the destination initially to get on a highway that takes you there much faster. A* doesn’t know about highways.

so researchers started thinking: how do we make the heuristic smarter?

4: Bidirectional Dijkstra

another idea from the 70s: run Dijkstra from both ends simultaneously.

start a forward search from $s$ and a backward search from $t$. expand one node from each alternately (or by comparing minimum tentative distances). stop when they “meet in the middle.”

the intuition: if the shortest path has length $d$, forward search only needs to cover distance $d/2$, not $d$. since dijkstra expands circles of increasing radius, and circle area is $\pi r^2$, covering half the radius means $\pi(d/2)^2$ nodes instead of $\pi d^2$ — four times fewer.

but the termination condition is subtle. a naive “stop when the forward and backward frontiers overlap” is wrong. the correct termination:

stop when a node $u$ is settled (popped from PQ) by both searches. but the shortest path might not go through the first such node — you need to check the minimum over all nodes $v$ where: $\mu = \min_{v \in V} (d_f(v) + d_b(v))$ where $d_f(v)$ is forward distance and $d_b(v)$ is backward distance.

actually even this is non-trivial. the stopping criterion that works: stop when $\min(PQ_f) + \min(PQ_b) \geq \mu$, where $\mu$ is the best path found so far.

bidirectional search is another ~4x speedup but still not enough for real-time continental routing.

5: Preprocessing enters the picture

here’s the key paradigm shift. up to this point, all algorithms were online — they work directly on the raw road graph with no preprocessing.

but what if we’re okay doing heavy offline preprocessing, as long as queries become blazing fast?

navigation systems in the 90s (car GPS units like TomTom) started doing this. the road network doesn’t change that often. so precompute something that makes queries faster.

the first serious idea: highway hierarchies (Sanders & Schultes, 2005). the observation is that long-distance routes almost always use highways/motorways. local roads only matter near source and destination.

they explicitly hierarchically decomposed the network. lower levels = local roads, higher levels = highways. for long-distance queries, quickly reach the highway network, traverse it at a coarser level, drop back to local roads at the end.

this worked but was complex to implement correctly. then came the elegant solution.

6: Contraction hierarchies, 2008

this is the big one. geisberger, sanders, schultes, and delling published “Contraction Hierarchies: Faster and Simpler Hierarchical Routing in Road Networks” in 2008. this is what most production routing engines (including Google’s predecessors and OSRM, Valhalla, etc.) are built on.

the key insight is beautiful: create shortcuts.

the preprocessing phase

assign each node an “importance” ranking (how important is this intersection to long-distance routes?). then, in order of increasing importance, contract each node:

contraction: remove node $v$ from the graph, but for every pair of edges $(u, v)$ and $(v, w)$, if the path $u \to v \to w$ is the shortest path from $u$ to $w$ in the remaining graph, add a shortcut edge $u \to w$ with $w(u,w) = w(u,v) + w(v,w)$.

why? because when we later query and encounter $u$ and $w$ but $v$ has been contracted away, the shortcut tells us we can go directly without expanding $v$.

this builds a layered version of the graph where shortcuts encode information about contracted nodes.

importance/ordering heuristics

how to decide which nodes to contract first? use metrics like:

edge difference: how many shortcuts added minus edges removed
deleted neighbors: number of already-contracted adjacent nodes
uniformity: prefer nodes that make the graph contract uniformly

the heuristic: $\text{importance}(v) = \text{edge_diff}(v) + \text{deleted_neighbors}(v) + \text{shortcut_cover}(v)$

the query phase

query on the contracted graph using bidirectional Dijkstra with a critical constraint:

forward search from $s$: only traverse edges that go upward in the hierarchy (to higher-importance nodes)
backward search from $t$: only traverse edges that go upward in the hierarchy

both searches move “up” the hierarchy, meet near the top, and the combination gives the shortest path.

why is this correct? any shortest path must eventually “peak” at some high-importance node. by only going upward from both ends, we guarantee finding that peak.

the speedup is insane. on European road networks:

Dijkstra: ~5 million node relaxations per query
A*: ~1 million
Bidirectional A*: ~300k
CH: ~2000 node relaxations!

that’s a 2500x speedup. now you can answer a query across all of Europe in microseconds.

7: ALT — a* with landmarks and triangle inequality

around the same time (goldberg & harrelson, 2005), another approach: ALT algorithm.

the idea: precompute exact distances from $k$ “landmark” nodes $L$ to all other nodes. then use the triangle inequality to get a tighter heuristic.

for a target $t$ and landmark $\ell$:

\[|d(v,t) - d(\ell, t)| \leq d(v, \ell)\]

rearranging: $d(v,t) \geq d(\ell,t) - d(v,\ell)$

so the heuristic:

\[h(v) = \max_{\ell \in L} \max\bigl(d(\ell,t) - d(v,\ell),\; d(v,\ell) - d(\ell,t)\bigr)\]

this is always admissible (by triangle inequality) and much tighter than Euclidean distance. good landmark placement (near the “border” in the direction of the target) gives almost perfect heuristics.

ALT with ~16 landmarks gets enormous speedups. but it requires $O(kV)$ preprocessing space and time. for the full US road graph that’s big but manageable.

REAL-WORLD NOTE: Google likely uses a combination of these. CH + ALT hybrid (called CH+ALT or CALT) is common in practice.

8: Google enters, 2004

okay, so all of this is the theoretical backstory. Google’s story:

2004: google acquires Where 2 Technologies, a Sydney startup building a C++ map client. this becomes Google Maps.

the original team: lars rasmussen, jens rasmussen (brothers), noel gordon, stephen ma. 4 people who built the initial system in C++.

february 2005: google maps launches publicly. it’s the first major map service to use tiled images loaded dynamically — before this, MapQuest would just give you one big image. google tiles the map into a grid and loads only what’s visible. this changed everything about web maps.

2005-2006: the infrastructure build-out. google had to:

ingest the road network data (licensed from Navteq/TeleAtlas)
build a routing graph from raw geometry data
geocode billions of addresses
build the tile rendering pipeline

the routing graph construction itself is non-trivial. raw geometry data has duplicates, gaps, inconsistencies, different coordinate systems. cleaning this is months of work.

9: the routing stack

Google Maps routing is not just one algorithm. it’s a pipeline:

1. graph construction

raw road data → clean graph
node = intersection, edge = road segment
edge weights = travel time (not just distance!)
travel time depends on: speed limit, road type, time of day, real-time traffic

2. preprocessing

contraction hierarchies computed on the static graph
this takes hours but only needs to be redone when the graph changes significantly
the contracted graph is stored and used for all queries

3. real-time traffic layer

google collects location data from android phones (with consent)
billions of data points per day: “at this location, at this time, speed was X”
aggregate into per-segment speed distributions
update edge weights dynamically
the CH preprocessing is done on “typical” weights, with traffic as an overlay

4. query serving

bidirectional CH query on the preprocessed graph
apply traffic overlay as a correction
takes ~1-5 milliseconds even for cross-continental routes

5. eta estimation

not just routing — also predicting arrival time
historical traffic patterns + current conditions + ML models
the ETA is often more important than the exact route

10: traffic — the hardest part

routing the graph is actually the easy part. the hard part is traffic.

the crowdsourcing insight (2009): google realized they had a massive data source — android users. if you aggregate anonymous speed data from millions of phones, you get real-time traffic information for free.

apple, TomTom, etc. all started doing the same thing. waze (acquired by google in 2013) was the most aggressive — waze users actively report accidents, speed traps, road hazards.

the ML pipeline:

for each road segment, you want to predict: “if I drive here at 5:30pm on a friday, how fast will I go?”

features:

day of week, time of day
weather (rain slows traffic significantly)
nearby events (concerts, games)
historical baseline for this segment
current sensor data

this is basically a time-series regression problem per road segment, multiplied by 500 million segments. the scale is absurd.

temporal weighting: more recent observations get higher weight. there’s a recency decay:

\[\hat{v}(t) = \frac{\sum_i v_i \cdot e^{-\lambda(t - t_i)}}{\sum_i e^{-\lambda(t - t_i)}}\]

where $v_i$ is the observed speed, $t_i$ is the time, and $\lambda$ controls how quickly old observations fade.

11: turn restrictions and real-world complexity

the abstract graph model has issues with real roads:

turn restrictions: “no left turn at 5th and broadway between 7-9am”. this can’t be modeled just on nodes and edges.

solution: edge-based graph (also called the “line graph”). instead of nodes = intersections, you model:

nodes = directed road segments
edge from $a \to b$ if you can legally drive from segment $a$ to segment $b$ without violating a turn restriction

this doubles the graph size but correctly handles turn restrictions. CH works on this too.

ferry routes, tunnels, unpaved roads: different “modes” with different cost functions.

one-way streets: the directed graph handles these naturally.

road closures: real-time updates that must invalidate parts of the precomputed hierarchy. Google handles this by applying closures as infinite-cost edges on top of the precomputed CH.

the “get directions” problem gets way more complex when you add transit.

transit adds:

time-dependent edges: the bus arrives at 3:15pm, not continuously
transfer time: time to walk between stops
frequency: how often does the bus run?

the algorithm used: RAPTOR (Round-Based Public Transit Optimized Router, Delling et al. 2012).

not going to go deep here but the key insight: model transit as “rounds”. each round = one additional transit vehicle taken. BFS-style expansion round by round, which handles multi-hop journeys elegantly.

multi-modal = combine road graph (driving/walking) with transit graph, with time-dependent connections between them.

13: map rendering pipeline

separate from routing, the tile rendering is also fascinating.

the mercator projection: google maps uses Web Mercator (EPSG:3857). it’s a cylindrical conformal projection — angles are preserved (critical for navigation), but area is distorted (Greenland looks bigger than Africa).

$x = R \cdot \lambda$ $y = R \cdot \ln\left(\tan\left(\frac{\pi}{4} + \frac{\phi}{2}\right)\right)$

where $\lambda$ is longitude, $\phi$ is latitude, $R$ is Earth’s radius.

tiling: the world is divided into a quadtree of tiles. at zoom level $z$, there are $4^z$ tiles. each tile is 256×256 pixels.

zoom 0: 1 tile, whole world
zoom 10: ~1M tiles, city level
zoom 20: ~1 trillion tiles, building level

only tiles in the viewport are loaded. google pre-renders and caches tiles at every zoom level. vector tiles (since 2013) send the raw geometry and render on the client, enabling smoother zoom and smaller downloads.

14: what’s next / current research

machine learning for routing:

learned heuristics for A* that are better than Euclidean distance
graph neural networks on road networks
deep RL for routing in dynamic environments

uncertainty-aware routing:

instead of a single fastest route, compute routes that are robust to traffic variability
“stochastic routing” — minimize expected travel time and variance

personalized routing:

some people prefer highways, some prefer scenic routes
preference learning from past route choices

3D routing:

google maps now shows 3D buildings, traffic in 3D
routing considering elevation (important for cycling, walking)

my takeaways

what strikes me most about this whole story:

the algorithm (CH) that powers routing was published in 2008 — only 3 years after Google Maps launched. the first years of Google Maps probably used something simpler.
the hard part is not routing, it’s traffic. a perfect routing algorithm with wrong edge weights gives wrong answers.
data is the moat. the reason Google Maps is better than competitors isn’t a secret algorithm — it’s 15 years of location data from Android phones.
the theory-to-practice gap is huge. dijkstra invented the algorithm in 20 minutes in a café. it took 50 years of engineering + database + distributed systems work to make it run at planetary scale in real time.
CH is genuinely elegant. the idea of “contract unimportant nodes, add shortcuts, then only search upward in the hierarchy” is one of those things that seems obvious in retrospect but took decades to discover.

references / further reading

dijkstra, e.w. (1959). a note on two problems in connexion with graphs. numerische mathematik.
hart, nilsson, raphael (1968). a formal basis for the heuristic determination of minimum cost paths. ieee trans. syst. sci. cybern.
goldberg & harrelson (2005). computing the shortest path: A meets graph theory*. SODA 2005.
geisberger et al. (2008). contraction hierarchies: faster and simpler hierarchical routing in road networks. WEA 2008.
delling et al. (2012). RAPTOR: round-based public transit optimized router. TRR 2012.
bauer et al. (2010). sharc: fast and robust unidirectional routing. JEA 2010.

stuff i want to read more:

the original CH paper (geisberger 2008) — it’s only 14 pages
how OSM (openstreetmap) data quality compares to google’s licensed data
the exact traffic aggregation pipeline — probably a netflix-style engineering blog somewhere
how apple maps improved so dramatically from the 2012 disaster