bytesize

How Gmail knows your email is taken, instantly

12 min read

You type an email on the Gmail sign-up page, reach for Tab, and — before your finger lifts — the form says already taken. It doesn't feel like a check. It feels like Gmail already knew.

Pick a scenario below and step through what actually happens. Then we'll zoom in on the parts that earn their keep.

[Interactive demo: FlowDemo, end-to-end. Scenario shown: brand-new email, step 1 of 7. Stages: you type the email → Google Front End (Maglev + GFE + abuse) → identity frontend shard (ALTS over Stubby, Slicer) → normalise to canonical → in-process near-cache (hot entries, per-shard) → distributed cache (Memcache-class, not Redis) → Bloom filter (in-memory pre-check) → Spanner point read (primary-key lookup) → respond.]
The browser has the full string zxcvb12345@gmail.com once you pause for ~300 ms. It doesn't fire one request per keystroke.
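The "wait for a pause" behaviour is a debounce. As a rough sketch (not Google's client code), the function below takes keystroke timestamps and reports when a debounced check would fire. The name `debounced_fire_times` and the sample timestamps are made up for illustration; the 300 ms wait is the approximate figure quoted above.

```python
def debounced_fire_times(keystroke_ms, wait_ms=300):
    """Return the times (in ms) at which a debounced check would fire.

    keystroke_ms must be sorted. A check fires wait_ms after a keystroke
    only if no later keystroke arrives before that deadline.
    """
    fire_times = []
    for i, t in enumerate(keystroke_ms):
        deadline = t + wait_ms
        is_last = i == len(keystroke_ms) - 1
        if is_last or keystroke_ms[i + 1] >= deadline:
            fire_times.append(deadline)
    return fire_times


# Hypothetical typing pattern: five quick keystrokes, a pause, then two more.
keystrokes = [0, 80, 150, 240, 310, 900, 960]
print(debounced_fire_times(keystrokes))  # [610, 1260]
```

Seven keystrokes produce two requests, one per pause, rather than seven.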

Three things to notice in the widget. First, the answer can come out of any of several layers — not just the database. Second, the very first thing the server does is rewrite your email into a form you didn't type. Third, whatever the flow above says while you're typing, it's not what decides anything when you actually click Sign up. We'll work through each.

#Your email gets rewritten

Before the accounts table is consulted, your address is lowercased, stripped of dots in the local part, and trimmed of everything from the plus sign onward. So J.Ohn.Doe+promo@gmail.com becomes johndoe@gmail.com. That canonical string is what every layer below actually checks.

[Interactive demo: NormalisationMap, step 1 of 6. Inputs johndoe@gmail.com, JohnDoe@Gmail.com, j.o.h.n.d.o.e@gmail.com, john.doe+newsletters@gmail.com and j.ohn.doe+work@googlemail.com map to the canonical form johndoe@gmail.com; johndoe@company.com maps to johndoe@company.com.]
The canonical form itself. Obviously taken.

Google's own help docs confirm this for consumer @gmail.com addresses. It's why j.o.h.n.d.o.e@gmail.com can't register if johndoe@gmail.com already exists. The thing you typed was discarded.
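The rewrite rules described in this section fit in a few lines. This is a minimal sketch, not Gmail's actual code: the name `canonicalise` is invented here, folding googlemail.com into gmail.com is an assumption read off the examples in the demo, and leaving other domains untouched apart from lowercasing is likewise an assumption.

```python
# Assumption: these two domains are treated as the same consumer mailbox.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def canonicalise(email: str) -> str:
    """Lowercase the address and apply Gmail-style rules to Gmail domains.

    No validation is done here; input is assumed to contain an "@".
    """
    local, _, domain = email.strip().lower().rpartition("@")
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0]   # drop "+" and everything after it
        local = local.replace(".", "")   # dots in the local part are ignored
        domain = "gmail.com"
    return f"{local}@{domain}"


print(canonicalise("J.Ohn.Doe+promo@gmail.com"))      # johndoe@gmail.com
print(canonicalise("j.ohn.doe+work@googlemail.com"))  # johndoe@gmail.com
print(canonicalise("johndoe@company.com"))            # johndoe@company.com
```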

#Two caches before the database

A very common question about this flow: what's the cache doing in front of the database? Most real identity systems put at least one in-memory cache tier, and often two, between the request and the authoritative store, and Gmail is no exception.

The first one is an in-process near-cache, right inside the Gaia frontend that handles your request. If the same canonical email was asked about in the last few seconds on the same shard (which happens a lot — popular names get typed constantly), the answer is still in memory. No further work.

The second is a distributed cache that spans many frontends, so a warm answer from one shard can serve another. In the FlowDemo widget above, the “someone just checked this name” scenario shows a cache hit: the request terminates at the near-cache and never reaches the Bloom filter or Spanner.
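What makes the near-cache safe to use is that its entries are short-lived. The sketch below is a toy in-process cache with a time-to-live, under stated assumptions: the class name `TtlCache` and the 5-second TTL are illustrative (the article only says "the last few seconds"), and the clock is injectable so the expiry can be shown without sleeping.

```python
import time


class TtlCache:
    """Tiny in-process cache whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s=5.0, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # canonical email -> (is_taken, expires_at)

    def get(self, email):
        """Return the cached answer, or None on a miss or an expired entry."""
        entry = self._entries.get(email)
        if entry is None:
            return None
        is_taken, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[email]
            return None
        return is_taken

    def put(self, email, is_taken):
        self._entries[email] = (is_taken, self._clock() + self._ttl_s)


# Fake clock so the example is deterministic.
now = [0.0]
cache = TtlCache(ttl_s=5.0, clock=lambda: now[0])
cache.put("johndoe@gmail.com", True)
print(cache.get("johndoe@gmail.com"))  # True: still warm
now[0] = 6.0
print(cache.get("johndoe@gmail.com"))  # None: expired, caller falls through
```

A `None` result means "ask the next tier", which is how a miss here hands the request on to the distributed cache.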

#The Bloom filter: a cheap, one-sided answer

When both caches miss, the server asks one more cheap thing before touching the database: a Bloom filter. A row of bits, all zero to start. When an account is created, a few hash functions of the canonical email each pick a bit, and those bits get flipped on. To check an email, hash it the same way and look at those bits: if any one of them is 0, it's definitely not in the set. If they're all 1, it might be.

Step through a handful of inserts and queries below.

[Interactive demo: HashLane, step 1 of 5. insert("johndoe@gmail.com") into a filter of m = 16 bits (positions 0 to 15); three hashes h₁, h₂, h₃ each select one position, leaving 3 bits set.]
Start with an empty filter. When an account is created, three hashes of the canonical email each pick a bit, and those bits are flipped on. Nothing about the email itself is stored.

The filter can lie about yes, never about no. A no alone is enough to answer the user — no database round-trip needed. A maybe has to go to Spanner to be sure.
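Here is a minimal Bloom filter matching the description above, sized like the demo (16 bits, 3 hashes). It is a teaching sketch rather than anything production-grade: deriving the three positions by salting SHA-256 with an index is one common trick and an assumption on my part, not a claim about what Gmail uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits=16, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = [0] * m_bits  # all zero to start

    def _positions(self, item):
        # Simulate k hash functions by prefixing the item with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Any 0 bit means "definitely not present"; all 1s means "maybe".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
print(bf.might_contain("johndoe@gmail.com"))  # False: empty filter, a definite no
bf.add("johndoe@gmail.com")
print(bf.might_contain("johndoe@gmail.com"))  # True: inserted items always pass
```

With only 16 bits, other addresses will start colliding quickly. For m bits, k hashes and n inserted items, the usual approximation for the false-positive rate is (1 − e^(−kn/m))^k, which is why real filters are sized far larger than the demo.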

#Three fast paths, one slow one

Put those pieces together. The same request can end at four different places depending on what each layer knows:

  • Near-cache hit: a few microseconds. The fastest path.
  • Distributed-cache hit: a millisecond or two. Still fast.
  • Bloom filter says no: a couple of milliseconds. Saves the database trip entirely.
  • Bloom filter says maybe: a point read on Spanner. Google's published target for Spanner point reads is under 5 ms at the median. The only path that actually talks to the authoritative store.

Scroll back to the top widget and switch between scenarios. The diagram re-routes to match.
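The four exits listed above can be sketched as one function that returns both the answer and the layer that produced it. Everything here is a stand-in: plain dicts play the two caches, a set plays Spanner, and a set-backed lambda plays the Bloom filter (with one deliberate false positive). The name `check_availability` and the sample addresses are hypothetical, and cache fills on the way back are left out to keep it short.

```python
def check_availability(email, near, distributed, bloom_might_contain, db_taken):
    """Return (status, path) for an already-canonical email."""
    if email in near:
        return near[email], "near-cache"
    if email in distributed:
        return distributed[email], "distributed-cache"
    if not bloom_might_contain(email):
        return "available", "bloom-no"  # a definite no, database is skipped
    # "Maybe": only the authoritative store can settle it.
    status = "taken" if email in db_taken else "available"
    return status, "database"


near = {"johndoe@gmail.com": "taken"}
distributed = {"bob@gmail.com": "taken"}
db_taken = {"johndoe@gmail.com", "bob@gmail.com", "alice@gmail.com"}
# The filter covers every stored address plus one simulated false positive.
bloom_bits = db_taken | {"collide@gmail.com"}


def maybe(email):
    return email in bloom_bits


for email in ["johndoe@gmail.com", "bob@gmail.com", "zxcvb12345@gmail.com",
              "alice@gmail.com", "collide@gmail.com"]:
    print(email, check_availability(email, near, distributed, maybe, db_taken))
```

The last address shows why a "maybe" cannot be trusted on its own: the filter says maybe, the database says it is free.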

#Submit — the check you didn't see

Everything above was the check that runs while you're typing. It's a UX hint, and it's allowed to be wrong, stale, or racing someone else.

When you actually click Sign up, a different thing happens. A database transaction tries to INSERT a new row keyed on your canonical email, against a column with a uniqueness constraint enforced by Spanner itself. If another person's transaction committed first, yours fails with a constraint violation and the server returns EMAIL_EXISTS, the official “someone else already owns this canonical email” signal, which the UI renders as “already taken”. Try it below with the two sliders.

[Interactive demo: SignupRace. Timeline from 0 to 60 ms with t(A) = 20 ms, t(B) = 28 ms, winner: A. User A: typing → pre-check: available → submit → commit. User B: typing → pre-check: available → submit → already taken. Database: A commits alice@gmail.com; B rejected (UNIQUE).]
A submitted first and commits. B's INSERT for alice@gmail.com hits the UNIQUE constraint and returns EMAIL_EXISTS. Both users saw "available" during typing — that check was advisory. The database is the truth.

The winner is whichever INSERT Spanner committed first, not whichever user clicked Submit first in their browser. Both clients saw “available” while typing; only the database's serial ordering at commit decides who actually got the address.
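The same pattern can be tried locally. The sketch below uses Python's built-in `sqlite3` as a stand-in for Spanner, which is a loud assumption: the engines differ in almost every respect, but the idea of letting the store's uniqueness rule make the final decision carries over. The table name `accounts` and the function `sign_up` are invented for the example; EMAIL_EXISTS is the signal named above.

```python
import sqlite3


def sign_up(conn, canonical_email):
    """Try to claim an address; the PRIMARY KEY makes the final call."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO accounts (canonical_email) VALUES (?)",
                (canonical_email,),
            )
        return "OK"
    except sqlite3.IntegrityError:
        # Someone else's row already holds this key.
        return "EMAIL_EXISTS"


# SQLite in memory, standing in for the authoritative store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (canonical_email TEXT PRIMARY KEY)")

# Both users saw "available" while typing; commit order decides.
print(sign_up(conn, "alice@gmail.com"))  # OK (user A commits first)
print(sign_up(conn, "alice@gmail.com"))  # EMAIL_EXISTS (user B)
```

There is no "check, then insert" step here. The INSERT itself is the check, so there is no window for a second writer to slip through.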

#When it goes wrong: the Netflix dot-scam

The rewrite step matters because other services don't always do it the same way Gmail does. In 2018, an engineer described a scam that worked like this. An attacker signs up for Netflix using a dotted variant of your Gmail address, say j.ohn.doe@gmail.com, with a bad card. Netflix treats the dotted version as a new customer, because Netflix doesn't normalise the way Gmail does. The card fails. Netflix sends a polite email about it to the dotted address, and because mail to both addresses lands in your inbox, you receive it. You, confused, helpfully pay.

Google knew there was one canonical form; Netflix didn't. That gap is the attack surface. If you're storing emails, normalise on write, put your uniqueness constraint on the normalised column, and keep the raw version only for display.
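As a sketch of that advice, the table below keeps the raw address for display and puts the UNIQUE constraint on the normalised column. SQLite is used only because it ships with Python; the table and column names are hypothetical, and the inline `canonical` helper applies Gmail-style rules to gmail.com alone, since other providers may treat dots and plus suffixes differently.

```python
import sqlite3


def canonical(email):
    # Illustrative rules for gmail.com only: lowercase, drop "+suffix" and dots.
    local, _, domain = email.strip().lower().rpartition("@")
    if domain == "gmail.com":
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " email_raw TEXT NOT NULL,"               # what the user typed, for display
    " email_canonical TEXT NOT NULL UNIQUE)"  # what uniqueness is enforced on
)


def register(raw):
    try:
        with conn:  # commit on success, roll back on constraint failure
            conn.execute(
                "INSERT INTO customers (email_raw, email_canonical) VALUES (?, ?)",
                (raw, canonical(raw)),
            )
        return "created"
    except sqlite3.IntegrityError:
        return "duplicate"


print(register("johndoe@gmail.com"))    # created
print(register("j.ohn.doe@gmail.com"))  # duplicate: same canonical form
```

With this layout the dotted variant from the scam can never become a second customer record.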

#Sticking the landing

So: a pause, a trip through Google's edge, a rewrite, two caches, a Bloom filter, maybe a database — and, at Submit, a real transaction against a uniqueness constraint that does the only check that actually matters. Two different questions on two different versions of your email. The fast one is for the UI. The slow one is for the truth.