
🗄️
Core Storage Model
How messages are physically organized
Flat Message Store
No folder hierarchy exists
Gmail presents all of a user's messages as a single flat store addressed by message ID. The underlying database has no nested folder structure. Every message receives a unique, immutable message ID at ingestion.
  • No physical folders exist
  • Messages are never "moved"
  • Storage is per-user object store
  • Addressed by message ID, not by folder path
Unique Message-ID
Immutable primary key per message
Each message has a unique ID assigned at ingestion. This ID never changes, regardless of whether the message is labeled, "moved", archived, or starred. The Gmail API exposes it as the id field of the message resource.
  • Assigned at delivery time
  • Immutable throughout lifecycle
  • Used for deduplication
  • Distinct from the Message-ID header defined in RFC 5322 (which obsoletes RFC 2822)
Thread Grouping
Conversation threading algorithm
Gmail groups messages into threads using a proprietary algorithm that evaluates the Subject, In-Reply-To, and References headers along with time windows. Threading is a view-layer construct, not a storage grouping.
  • Based on RFC headers + heuristics
  • Thread-ID is a derived attribute
  • Re-computed on view, not stored as a structure
  • Thread-ID exposed in Gmail API
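
Gmail's threading algorithm is proprietary, so the following is only a toy sketch of the header-based part of the idea: a message joins the thread of the first message it references, otherwise it starts a new thread. The function name `assign_threads` and the dictionary layout are hypothetical, and Subject matching and time windows are deliberately left out.

```python
def assign_threads(messages):
    """Map each RFC Message-ID to a thread ID, processing in arrival order.

    Toy heuristic only: the thread ID is the Message-ID of the thread's
    first message. This is an assumption, not how Gmail derives threadId.
    """
    thread_of = {}
    for msg in messages:
        # Look for any referenced message that we have already threaded.
        parent_thread = next(
            (thread_of[ref] for ref in msg["references"] if ref in thread_of),
            None,
        )
        thread_of[msg["message_id"]] = parent_thread or msg["message_id"]
    return thread_of


inbox = [
    {"message_id": "<a@example.com>", "references": []},
    {"message_id": "<b@example.com>", "references": ["<a@example.com>"]},
    {"message_id": "<c@example.com>", "references": ["<a@example.com>", "<b@example.com>"]},
    {"message_id": "<d@example.com>", "references": []},
]
threads = assign_threads(inbox)
print(threads)
```

The three related messages end up sharing one thread ID, while the unrelated message gets its own.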
🏷️
Label System
The metadata-tag model replacing folders
Labels as Metadata Tags
Many-to-many relationship
Labels in Gmail are metadata tags applied to messages — not containers. A message can have multiple labels simultaneously, which is structurally impossible in a true folder model. "Moving" to a folder simply swaps labels.
  • Each label = a string ID on the message record
  • One message, unlimited labels
  • No message is stored inside a label
  • Surfaced via Gmail API labelIds array
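
The tag model can be illustrated with a small in-memory stand-in. The `Message` class, the `move` helper, and the label IDs such as `Label_42` are hypothetical names chosen for this sketch; Gmail's real storage schema is not public.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a stored message, not Gmail's real schema."""
    id: str
    label_ids: set = field(default_factory=set)


# A flat store: each message lives under exactly one key, whatever its labels.
store = {"m1": Message("m1", {"INBOX", "UNREAD"})}


def move(message, source, target):
    """A folder-style 'move' is only a label swap on the same record."""
    message.label_ids.discard(source)
    message.label_ids.add(target)


msg = store["m1"]
msg.label_ids.add("Label_42")    # tag the message with a second label
move(msg, "INBOX", "Label_7")    # "move" it out of the inbox

print(sorted(msg.label_ids))
print(len(store))                # still one stored message
```

Nothing is copied or relocated at any point: only the set of label IDs on the single record changes.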
System Labels
INBOX, SENT, TRASH, SPAM, STARRED
Inbox, Sent, Drafts, Trash, Spam, Starred, Important, and Unread are all system labels — they behave identically to user labels at the storage layer. Archiving a message simply removes the INBOX label.
  • Cannot be deleted by users
  • Pre-defined label IDs
  • Archiving = remove INBOX label only
  • Trashing adds the TRASH label and starts a 30-day expiry
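
Because archiving is only a label change, it maps onto the Gmail API's `users.messages.modify` method, whose request body carries `addLabelIds` and `removeLabelIds`. The helper names below are hypothetical, and the sketch only builds the URL and body; it makes no network call and leaves out OAuth entirely.

```python
BASE_URL = "https://gmail.googleapis.com/gmail/v1"


def modify_url(message_id, user_id="me"):
    """URL of the users.messages.modify method for one message."""
    return f"{BASE_URL}/users/{user_id}/messages/{message_id}/modify"


def modify_body(add=(), remove=()):
    """JSON body for users.messages.modify."""
    return {"addLabelIds": list(add), "removeLabelIds": list(remove)}


def archive_body():
    # Archiving removes INBOX and touches nothing else.
    return modify_body(remove=["INBOX"])


def star_body():
    # Starring adds the STARRED system label.
    return modify_body(add=["STARRED"])


print(modify_url("MESSAGE_ID_PLACEHOLDER"))
print(archive_body())
```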
User Labels
Custom folders are just user-defined labels
User-Created Labels
When a user creates a "folder" in Gmail, they are creating a user label. Nested labels (Project/Subproject) are purely a display convention — a slash in the label name renders as a nested folder in the UI but is stored as a single flat string.
  • Slash-delimited for nesting display only
  • Stored as a single label string
  • Deletable — removes label, not messages
  • Up to 500 user labels per account
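
The display-only nature of nesting can be sketched by deriving a tree from flat label names at render time. `build_label_tree` is a hypothetical helper; the point is that the hierarchy is computed from the slash-delimited strings rather than stored.

```python
def build_label_tree(label_names):
    """Derive a nested dict for display from flat, slash-delimited names."""
    tree = {}
    for name in label_names:
        node = tree
        for segment in name.split("/"):
            # Create the child on first sight, then descend into it.
            node = node.setdefault(segment, {})
    return tree


flat_labels = ["Project", "Project/Subproject", "Receipts"]
tree = build_label_tree(flat_labels)
print(tree)
```

The stored data stays a flat list of three strings; only the rendering step produces a parent and child.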
⚙️
Filters & Routing
Automated label application logic
Filter Rules
Criteria-based label assignment
Gmail filters are server-side rules that evaluate incoming messages against criteria (From, To, Subject, has:attachment, size) and take actions (apply label, skip inbox, delete, star, mark read, forward).
  • Evaluated at delivery time
  • Can apply multiple label actions
  • Rules cannot be explicitly ordered or chained into complex logic
  • Stored as account settings, not message data
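
A minimal sketch of filter evaluation at delivery time, assuming filters reduce to label additions and removals (in this model "skip inbox" removes INBOX and "mark read" removes UNREAD). `FilterRule` and `apply_filters` are hypothetical names, and only two criteria are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class FilterRule:
    """Hypothetical filter: two substring criteria plus label actions."""
    from_contains: str = ""
    subject_contains: str = ""
    add_labels: set = field(default_factory=set)
    remove_labels: set = field(default_factory=set)

    def matches(self, headers):
        # An empty criterion matches everything ("" is in any string).
        return (
            self.from_contains.lower() in headers.get("From", "").lower()
            and self.subject_contains.lower() in headers.get("Subject", "").lower()
        )


def apply_filters(headers, rules):
    """Return the label set a newly delivered message ends up with."""
    labels = {"INBOX", "UNREAD"}  # assumed defaults for inbound mail
    for rule in rules:
        if rule.matches(headers):
            labels |= rule.add_labels
            labels -= rule.remove_labels
    return labels


rules = [
    FilterRule(from_contains="billing@", add_labels={"Label_Receipts"}, remove_labels={"INBOX"}),
    FilterRule(subject_contains="invoice", add_labels={"STARRED"}),
]
result = apply_filters({"From": "billing@example.com", "Subject": "Your Invoice"}, rules)
print(sorted(result))
```

Both rules match the sample message, so it is labelled, starred, and kept out of the inbox in a single pass.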
Importance Signals
ML-derived IMPORTANT label
The IMPORTANT label is applied by Google's machine learning model based on user behavior signals: who the user emails frequently, which messages they open and reply to quickly, and account-wide importance patterns.
  • Derived from behavioral signals
  • Not a user-created label
  • Can be overridden manually
  • Feeds Priority Inbox ordering
🔍
Search & Indexing
Full-text and metadata search layer
Full-Text Index
Inverted index over message content
Gmail maintains an inverted search index over message bodies, subjects, sender names, and recipient lists. Search is the primary navigation paradigm — it complements the label system rather than depending on folder traversal.
  • Near real-time indexing after delivery
  • Supports Gmail search operators
  • Search does not traverse folder paths
  • Attachment content (PDF, DOCX) also indexed
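
For readers unfamiliar with the term, an inverted index maps each token to the set of messages containing it. The sketch below is a generic textbook version with hypothetical function names, not a description of Gmail's actual index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Map each token to the set of message IDs that contain it."""
    index = defaultdict(set)
    for message_id, text in messages.items():
        for token in tokenize(text):
            index[token].add(message_id)
    return index


def search_index(index, query):
    """Return IDs of messages containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


corpus = {
    "m1": "Quarterly budget review",
    "m2": "Budget approved",
    "m3": "Lunch on Friday",
}
index = build_index(corpus)
print(search_index(index, "budget review"))
```

A lookup touches only the posting sets for the query tokens, which is why search never needs to walk folder paths.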
Search Operators
Metadata predicates on flat store
Gmail search operators (from:, to:, label:, has:, in:, is:, after:, before:) are predicates applied against the flat message store and its metadata index. in:inbox is equivalent to label:INBOX.
  • in:inbox = has INBOX label
  • in:anywhere = all mail, including Spam and Trash
  • label: queries label metadata, not folders
  • Vault adds additional custodian/date scoping
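
Treating operators as predicates over a flat list can be sketched as follows. The parser handles only `in:`, `label:`, and `from:`, compares label IDs directly, and does not reproduce Gmail's default exclusion of Spam and Trash; `term_to_predicate` and `run_query` are hypothetical names.

```python
# Assumed mapping from in: values to system label IDs.
IN_ALIASES = {"inbox": "INBOX", "sent": "SENT", "trash": "TRASH", "spam": "SPAM"}


def term_to_predicate(term):
    """Turn one 'operator:value' term into a function of a message dict."""
    operator, _, value = term.partition(":")
    if operator == "in" and value == "anywhere":
        return lambda m: True
    if operator == "in":
        label = IN_ALIASES[value]
        return lambda m: label in m["labelIds"]
    if operator == "label":
        return lambda m: value in m["labelIds"]
    if operator == "from":
        return lambda m: value.lower() in m["from"].lower()
    raise ValueError(f"unsupported term: {term}")


def run_query(messages, query):
    """Return IDs of messages that satisfy every term in the query."""
    predicates = [term_to_predicate(t) for t in query.split()]
    return [m["id"] for m in messages if all(p(m) for p in predicates)]


mailbox = [
    {"id": "m1", "from": "alice@example.com", "labelIds": ["INBOX"]},
    {"id": "m2", "from": "bob@example.com", "labelIds": ["TRASH"]},
    {"id": "m3", "from": "Alice@example.com", "labelIds": ["Label_1"]},
]
print(run_query(mailbox, "in:inbox"))
print(run_query(mailbox, "in:anywhere from:alice"))
```

In this model `in:inbox` and `label:INBOX` compile to the same membership test on `labelIds`.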
📎
Attachments & MIME
How message parts and attachments are stored
MIME Parts as Message Children
Hierarchical within message boundary
MIME Parts Structure
Each message's MIME structure is stored as a tree of message parts. The Gmail API exposes this via the payload.parts structure. Each part has its own MIME type, headers, and optionally body data or nested parts.
  • Multipart/mixed for messages + attachments
  • Multipart/alternative for text + HTML versions
  • Part IDs are hierarchical dotted strings that mirror the tree (for example 0, 1, 0.0, 0.1)
  • Bodies can be inline or attachment disposition
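
The part tree can be traversed recursively. The sketch below walks a dictionary shaped like the `payload` of a Gmail API message resource (`partId`, `mimeType`, `filename`, `body`, `parts`); the part IDs, body data, and attachment ID in the sample are made-up placeholders, and the helper names are hypothetical.

```python
def walk_parts(part):
    """Yield a part and then all of its descendants, depth-first."""
    yield part
    for child in part.get("parts", []):
        yield from walk_parts(child)


def list_attachments(payload):
    """Return (partId, filename, attachmentId) for parts stored out of line."""
    return [
        (p["partId"], p["filename"], p["body"]["attachmentId"])
        for p in walk_parts(payload)
        if p.get("filename") and "attachmentId" in p.get("body", {})
    ]


sample_payload = {
    "partId": "", "mimeType": "multipart/mixed", "filename": "", "body": {"size": 0},
    "parts": [
        {
            "partId": "0", "mimeType": "multipart/alternative", "filename": "",
            "body": {"size": 0},
            "parts": [
                {"partId": "0.0", "mimeType": "text/plain", "filename": "",
                 "body": {"size": 5, "data": "aGVsbG8"}},
                {"partId": "0.1", "mimeType": "text/html", "filename": "",
                 "body": {"size": 12, "data": "PHA-aGVsbG88L3A-"}},
            ],
        },
        {"partId": "1", "mimeType": "application/pdf", "filename": "report.pdf",
         "body": {"size": 48213, "attachmentId": "ATTACHMENT_ID_PLACEHOLDER"}},
    ],
}

print([p["mimeType"] for p in walk_parts(sample_payload)])
print(list_attachments(sample_payload))
```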
Attachment Storage
Blob store, separate from message body
Large attachments are stored in a separate blob store referenced by attachment IDs. The Gmail API retrieves them via the attachments.get endpoint with the attachment ID. Quota is counted against the account's Google storage allocation.
  • Stored in Google's object store infrastructure
  • Counted against the account's shared storage quota (15 GB on free accounts)
  • Deduplicated within account scope
  • Retrieved via separate attachment API call
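
The attachment endpoint returns a body object whose `data` field is base64url-encoded. A minimal decoding sketch follows, with the response simulated locally instead of fetched; `decode_attachment` is a hypothetical helper, and the padding step is a defensive assumption in case the encoded string arrives without trailing `=` characters.

```python
import base64


def decode_attachment(body):
    """Decode the base64url 'data' field of an attachment body to raw bytes."""
    data = body["data"]
    padded = data + "=" * (-len(data) % 4)  # restore padding if it was stripped
    return base64.urlsafe_b64decode(padded)


# Simulated response; a real one would come from the attachments.get call.
raw = b"hello attachment"
simulated_response = {
    "size": len(raw),
    "data": base64.urlsafe_b64encode(raw).decode("ascii").rstrip("="),
}

decoded = decode_attachment(simulated_response)
print(decoded)
```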
🏛️
Vault & Retention
Compliance and eDiscovery layer
Retention Policies
Time-based message lifecycle rules
Vault Retention Policies
Google Vault retention rules apply to the underlying message store — they operate independently of the user's label state. A message in Trash can still be retained if it matches a Vault retention rule, even after the user "deletes" it.
  • Operates on message store, not labels
  • Survives user deletion during the retention period
  • Can target by OU, date range, search query
  • Expiry = physical deletion from store
Legal Holds
Suspend expiry on matched messages
A Vault hold places an indefinite preservation marker on messages matching the hold scope. This overrides any retention policy and prevents deletion regardless of user action. Holds do not affect the user's view of their mailbox.
  • Invisible to end users
  • Overrides all retention expiry
  • Survives user delete, filter, auto-purge
  • Released only by Vault admin action
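
The interaction between user deletion, retention rules, and holds can be summarised as a purge decision. This is a simplified model with hypothetical names (`StoredMessage`, `may_purge`), not Vault's actual evaluation logic; note that the message's labels are never consulted.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StoredMessage:
    """Hypothetical record in the message store."""
    id: str
    received: date
    label_ids: frozenset
    user_deleted: bool = False


def may_purge(message, today, retention_days=None, on_hold=False):
    """Decide whether a message may be physically removed from the store."""
    if on_hold:
        return False                 # a hold overrides everything else
    if retention_days is None:
        return message.user_deleted  # no rule: the user's deletion decides
    # A rule applies: purge only once the retention period has elapsed.
    return today >= message.received + timedelta(days=retention_days)


msg = StoredMessage("m1", date(2024, 1, 1), frozenset({"TRASH"}), user_deleted=True)
today = date(2024, 6, 1)

print(may_purge(msg, today))                                     # no rule, user deleted
print(may_purge(msg, today, retention_days=365))                 # still inside retention
print(may_purge(msg, today, retention_days=30, on_hold=True))    # hold blocks the purge
```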
Message Ingestion Flow — Inbound Email
  • SMTP Inbound: MX receives message
  • Spam / AV: Content scanning
  • Message Store: Flat object written
  • Filter Eval: Rules applied
  • Label Tags: INBOX + others
  • Index Update: Search index written
  • Vault Check: Hold/retention eval

Architecture Documentation Notes

Key Insight: No Folders Exist

Gmail has no folder data structure at the storage layer. What users perceive as folders are label tags stored as metadata on each message object. This means a message can be "in" multiple places simultaneously.

Migration Implication

When migrating Gmail data with tools such as Google Workspace Migration or GAMME, label metadata must be captured and mapped explicitly. Over IMAP, Gmail presents each label as a folder, so a naive IMAP migration copies a multi-label message into several folders and loses the fact that those copies are a single message.
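
One way to recover multi-label assignments from an IMAP-style folder listing is to group the per-folder copies by their RFC Message-ID header. The sketch below assumes the listing has already been fetched as (folder, Message-ID) pairs and that the header values are unique, which is not guaranteed in practice; `labels_by_message` is a hypothetical helper. A real tool would more likely rely on Gmail's IMAP extensions such as X-GM-MSGID and X-GM-LABELS.

```python
from collections import defaultdict


def labels_by_message(folder_listing):
    """Group folder names by RFC Message-ID to rebuild label sets."""
    labels = defaultdict(set)
    for folder, message_id in folder_listing:
        labels[message_id].add(folder)
    return dict(labels)


# Hypothetical listing: the same message appears once per label.
listing = [
    ("INBOX", "<a@example.com>"),
    ("Project/Subproject", "<a@example.com>"),
    ("Receipts", "<b@example.com>"),
]
mapping = labels_by_message(listing)
print(mapping)
```

The resulting mapping is what a migration would need to persist so the target system can reapply every label to a single copy of each message.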

API Access Model

The Gmail API (v1) surfaces messages as objects with a labelIds array. List operations accept label IDs as query parameters to filter results. All operations are message-centric, not folder-centric. Vault exports preserve label metadata in an XML metadata file that accompanies the exported messages.
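
The message-centric access model shows up in how a list request is formed: labels are query parameters on the messages collection, not path segments. The sketch below only builds the URL for `users.messages.list`; it sends nothing and leaves out OAuth, and `list_messages_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://gmail.googleapis.com/gmail/v1"


def list_messages_url(label_ids, query=None, user_id="me"):
    """Build a users.messages.list URL filtered by label IDs and a search query."""
    params = [("labelIds", label_id) for label_id in label_ids]  # repeated parameter
    if query:
        params.append(("q", query))
    return f"{BASE_URL}/users/{user_id}/messages?{urlencode(params)}"


url = list_messages_url(["INBOX", "UNREAD"], query="has:attachment")
print(url)
```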

Source Confidence

Label-as-metadata model: documented in Gmail API and Vault API official documentation. Underlying store architecture (Bigtable/Colossus): described at high level in Google infrastructure whitepapers; exact schemas are not publicly disclosed.