Unique ID Generation (Snowflake)

Introduction

Generating unique, sortable IDs at scale is a common requirement for distributed systems. Twitter's Snowflake is a well-known solution that enables decentralized, high-throughput ID generation.

Problem Statement

How can we generate unique, time-ordered IDs across multiple servers without central coordination, while ensuring high availability and performance?

System Requirements

IDs must be globally unique and roughly sortable by creation time.
The system should support high throughput and low latency.
No single point of failure.
The scheme should work across multiple data centers.

High-Level Design

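Snowflake IDs are 64-bit integers composed of the following fields, from most to least significant bit:

Sign bit (1 bit, always 0 so the ID stays positive as a signed 64-bit integer)
Timestamp (41 bits, milliseconds since a custom epoch)
Data center ID (5 bits)
Machine ID (5 bits)
Sequence number (12 bits)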
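
To make the bit layout concrete, here is a minimal Python sketch that packs the four fields into one integer and splits it back out. The shift amounts follow directly from the field widths listed above; the names `pack_id`, `unpack_id`, and `max_value` are illustrative and not part of any Snowflake library. With 41 bits of milliseconds, the timestamp covers 2^41 ms, roughly 69.7 years past the chosen epoch.

```python
# Field widths from the layout above; the top (sign) bit is always 0.
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12

# Shift amounts: sequence occupies the lowest bits, timestamp the highest.
MACHINE_SHIFT = SEQUENCE_BITS                         # 12
DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS       # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22


def max_value(bits: int) -> int:
    """Largest unsigned value that fits in `bits` bits."""
    return (1 << bits) - 1


def pack_id(timestamp_ms: int, datacenter_id: int, machine_id: int, sequence: int) -> int:
    """Combine the four fields into one 64-bit integer (sign bit left as 0)."""
    for value, bits, name in (
        (timestamp_ms, TIMESTAMP_BITS, "timestamp_ms"),
        (datacenter_id, DATACENTER_BITS, "datacenter_id"),
        (machine_id, MACHINE_BITS, "machine_id"),
        (sequence, SEQUENCE_BITS, "sequence"),
    ):
        if not 0 <= value <= max_value(bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (
        (timestamp_ms << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (machine_id << MACHINE_SHIFT)
        | sequence
    )


def unpack_id(snowflake_id: int) -> tuple:
    """Split an ID back into (timestamp_ms, datacenter_id, machine_id, sequence)."""
    return (
        (snowflake_id >> TIMESTAMP_SHIFT) & max_value(TIMESTAMP_BITS),
        (snowflake_id >> DATACENTER_SHIFT) & max_value(DATACENTER_BITS),
        (snowflake_id >> MACHINE_SHIFT) & max_value(MACHINE_BITS),
        snowflake_id & max_value(SEQUENCE_BITS),
    )
```

For example, `pack_id(1, 2, 3, 4)` returns 4468740, and `unpack_id(4468740)` returns `(1, 2, 3, 4)`. With every field at its maximum, the result is 2^63 - 1, the largest positive signed 64-bit value.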

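Each node generates IDs independently using its own data center and machine IDs. For each request within the same millisecond, it increments the sequence number, which resets to 0 when the millisecond changes. Because the timestamp occupies the most significant bits, IDs sort by creation time.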
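
The per-node generation loop can be sketched as follows. This is a minimal, single-process illustration, not Twitter's implementation. It assumes a hypothetical `SnowflakeGenerator` class, a hypothetical custom epoch of 2020-01-01 UTC, and an injectable millisecond clock so the logic can be tested. It resets the sequence on each new millisecond, busy-waits for the next millisecond when the 12-bit sequence is exhausted, and refuses to issue IDs if the clock moves backward.

```python
import threading
import time
from typing import Callable

# Field widths from the layout described above.
SEQUENCE_BITS = 12
MACHINE_BITS = 5
DATACENTER_BITS = 5
TIMESTAMP_BITS = 41
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in Unix milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000


def wall_clock_ms() -> int:
    """Current Unix time in whole milliseconds."""
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    """Hypothetical per-node generator; one instance per (datacenter, machine) pair."""

    def __init__(self, datacenter_id: int, machine_id: int,
                 clock_ms: Callable[[], int] = wall_clock_ms,
                 epoch_ms: int = CUSTOM_EPOCH_MS) -> None:
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError("datacenter_id must be in 0..31")
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must be in 0..31")
        self._datacenter_id = datacenter_id
        self._machine_id = machine_id
        self._clock_ms = clock_ms
        self._epoch_ms = epoch_ms
        self._last_ms = -1  # timestamp of the most recently issued ID
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # Reusing an old timestamp could duplicate IDs, so refuse.
                raise RuntimeError(f"clock moved backward by {self._last_ms - now} ms")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence values used this millisecond:
                    # busy-wait until the clock reaches the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0  # new millisecond: restart the sequence
            self._last_ms = now
            elapsed = now - self._epoch_ms
            if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
                raise RuntimeError("timestamp outside the 41-bit range for this epoch")
            return ((elapsed << 22)                # 12 + 5 + 5 bits below
                    | (self._datacenter_id << 17)  # 12 + 5 bits below
                    | (self._machine_id << 12)     # 12 bits below
                    | self._sequence)
```

The lock makes `next_id` safe to call from several threads in one process. Separate processes on the same host would each need their own machine ID, because they do not share the sequence counter.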

Key Components

ID Generator Service: Runs on each node, responsible for generating IDs.
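Clock Synchronization: Keeps node clocks close (e.g., via NTP) so that IDs from different nodes sort roughly by creation time. Each generator must still guard against its own clock moving backward, which could produce duplicate IDs.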
Configuration Management: Assigns unique machine/data center IDs.
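
One way a deployment pipeline might enforce unique IDs is to validate the fleet's assignment table before rollout. The sketch below assumes a hypothetical static mapping from hostname to `(datacenter_id, machine_id)` and a hypothetical helper `find_assignment_errors`. Real systems may instead lease IDs from a coordination service, which this does not model.

```python
from typing import Dict, List, Tuple

MAX_DATACENTER_ID = (1 << 5) - 1  # 31
MAX_MACHINE_ID = (1 << 5) - 1     # 31


def find_assignment_errors(assignments: Dict[str, Tuple[int, int]]) -> List[str]:
    """Check a hypothetical host -> (datacenter_id, machine_id) table.

    Returns a list of human-readable problems; an empty list means every
    host has an in-range pair that no other host shares.
    """
    errors: List[str] = []
    owner: Dict[Tuple[int, int], str] = {}
    # Sort by hostname so the error order is deterministic.
    for host, (dc, machine) in sorted(assignments.items()):
        if not 0 <= dc <= MAX_DATACENTER_ID:
            errors.append(f"{host}: datacenter_id {dc} not in 0..{MAX_DATACENTER_ID}")
            continue
        if not 0 <= machine <= MAX_MACHINE_ID:
            errors.append(f"{host}: machine_id {machine} not in 0..{MAX_MACHINE_ID}")
            continue
        pair = (dc, machine)
        if pair in owner:
            errors.append(f"{host}: {pair} already assigned to {owner[pair]}")
        else:
            owner[pair] = host
    return errors
```

Returning a list instead of raising on the first problem lets an operator see every conflict in one pass.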

Challenges

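Clock regression: If the system clock moves backward (for example, after an NTP correction), the generator can reuse a timestamp it has already issued and produce duplicate IDs. It should refuse to generate IDs, or wait, until the clock catches up.
Sequence overflow: The 12-bit sequence allows 4096 IDs per millisecond per node (about 4.1 million per second). If more are requested, the generator must wait for the next millisecond.
Deployment: Every node needs a unique (data center ID, machine ID) pair. With 5 bits each, this allows at most 32 data centers with 32 machines each (1024 generators). A reused pair lets two nodes emit identical IDs.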

Conclusion

Snowflake-style ID generation is a robust solution for distributed, high-scale systems. Careful attention to clock management and configuration is essential for reliability.