Why Write Amplification, Not Just Throughput, Shapes Modern Databases [50PaperChallenge]

Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Lessons from LSM Trees and WiscKey — Paper #2 & #3 of #50PaperChallenge

Introduction: Why This Paper Stayed With Me

In my #50PaperChallenge journey, I've been deliberately alternating between foundational theory and systems papers that quietly changed the industry. This pairing — LSM Tree (O’Neil et al., 1996) and WiscKey: Separating Keys from Values in SSD-Conscious Storage — sits squarely in that second category.

LSM Trees are everywhere today: RocksDB, Cassandra, HBase, and LevelDB all trace their lineage back to the LSM Tree. We configure them, tune them, and occasionally curse them during compaction storms, often without thinking too deeply about why the design works or what exact cost we’re paying for that performance.

When I first encountered LSM Trees years ago, I mentally bucketed them as “the write-optimized alternative to B-Trees” and moved on. The summary in my head was simple:

LSM Trees are faster for writes, slower for reads, and compaction is expensive.

That's not wrong — but it's dangerously incomplete.