8 posts tagged with "database"

Why Write Amplification, Not Just Throughput, Shapes Modern Databases [50PaperChallenge]

November 15, 2025 · 8 min read

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Lessons from LSM Trees and WiscKey — Paper #2 & #3 of #50PaperChallenge

Introduction: Why This Paper Stayed With Me

In my #50PaperChallenge journey, I've been deliberately alternating between foundational theory and systems papers that quietly changed the industry. This pairing — LSM Tree (O’Neil et al., 1996) and WiscKey: Separating Keys from Values in SSD-Conscious Storage — sits squarely in that second category.

LSM Trees are everywhere today: RocksDB, Cassandra, HBase, LevelDB, and DynamoDB's storage engine all trace their lineage back to the LSM Tree. We configure them, tune them, and occasionally curse them during compaction storms, often without thinking too deeply about why the design works or what exact cost we’re paying for that performance.

When I first encountered LSM Trees years ago, I mentally bucketed them as “the write-optimized alternative to B-Trees” and moved on.

"LSM Trees are faster for writes, slower for reads, and compaction is expensive."

That's not wrong — but it's dangerously incomplete.
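To make that one-line summary concrete, here is a toy sketch in Python of the basic LSM idea: writes land in an in-memory memtable, full memtables are flushed as immutable sorted runs, reads check the newest data first, and compaction merges runs by rewriting them. It is not how RocksDB or LevelDB are implemented; `ToyLSM`, its methods, and the `memtable_limit` parameter are hypothetical names chosen for illustration. The counters show where write amplification comes from: the same entry reaches "disk" more than once.

```python
class ToyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted runs (illustrative only)."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []        # oldest run first; each run is a sorted list of (key, value)
        self.user_writes = 0  # entries the application asked to write
        self.disk_writes = 0  # entries written to "disk" by flushes and compactions

    def put(self, key, value):
        # Writes only touch memory until the memtable fills up.
        self.memtable[key] = value
        self.user_writes += 1
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A flush writes the whole memtable out as one sorted, immutable run.
        run = sorted(self.memtable.items())
        self.runs.append(run)
        self.disk_writes += len(run)
        self.memtable = {}

    def get(self, key):
        # Reads may have to look in several places: memtable, then runs newest first.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            for k, v in run:  # linear scan for brevity; a real engine would search the sorted run
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value per key.
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite older ones
            merged.update(run)
        new_run = sorted(merged.items())
        self.runs = [new_run]
        self.disk_writes += len(new_run)  # surviving entries are written again

    def write_amplification(self):
        return self.disk_writes / self.user_writes


db = ToyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(k, v)
db.compact()
print(db.get("a"), db.write_amplification())
```

In this run the application wrote 4 entries, but 7 entries were written to the simulated disk: 4 by the two flushes and 3 more by the compaction. That ratio is the cost hiding behind "faster for writes".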

Why Latency, Not Partitions, Dictates Your Database's Consistency [50PaperChallenge]

November 8, 2025 · 6 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Confession: As someone with difficulty reading a lot of text, I’m definitely not a fan of long, dense academic text. Video lectures have always been my preferred way to learn. Honestly, reading research papers is something I’ve dodged for years—too much jargon, too many walls of text, and not enough clarity. But that’s exactly why I’m giving myself this challenge #50PaperChallenge: I want to see how far I can go if I really stick with it, and whether pushing through helps me learn things that actually last.

My goal isn’t just to skim headlines or collect citations. I want to go deeper—reading seminal technical whitepapers and really figuring out what’s inside, even if that means slowing down, re-reading, and wrestling with tough concepts.

But here’s the twist: I’m doing all this in public, right here, as a sort of open online notebook.

Why? Two big reasons:

Memory for my future self: Writing down my takeaways helps me process, organize, and actually remember what I’ve learned. Putting them out there means I can always come back later when I need a refresher.

Maybe it helps you too: If you’re an engineer, researcher, or just another tech nerd, maybe these notes will help you discover (or rediscover) some classics. Or maybe you’ll just relate to my struggle—and those occasional “aha!” moments—trying to crack technical content.

So, consider this an open journal. I’ll do my best to cut through the jargon, flag the breakthroughs, and be honest about what clicked and what didn’t.

To kick things off, I picked a paper that’s sparked more conversations (and arguments!) in our world than almost any other:

Consistency Tradeoffs in Modern Distributed Database System Design by Daniel Abadi

Let's unpack that.

Zookeeper Sessions and life cycle

November 4, 2016 · 2 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Session and request order handling:

Sessions are critical to the operation of ZooKeeper. Every operation a client submits to ZooKeeper is associated with a session. When a session ends for any reason, the ephemeral nodes created during that session disappear.

The client initially connects to any server in the ensemble, and only to a single server. It uses a TCP connection to communicate with the server, but the session may be moved to a different server if the client has not heard from its current server for some time. Moving a session to a different server is handled transparently by the ZooKeeper client library.
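The session rules above can be sketched with a small in-memory model. This is not the ZooKeeper client API; `ToyEnsemble` and all of its methods are hypothetical names, and the model ignores timeouts, heartbeats, and replication. It only shows three things: every operation carries a session, a session can move to another server without losing its nodes, and ending a session removes its ephemeral nodes.

```python
class ToyEnsemble:
    """In-memory model of sessions and ephemeral nodes (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.nodes = {}      # path -> (data, owning session id, or None if persistent)
        self.sessions = {}   # session id -> server currently serving the session
        self._next_id = 1

    def connect(self, server):
        # A client connects to exactly one server and gets a session id.
        if server not in self.servers:
            raise ValueError(f"unknown server: {server}")
        sid = self._next_id
        self._next_id += 1
        self.sessions[sid] = server
        return sid

    def create(self, sid, path, data, ephemeral=False):
        # Every operation is tied to a live session.
        if sid not in self.sessions:
            raise RuntimeError("session has ended")
        self.nodes[path] = (data, sid if ephemeral else None)

    def move_session(self, sid, new_server):
        # The session keeps its id and its nodes when it moves to another server.
        if new_server not in self.servers:
            raise ValueError(f"unknown server: {new_server}")
        self.sessions[sid] = new_server

    def end_session(self, sid):
        # Ending the session removes every ephemeral node it created.
        del self.sessions[sid]
        self.nodes = {p: (d, o) for p, (d, o) in self.nodes.items() if o != sid}


zk = ToyEnsemble(["s1", "s2", "s3"])
sid = zk.connect("s1")
zk.create(sid, "/config", b"v1")                    # persistent node
zk.create(sid, "/workers/w1", b"", ephemeral=True)  # tied to the session
zk.move_session(sid, "s2")
paths_after_move = sorted(zk.nodes)
zk.end_session(sid)
paths_after_end = sorted(zk.nodes)
print(paths_after_move, paths_after_end)
```

After the move both nodes are still present; after the session ends only the persistent `/config` node remains.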

Zookeeper Namespace And Operations

October 27, 2016 · 5 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

ZooKeeper data model:

ZooKeeper has a hierarchical namespace (as shown below), much like a distributed file system. The only difference is that each node in the namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory.

[Figure: ZooKeeper data model]
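As a rough illustration of "a file that is also a directory", here is a minimal Python sketch of such a namespace. `Znode` and `ToyNamespace` are hypothetical names for this example, not ZooKeeper classes, and the sketch leaves out versions, ACLs, and watches.

```python
class Znode:
    """A node that holds data and can also have children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}  # child name -> Znode


class ToyNamespace:
    """Slash-separated hierarchical namespace (illustrative only)."""

    def __init__(self):
        self.root = Znode()

    def _walk(self, path):
        # Follow the path components down from the root.
        node = self.root
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def create(self, path, data=b""):
        # The parent must already exist, as in a file system.
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._walk(parent_path)
        if name in parent.children:
            raise KeyError(f"node already exists: {path}")
        parent.children[name] = Znode(data)

    def get_data(self, path):
        return self._walk(path).data

    def get_children(self, path):
        return sorted(self._walk(path).children)


ns = ToyNamespace()
ns.create("/app", b"app-level settings")
ns.create("/app/config", b"timeout=30")
ns.create("/app/workers")
print(ns.get_data("/app"), ns.get_children("/app"))
```

Here `/app` carries its own data and also has the children `config` and `workers`, which a plain file system would not allow.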

Zookeeper Introduction to Zookeeper

October 20, 2016 · 3 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

ZooKeeper is an open-source, centralised co-ordination service used to co-ordinate services and manage the configuration of applications across a large number of hosts in a distributed environment.

Co-ordinating between the services in a distributed application is a complex process. ZooKeeper was designed to be a robust service that enables application developers to focus mainly on their application logic rather than coordination. It exposes a simple API, similar to a filesystem API, that allows developers to implement common co-ordination tasks, such as electing a master server, managing group membership, and managing metadata.

ZooKeeper was open-sourced to Apache by Yahoo. Apache ZooKeeper has become a standard for co-ordinating the services in Hadoop, Kafka, and other distributed frameworks.

Message Queue

September 20, 2016 · 2 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

What is Message Passing?

Message passing is a technique for inter-process communication (IPC), or for inter-thread communication within the same process. It lets two parallel processes, distributed or not, communicate in synchronous or asynchronous mode. The communication is completed by sending messages (functions, signals, and data packets) to recipients.
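A minimal sketch of asynchronous message passing between two threads, using Python's standard `queue.Queue` as the channel. The `worker` function, the `None` stop signal, and the upper-casing "work" are assumptions made for this example.

```python
import queue
import threading


def worker(inbox, results):
    """Receive messages until the sender signals that it is done."""
    while True:
        message = inbox.get()  # blocks until a message arrives
        if message is None:    # None is this example's "no more messages" signal
            break
        results.append(message.upper())


inbox = queue.Queue()
results = []
receiver = threading.Thread(target=worker, args=(inbox, results))
receiver.start()

for text in ["hello", "world"]:
    inbox.put(text)  # asynchronous send: returns without waiting for processing
inbox.put(None)
receiver.join()      # wait for the receiver to drain the queue
print(results)
```

The two threads never share the message list directly; they only exchange messages through the queue, which is what separates message passing from shared-memory communication.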

Normalisation

September 14, 2015 · 4 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Normalisation is the process of eliminating redundancy, minimising the use of null values, and preventing the loss of information by establishing relations and ensuring data integrity.

Data should be stored only once, and data that can be calculated from other data already held in the database should not be stored at all. During the process of normalisation, redundancy must be removed, but not at the expense of breaking data integrity rules.

The removal of redundancy helps to prevent insertion, deletion, and update errors, since the data is only available in one attribute of one table in the database.
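A small sketch of both rules using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are made up for this example. The customer's city is stored in exactly one place, and the order total is calculated in the query instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL            -- stored once, not repeated on every order
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    quantity    INTEGER NOT NULL,
    unit_price  REAL NOT NULL     -- no stored "total" column: it can be calculated
);
INSERT INTO customers VALUES (1, 'Asha', 'Pune');
INSERT INTO orders VALUES (10, 1, 2, 5.0);
INSERT INTO orders VALUES (11, 1, 1, 7.5);
""")

# One update in one place; every order sees the new city through the join.
conn.execute("UPDATE customers SET city = 'Mumbai' WHERE id = 1")

rows = conn.execute("""
    SELECT o.id, c.city, o.quantity * o.unit_price AS total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(rows)
```

Had the city been copied onto each order row, the same change would need one update per order, and missing any of them would leave the data inconsistent.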

How to Find out Next and Previous Day of Week in Oracle

February 3, 2015 · 2 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

I have seen people write many lines of code to find the date of a specific day in the current or previous week. Most often we need Friday, the last working day of the week. Oracle queries offer a pretty simple way to find it.
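The post's own query is not shown in this excerpt. As an assumption about the approach, Oracle's built-in `NEXT_DAY(date, day_name)` returns the first date later than `date` that falls on the named weekday, so applying it to a date seven days earlier gives the most recent such day. The Python sketch below mirrors that date arithmetic so the logic can be checked by hand; `next_day` and `previous_day` are hypothetical helper names, not Oracle functions.

```python
from datetime import date, timedelta

WEEKDAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
            "FRIDAY", "SATURDAY", "SUNDAY"]


def next_day(d, weekday_name):
    """First date strictly after d that falls on weekday_name."""
    target = WEEKDAYS.index(weekday_name.upper())
    days_ahead = (target - d.weekday() - 1) % 7 + 1  # always between 1 and 7
    return d + timedelta(days=days_ahead)


def previous_day(d, weekday_name):
    """Most recent date on or before d that falls on weekday_name."""
    return next_day(d - timedelta(days=7), weekday_name)


today = date(2015, 2, 3)  # a Tuesday
coming_friday = next_day(today, "FRIDAY")
last_friday = previous_day(today, "FRIDAY")
print(coming_friday, last_friday)
```

For Tuesday 3 February 2015 this gives Friday 6 February as the coming Friday and Friday 30 January as the last one.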
