9 posts tagged with "architecture"

Why Write Amplification, Not Just Throughput, Shapes Modern Databases [50PaperChallenge]

November 15, 2025 · 8 min read

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Lessons from LSM Trees and WiscKey — Paper #2 & #3 of #50PaperChallenge

Introduction: Why This Paper Stayed With Me

In my #50PaperChallenge journey, I've been deliberately alternating between foundational theory and systems papers that quietly changed the industry. This pairing — LSM Tree (O’Neil et al., 1996) and WiscKey: Separating Keys from Values in SSD-Conscious Storage — sits squarely in that second category.

LSM Trees are everywhere today: RocksDB, Cassandra, HBase, LevelDB, and DynamoDB's storage engine all trace their lineage back to the LSM Tree. We configure them, tune them, and occasionally curse them during compaction storms, often without thinking too deeply about why the design works or what exact cost we're paying for that performance.

When I first encountered LSM Trees years ago, I mentally bucketed them as “the write-optimized alternative to B-Trees” and moved on.

LSM Trees are faster for writes, slower for reads, and compaction is expensive.

That's not wrong — but it's dangerously incomplete.

Why Latency, Not Partitions, Dictates Your Database's Consistency [50PaperChallenge]

November 8, 2025 · 6 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Confession: As someone with difficulty reading a lot of text, I’m definitely not a fan of long, dense academic text. Video lectures have always been my preferred way to learn. Honestly, reading research papers is something I’ve dodged for years—too much jargon, too many walls of text, and not enough clarity. But that’s exactly why I’m giving myself this challenge #50PaperChallenge: I want to see how far I can go if I really stick with it, and whether pushing through helps me learn things that actually last.

My goal isn’t just to skim headlines or collect citations. I want to go deeper—reading seminal technical whitepapers and really figuring out what’s inside, even if that means slowing down, re-reading, and wrestling with tough concepts.

But here’s the twist: I’m doing all this in public, right here, as a sort of open online notebook.

Why? Two big reasons:

Memory for my future self: Writing down my takeaways helps me process, organize, and actually remember what I’ve learned. Putting them out there means I can always come back later when I need a refresher.

Maybe it helps you too: If you’re an engineer, researcher, or just another tech nerd, maybe these notes will help you discover (or rediscover) some classics. Or maybe you’ll just relate to my struggle—and those occasional “aha!” moments—trying to crack technical content.

So, consider this an open journal. I’ll do my best to cut through the jargon, flag the breakthroughs, and be honest about what clicked and what didn’t.

To kick things off, I picked a paper that’s sparked more conversations (and arguments!) in our world than almost any other:

Consistency Tradeoffs in Modern Distributed Database System Design by Daniel Abadi

Let's unpack that.

Inside Modern Machine Learning Platforms: A Survey Across the Industry

April 13, 2025 · 8 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Machine Learning (ML) is no longer just a research field or a niche corner of data science teams. Today, it's deeply embedded in products we use daily — be it your Uber ETA, Instagram feed, Netflix recommendations, or fraud detection in fintech. And the real magic behind deploying these models reliably, at scale, lies in how companies build their internal ML platforms.

In this post, we dive deep into how different organisations — ranging from tech giants to startups — are architecting their ML systems. We've compiled insights from real-world case studies, open-source projects, and platform blueprints to show how companies are solving the same problem with wildly different approaches.

Zookeeper Sessions and life cycle

November 4, 2016 · 2 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Session and request order handling:

Sessions are critical to the operation of ZooKeeper. Every operation a client submits to ZooKeeper is associated with a session. When a session ends for any reason, the ephemeral nodes created during that session disappear.

The client initially connects to any server in the ensemble, and only to a single server. It uses a TCP connection to communicate with the server, but the session may be moved to a different server if the client has not heard from its current server for some time. Moving a session to a different server is handled transparently by the ZooKeeper client library.
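To make the ephemeral-node behaviour concrete, here is a minimal in-memory sketch. It is a toy model, not the ZooKeeper client API: the `ToyEnsemble` class and its method names are hypothetical, and it ignores timeouts, session migration between servers, and everything else a real ensemble handles.

```python
class ToyEnsemble:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, not the real API)."""

    def __init__(self):
        self._next_session_id = 1
        # path -> (data, owning session id for ephemeral nodes, else None)
        self._nodes = {}

    def open_session(self):
        session_id = self._next_session_id
        self._next_session_id += 1
        return session_id

    def create(self, session_id, path, data, ephemeral=False):
        # Every operation is tied to a session; ephemeral nodes remember their owner.
        owner = session_id if ephemeral else None
        self._nodes[path] = (data, owner)

    def close_session(self, session_id):
        # Ephemeral nodes disappear together with the session that created them.
        self._nodes = {
            path: (data, owner)
            for path, (data, owner) in self._nodes.items()
            if owner != session_id
        }

    def exists(self, path):
        return path in self._nodes


ensemble = ToyEnsemble()
sid = ensemble.open_session()
ensemble.create(sid, "/config", b"v1")                    # persistent node
ensemble.create(sid, "/workers/w1", b"", ephemeral=True)  # tied to the session
ensemble.close_session(sid)
```

After `close_session`, `/config` is still there while `/workers/w1` is gone, which is the property that makes ephemeral nodes useful for tracking live clients.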

Zookeeper Namespace And Operations

October 27, 2016 · 5 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

ZooKeeper data model:

ZooKeeper has a hierarchical namespace (as shown below), much like a distributed file system. The only difference is that each node in the namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory.

Figure: ZooKeeper data model
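The "file that is also a directory" idea can be sketched in a few lines. This is only an illustration of the data model: the `Znode` class and the `create` and `get` helpers are hypothetical names, and unlike real ZooKeeper this toy creates missing parent nodes automatically.

```python
from dataclasses import dataclass, field


@dataclass
class Znode:
    # A node carries its own data AND can have children at the same time.
    data: bytes = b""
    children: dict = field(default_factory=dict)


def create(root, path, data=b""):
    """Walk the path from the root, creating nodes as needed (toy simplification)."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children.setdefault(part, Znode())
    node.data = data
    return node


def get(root, path):
    """Return the node at the given path; raises KeyError if it does not exist."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children[part]
    return node


root = Znode()
create(root, "/app", b"cfg")              # /app holds data...
create(root, "/app/worker-1", b"host-a")  # ...and also has a child
app = get(root, "/app")
```

Here `app.data` holds the configuration bytes while `app.children` lists `worker-1`, so one node plays both the file and the directory role.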

Zookeeper Introduction to Zookeeper

October 20, 2016 · 3 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

ZooKeeper is an open-source, centralised co-ordination service used to co-ordinate services and manage application configuration across a large number of hosts in a distributed environment.

Co-ordinating between the services in a distributed application is a complex process. ZooKeeper was designed to be a robust service that enables application developers to focus mainly on their application logic rather than co-ordination. It exposes a simple API, similar to a filesystem API, that allows developers to implement common co-ordination tasks, such as electing a master server, managing group membership, and managing metadata.

ZooKeeper was open-sourced to Apache by Yahoo. Apache ZooKeeper has become the standard for co-ordinating services in Hadoop, Kafka, and other distributed frameworks.

Message Queue

September 20, 2016 · 2 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

What is Message Passing?

Message passing is a technique for inter-process communication (IPC), or for inter-thread communication within the same process. It enables communication between two parallel processes, distributed or not, in synchronous or asynchronous mode. The communication is completed by sending messages (functions, signals, and data packets) to recipients.
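As a small illustration of the inter-thread case, the sketch below passes messages between two threads through a queue from the Python standard library. The `worker` function and the `None` stop signal are my own choices for this example, not part of any particular message queue product.

```python
import queue
import threading


def worker(inbox, results):
    """Receive messages until the stop signal (None) arrives."""
    while True:
        message = inbox.get()  # blocks until a message is available
        if message is None:
            break
        results.append(message.upper())


inbox = queue.Queue()
results = []
receiver = threading.Thread(target=worker, args=(inbox, results))
receiver.start()

# The sender does not wait for each message to be processed (asynchronous mode).
for text in ["hello", "world"]:
    inbox.put(text)
inbox.put(None)  # stop signal

receiver.join()
```

The sender and the receiver share no state apart from the queue itself, which is the essence of message passing: the two sides only agree on the shape of the messages.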

Introduction To MapReduce

May 14, 2015 · 2 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

MapReduce is a framework for processing large amounts of data residing on hundreds of computers, and it's an extraordinarily powerful paradigm. MapReduce was first introduced by Google in 2004 in the paper "MapReduce: Simplified Data Processing on Large Clusters".

In this article we'll see how MapReduce processes data, using the Word Count program as an example. Yeah!! This is the world's most famous MapReduce program!!
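Before looking at a real cluster, it can help to see the Word Count idea on a single machine. The sketch below is plain Python, not the Hadoop API: the `map_phase`, `shuffle`, and `reduce_phase` names are just labels I picked for the three stages.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["the quick fox", "the lazy dog", "The end"])
```

On a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network, but the shape of the computation is the same.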

HDFS Architecture

May 3, 2015 · 2 min read

Narendra Dubey

Systems builder. Platform tinkerer. Distributed architecture troublemaker.

The Hadoop Distributed File System (HDFS) is a highly fault-tolerant file system designed and optimized to be deployed on distributed infrastructure built from commodity hardware. HDFS provides high-throughput access to application data and is best suited for applications that have large data sets. Unlike existing distributed file systems, HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally developed as infrastructure for the Apache Nutch web search engine project.
