8 posts tagged with "architecture"

Why Latency, Not Partitions, Dictates Your Database's Consistency [50Week50Papper]

· 6 min read
Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Confession: As someone with dyslexia, I’m definitely not a fan of long, dense academic text. Video lectures have always been my preferred way to learn. Honestly, reading research papers is something I’ve dodged for years—too much jargon, too many walls of text, and not enough clarity. But that’s exactly why I’m giving myself this challenge 50Week50Papper: I want to see how far I can go if I really stick with it, and whether pushing through helps me learn things that actually last.

My goal isn’t just to skim headlines or collect citations. I want to go deeper—reading one seminal technical whitepaper every week and really figuring out what’s inside, even if that means slowing down, re-reading, and wrestling with tough concepts.

But here’s the twist: I’m doing all this in public, right here, as a sort of open online notebook.

Why? Two big reasons:

Memory for my future self: Writing down my takeaways helps me process, organize, and actually remember what I’ve learned. Putting them out there means I can always come back later when I need a refresher.

Maybe it helps you too: If you’re an engineer, researcher, or just another tech nerd, maybe these notes will help you discover (or rediscover) some classics. Or maybe you’ll just relate to my struggle—and those occasional “aha!” moments—trying to crack technical content.

So, consider this an open journal. I’ll do my best to cut through the jargon, flag the breakthroughs, and be honest about what clicked and what didn’t.

Welcome to Week 1! And to kick things off, I picked a paper that’s sparked more conversations (and arguments!) in our world than almost any other:

Consistency Tradeoffs in Modern Distributed Database System Design by Daniel Abadi

Let's unpack that.

Inside Modern Machine Learning Platforms: A Survey Across the Industry

· 8 min read
Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Machine Learning (ML) is no longer just a research field or a niche corner of data science teams. Today, it's deeply embedded in products we use daily — be it your Uber ETA, Instagram feed, Netflix recommendations, or fraud detection in fintech. And the real magic behind deploying these models reliably, at scale, lies in how companies build their internal ML platforms.

In this post, we dive deep into how different organisations — ranging from tech giants to startups — are architecting their ML systems. We've compiled insights from real-world case studies, open-source projects, and platform blueprints to show how companies are solving the same problem with wildly different approaches.

Zookeeper Sessions and life cycle

· 2 min read
Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

Session and request order handling:

Sessions are critical to the operation of ZooKeeper. Every operation a client submits to ZooKeeper is associated with a session. When a session ends for any reason, the ephemeral nodes created during that session disappear.

The client initially connects to any server in the ensemble, and only to a single server. It uses a TCP connection to communicate with the server, but the session may be moved to a different server if the client has not heard from its current server for some time. Moving a session to a different server is handled transparently by the ZooKeeper client library.
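
To make the link between sessions and ephemeral nodes concrete, here is a minimal in-memory sketch. It is a toy model, not the ZooKeeper client API: `ToyCoordinator` and all of its methods are hypothetical names made up for illustration, and there is no networking, ensemble, or timeout handling.

```python
# Toy in-memory model of sessions and ephemeral nodes.
# NOT the ZooKeeper API: every name below is hypothetical.


class ToyCoordinator:
    def __init__(self):
        self.nodes = {}        # path -> (data, owning session id or None)
        self.sessions = set()  # ids of sessions that are still open
        self._next_id = 1

    def open_session(self):
        """Open a new session and return its id."""
        session_id = self._next_id
        self._next_id += 1
        self.sessions.add(session_id)
        return session_id

    def create(self, session_id, path, data, ephemeral=False):
        """Create a node; every operation must belong to an open session."""
        if session_id not in self.sessions:
            raise RuntimeError("session is closed or expired")
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        owner = session_id if ephemeral else None
        self.nodes[path] = (data, owner)

    def close_session(self, session_id):
        """End a session and drop the ephemeral nodes it created."""
        self.sessions.discard(session_id)
        self.nodes = {
            path: (data, owner)
            for path, (data, owner) in self.nodes.items()
            if owner != session_id
        }


coordinator = ToyCoordinator()
sid = coordinator.open_session()
coordinator.create(sid, "/config", b"v1")                         # persistent node
coordinator.create(sid, "/workers/w1", b"alive", ephemeral=True)  # tied to the session
coordinator.close_session(sid)
remaining = sorted(coordinator.nodes)
print(remaining)  # only the persistent node is left
```

Only the persistent node survives the end of the session, which is why ephemeral nodes are a natural fit for tracking which workers are currently alive.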

Zookeeper Namespace And Operations

· 5 min read
Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

ZooKeeper data model:

ZooKeeper has a hierarchical namespace (as shown below), much like a distributed file system. The only difference is that each node in the namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory.

[Figure: ZooKeeper data model]
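
The "file that is also a directory" idea can be sketched with a small tree structure. This is an illustrative model only: `Znode` and `ToyNamespace` are hypothetical names, and the real ZooKeeper data model carries more per-node metadata than is shown here.

```python
# Toy model of a hierarchical namespace where every node holds data AND children.
# Hypothetical names; this is not the ZooKeeper client API.


class Znode:
    def __init__(self, data=b""):
        self.data = data    # payload stored at this node (the "file" part)
        self.children = {}  # child name -> Znode (the "directory" part)


class ToyNamespace:
    def __init__(self):
        self.root = Znode()

    def _walk(self, path):
        """Follow a slash-separated path from the root and return the node."""
        node = self.root
        for part in [p for p in path.split("/") if p]:
            node = node.children[part]
        return node

    def create(self, path, data=b""):
        """Create a node under an existing parent."""
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._walk(parent_path)
        if name in parent.children:
            raise KeyError(f"node already exists: {path}")
        parent.children[name] = Znode(data)

    def get_data(self, path):
        return self._walk(path).data

    def get_children(self, path):
        return sorted(self._walk(path).children)


ns = ToyNamespace()
ns.create("/app", b"cfg")               # /app carries data...
ns.create("/app/worker-1", b"host-a")   # ...and also has children
ns.create("/app/worker-2", b"host-b")
app_data = ns.get_data("/app")
app_children = ns.get_children("/app")
print(app_data, app_children)
```

Here `/app` behaves like a file (it returns `b"cfg"`) and like a directory (it lists two children) at the same time.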

Zookeeper Introduction to Zookeeper

· 3 min read
Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

ZooKeeper is an open-source, centralised co-ordination service used to co-ordinate services and manage application configuration across a large number of hosts in a distributed environment.

Co-ordinating between the services in a distributed application is a complex process. ZooKeeper was designed to be a robust service that enables application developers to focus mainly on their application logic rather than co-ordination. It exposes a simple API, similar to a filesystem API, that allows developers to implement common co-ordination tasks, such as electing a master server, managing group membership, and managing metadata.

ZooKeeper was open-sourced to Apache by Yahoo. Apache ZooKeeper has become a standard for co-ordinating services in Hadoop, Kafka, and other distributed frameworks.

Message Queue

· 2 min read
Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

What is Message Passing?

Message passing is a technique for inter-process communication (IPC), or for inter-thread communication within a single process. The communicating parties may be distributed or non-distributed parallel processes, and they may operate in synchronous or asynchronous mode. Communication is completed by sending messages (functions, signals, and data packets) to recipients.
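
As a small illustration of asynchronous message passing between two threads in one process, here is a sketch using Python's standard `queue.Queue` as the mailbox. The function names, the message contents, and the `None` end-of-stream sentinel are assumptions chosen for the example.

```python
import queue
import threading


def producer(mailbox, messages):
    """Send each message, then a None sentinel to signal the end of the stream."""
    for message in messages:
        mailbox.put(message)
    mailbox.put(None)


def consumer(mailbox, received):
    """Receive messages until the sentinel arrives."""
    while True:
        message = mailbox.get()  # blocks until a message is available
        if message is None:
            break
        received.append(message.upper())


mailbox = queue.Queue()  # thread-safe FIFO shared by both threads
received = []

sender = threading.Thread(target=producer, args=(mailbox, ["ping", "pong"]))
receiver = threading.Thread(target=consumer, args=(mailbox, received))
receiver.start()
sender.start()
sender.join()
receiver.join()

print(received)
```

The two threads never share state directly. The sender only puts messages into the mailbox and the receiver only takes them out, which is the core idea a message queue builds on.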

Introduction To MapReduce

· 2 min read
Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

MapReduce is a framework for processing large amounts of data residing on hundreds of computers; it is an extraordinarily powerful paradigm. Google first introduced it in 2004 in the paper "MapReduce: Simplified Data Processing on Large Clusters".

In this article we'll see how MapReduce processes data, using the Word Count program as an example. Yes, this is the world's most famous MapReduce program!
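
Before looking at a real cluster, the Word Count flow can be sketched in a single process. This is a simplified illustration of the map, shuffle, and reduce steps, not Hadoop code: the function names and sample lines are made up for the example, and a real framework would run these phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key so each word's counts end up together."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for a single word."""
    return word, sum(counts)


lines = ["the quick brown fox", "the lazy dog", "The end"]

mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(word, counts) for word, counts in shuffle(mapped).items())

print(word_counts["the"], word_counts["fox"])
```

Each phase only depends on the output of the previous one, which is what lets a framework distribute the map and reduce calls independently.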

HDFS Architecture

· 2 min read
Narendra Dubey
Systems builder. Platform tinkerer. Distributed architecture troublemaker.

The Hadoop Distributed File System (HDFS) is a highly fault-tolerant file system designed and optimized for deployment on distributed infrastructure built from commodity hardware. HDFS provides high-throughput access to application data and is best suited for applications that have large data sets. Unlike existing distributed file systems, HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally developed as infrastructure for the Apache Nutch web search engine project.