5 posts tagged with "hadoop"

Introduction To MapReduce

· 2 min read
Narendra Dubey
Builder of Data Systems and Software

MapReduce is a framework for processing large amounts of data residing on hundreds of computers, and it is an extraordinarily powerful paradigm. MapReduce was first introduced by Google in 2004 in the paper "MapReduce: Simplified Data Processing on Large Clusters".

In this article we'll see how MapReduce processes data, using the Word Count program as an example. Yes, this is the world's most famous MapReduce program!
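
To make the idea concrete, here is a minimal single-machine sketch of Word Count in plain Python. It only imitates the map, shuffle, and reduce stages in memory and does not use Hadoop itself; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted counts by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three stages in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; on a real cluster each line could live on a different machine.
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions would run in parallel on many machines and the framework would handle the shuffle; the sketch keeps everything in one process so the data flow is easy to follow.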

HDFS Architecture

· 2 min read
Narendra Dubey
Builder of Data Systems and Software

The Hadoop Distributed File System (HDFS) is a highly fault-tolerant file system designed and optimized to be deployed on a distributed infrastructure built from commodity hardware. HDFS provides high-throughput access to application data and is best suited for applications that have large data sets. Unlike other distributed file systems, HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally developed as infrastructure for the Apache Nutch web search engine project.

Difference between Unix and Linux

· 3 min read
Narendra Dubey
Builder of Data Systems and Software

Since its origins in 1969, UNIX has evolved through a number of different versions and environments. Most modern UNIX variants are licensed versions of one of the original UNIX editions, and they are largely closed-source software. Several flavors of UNIX are available on the market, such as Sun's Solaris, Hewlett-Packard's HP-UX, and IBM's AIX. Each has its own unique foundation, and each is optimized and bundled with the tools most compatible with its vendor's hardware.

Linux is an open source operating system (free to use and redistribute under the GNU General Public License) that is widely used across computer hardware and software, game development, tablet PCs, mainframes, etc. UNIX is a protected name: it is a registered trademark, so only companies whose systems are licensed to carry it (IBM, HP, etc.) may use the UNIX name. UNIX is commonly used on internet servers, workstations, and PCs, for example with Solaris or on systems from Intel and HP.