Narendra Kumar
Associate Director of Engineering — Data & ML Platform
Professional Summary
Engineering leader with 14+ years of experience building large-scale data infrastructure, ML platforms, and platform products across fintech, commerce, and consumer internet companies. Currently lead Razorpay's Data & ML Platform charter across 5 pods and 34 engineers, owning lakehouse, streaming, analytics infrastructure, merchant reporting, governance, and AI/data product platforms. Proven track record of scaling petabyte-scale platforms, processing 56B+ events/day, reducing annualised platform cost by $2M, and converting platform investments into self-serve products and measurable business outcomes.
Impact Snapshot
- Lead Razorpay's Data & ML Platform org of 34 engineers across 5 pods.
- Own platforms processing 56B+ events/day, 3.6PB+ storage, and 189M+ monthly analytical queries.
- Delivered $2M annualised cost reduction through platform re-architecture, workload optimisation, and governance.
- Re-architected merchant reporting platform generating 523K+ reports/month with 99.99% SLA and 74% lower operating cost.
- Built real-time streaming systems capable of handling 500x traffic surges during high-volume events such as IPL.
- Launched platform-led products: DataSync (merchant data integration), InsightX (AI-driven insights), and the Anomaly Detection and Alerting system (real-time anomaly detection).
Experience
Joined as Engineering Manager in 2022 and was promoted roughly every 1.5 years to the current role of Associate Director as scope expanded across Data & ML Platform, org leadership, platform strategy, and productized data/AI systems.
Leadership & Strategy
- Head the Data & ML Platform charter, leading 5 pods and 34 engineers across data infra, platform, analytics, reporting, and ML platform.
- Built a multi-pod platform operating model with clear charter ownership, planning cadence, execution reviews, platform success metrics, and leadership mechanisms across engineering managers, tech leads, and senior ICs.
- Defined and executed Razorpay's data platform strategy, enabling AI-driven financial reporting and merchant insights products used by millions of customers monthly.
- Managed cross-functional initiatives across developer productivity, cost optimisation, data availability, data reliability, and platform adoption.
- Designed and implemented organisation-wide mentoring programs benefiting 128 individual contributors across engineering levels.
Platform, Product & Business Impact
- Led the data platform re-architecture, reducing annualised data platform cost by $2M while improving platform availability to 99% and reducing recurring data quality issues by 97%.
- Scaled the platform to 3.6 PB+ storage and 56B+ events/day ingestion with 10-minute data availability for critical analytical workloads.
- Scaled Trino-based query infrastructure to support 50M+ queries/month across analytics, reporting, and self-serve data exploration.
- Led ADA (Anomaly Detection & Alerting), a multi-tenant real-time anomaly detection and fraud-prevention platform built on Kafka, Flink, and ClickHouse, delivering ~80% cost reduction versus Pinot + ThirdEye and sub-30-second detection latency.
- Re-architected Merchant Reporting Platform to generate 523K+ financial reports/month, reducing operational cost by 74% while maintaining 99.99% SLA.
- Built a real-time streaming architecture capable of handling 500x traffic surges during high-volume events such as IPL, ensuring zero downtime for critical data flows.
- Envisioned and launched two platform-led products: DataSync, a no-code merchant data integration product, and InsightX, an AI-driven merchant insights platform.
- Built platform capabilities for feature engineering, real-time data availability, and AI-driven merchant insights, enabling faster experimentation and productionization of ML/data products.
- Built and managed a 36-member Data & DevOps organisation spanning data platform, ingestion, infrastructure, and security.
- Built and led 3 teams that delivered a self-serve data ingestion platform processing 12B+ events.
- Migrated the entire data infrastructure from AWS to GCP, reducing operational costs and improving performance for mission-critical data pipelines.
- Managed infrastructure security setup, including firewalls, WAF, and OWASP-aligned controls, ensuring compliance and security across platform layers.
- Directed a team of 9 engineers to optimise the data ingestion layer, reducing costs by 60% month-over-month.
- Scaled analytics platforms to support 80+ analysts, delivering 50K+ queries daily.
- Optimised Redshift performance, reducing p90 query latency from 22 minutes to under 10 minutes.
- Designed and developed a high-scale data ingestion platform processing 48B+ events daily, with a p95 turnaround time under 10 minutes.
- Built self-serve data platform capabilities for 50+ users, supporting 3K+ pipelines and 10K+ analytical models.
- Deployed and optimised Presto, Hive Metastore, and Metabase for 200+ analysts, enabling efficient querying and reporting.
Joined as Sr. Engineer and progressed through annual promotions to Architect while building enterprise data, AI, and streaming platforms.
- Designed and built Piperr™, an AI platform for enterprise DataOps, from 0→1.
- Built real-time streaming applications for fraud detection and data enrichment.
- Implemented a cross-category recommendation engine using four algorithms, delivering 500K+ recommendations/day.
- Built and trained a chatbot for flight and hotel booking, including named entity recognition for destinations, dates, names, and domain-specific keywords.
- Estimated, designed, and developed ETL systems for healthcare data processing use cases.
- Received the Client Delight Award in 2015.
- Received the Aryabhatta Innovation Award in 2014.
- Developed MovoGrid, an interactive, gesture-driven advertising platform for indoor use, using Microsoft Kinect and QPainter.
Technical Skills
Platform & Architecture: Data Infrastructure, Data Platform, ML Platform, Distributed Systems, Lakehouse, Streaming Platform, Real-Time Processing, Anomaly Detection Platform, Platform Reliability, Cost Governance, Data Governance, Metadata, Data Quality, Data Observability
Data & ML Systems: Kafka, Spark, Flink, Trino, Pinot, ClickHouse, Redshift, BigQuery, TiDB, S3 Lakehouse, CDC, Feature Engineering, OLAP, Analytics Infrastructure, Reporting Platforms, DSL-based Rule Engines
Cloud & Infrastructure: AWS, GCP, Kubernetes, Platform Reliability, Security Hardening, Multi-Team Platform Operations
Programming: Python, Java, Scala, Go
Leadership: Org Design, Engineering Strategy, Hiring, Planning, Delegation, Mentoring, Manager/Tech Lead Operating Model, Platform Adoption, Cross-Functional Execution
Education
BE in Computer Science and Engineering
REC Bhalki — Class of 2012