
Sarvesh Mishra
Lead Platform Engineer | DevOps / MLOps | Site Reliability Engineer
Lead Platform Engineer with 4+ years of experience architecting scalable infrastructure, MLOps pipelines, and DevOps automation at CarTrade Tech (CarWale). Currently leading platform engineering initiatives — driving quarterly infra planning, setting architectural standards, and mentoring engineers. Reduced infrastructure costs by 38%, cut LLM expenditure by 30%, and saved $24K/year by migrating org-wide on-call tooling in-house. Google Cloud Professional Cloud Architect certified.
Certifications
Google Cloud Professional Cloud Architect
Technical Skills
Cloud & Infrastructure
DevOps & CI/CD
Programming Languages
Databases
Observability
Messaging & Caching
AI / MLOps
Platforms & Tools
Soft Skills
Work Experience
Lead Platform Engineer
Mumbai, India
Technical Leadership
Lead platform engineering initiatives across CarTrade Tech; drive quarterly and annual infra planning, define architectural standards in collaboration with engineering managers, and review all infra PRs and Terraform designs before production deployment.
Hiring & Interviews
Conduct system design interviews and DSA technical screenings for engineering candidates; contribute to hiring decisions for platform and backend engineering roles.
Mentorship
Mentor and guide an associate platform engineer on Kubernetes operations, IaC best practices, and incident response — accelerating their ramp-up on production systems.
Platform Engineer / Software Engineer
Mumbai, India
Kubernetes Cluster Management
Own and operate three EKS clusters (dev/staging/prod) across an AWS multi-account architecture managed with IAM Identity Center and Control Tower; support 60+ microservices and 6 web apps spanning the Mumbai (3 AZ) and Hyderabad regions for 80+ engineers.
Kubernetes Upgrades & Node Lifecycle
Execute bi-annual zero-downtime Kubernetes version upgrades; oversee node group lifecycle and Karpenter-based autoscaling to continuously balance reliability and cost efficiency across environments.
Karpenter & Custom CRDs
Implemented AWS Karpenter with custom CRDs for intelligent node provisioning, replacing managed node groups; improved cluster bin-packing efficiency and reduced over-provisioning cost.
Multi-Region DR & High Availability
Architected and maintain an active-passive DR setup across Mumbai (3 AZ) and Hyderabad; conduct regular DR drills and failover validation to meet RPO/RTO targets.
Kubernetes Cost Optimization
Integrated KRR and OpenCost for continuous right-sizing of CPU/memory requests across all workloads; achieved 38% reduction in infrastructure spend while maintaining 99.99% uptime.
Helm Chart Management
Author and maintain Helm charts for platform tooling including Grafana Stack, Airflow, and internal services; standardized chart structure and release process across dev/staging/prod environments.
RBAC & Access Control
Enforce Kubernetes RBAC policies and LDAP/Active Directory integration for RabbitMQ, OpenSearch, and Redis; apply least-privilege access across AWS multi-account environments via IAM Identity Center and Control Tower.
Incident Management & SRE On-Call
Participate in bi-weekly on-call rotation for platform reliability; triage and resolve OOMKills, crashloops, pod evictions, and infra incidents across 60+ microservices and 6 web apps.
MLOps AI Gateway
Architected migration from AWS AI Gateway to open-source MLflow; implemented multi-provider traffic routing and token-usage governance.
n8n Workflow Automation Platform
Deployed and productionized the n8n platform, enabling non-technical stakeholders (PMs) to construct ML workflows via drag-and-drop nodes, eliminating cross-team dependencies and ML team intervention.
Agentic AI Automation
Engineered an agentic AI layer in Jenkins to autonomously execute multi-repo operations — PR creation, dependency updates, testing, and linting.
Terraform Infrastructure Management
Administer IaC for two VPCs (dev/staging and production) spanning multiple subnets, EC2 fleets, EKS clusters, and AWS services; maintain S3 remote state backend with DynamoDB locking for safe concurrent operations.
GitOps Infra Workflow
Enforce PR-based infrastructure change workflow via GitHub Actions with Terraform plan output as PR comments; all production changes peer-reviewed before apply, reducing config drift and deployment risk.
ARM Architecture Migration
Spearheaded migration of compute workloads to AWS Graviton ARM instances, cutting monthly infrastructure spend and improving backend service performance.
In-House OnCall App
Built a React + Go application from scratch, replicating OpsGenie-grade scheduling, escalations, and notifications; migrated org-wide on-call management in-house.
Backstage Developer Portal
Established Spotify Backstage as a centralized developer portal, cutting onboarding time and improving developer productivity across 80+ engineers.
Monitoring & Observability
Established comprehensive monitoring, logging, and alerting systems using Grafana Stack with OnCall integration and distributed tracing, reducing incident response time by 30%.
Kafka Migration
Led migration from legacy RabbitMQ to Kafka and created an internal messaging library, improving data delivery reliability and reducing consumer lag.
Rate Limiter Service
Launched a Redis-backed distributed rate limiter to throttle abusive traffic and monetize APIs, increasing revenue and improving API health score.
Frontend Telemetry (Grafana Faro)
Integrated Grafana Faro for real-time client-side telemetry across React frontends, enabling end-to-end tracing from UI to backend.
Internal Chatbot
Shipped a GPT-powered internal chatbot for calling agents, improving customer engagement and reducing support ticket resolution time.
SSR Performance Optimization
Re-architected the SSR service into a Dockerized Node.js renderer with Redis caching; reduced cold-start latency, improved TTFB, and cut backend CPU usage.
Education
Masai School
Certificate in Full-Stack Web Development (MERN Stack)
Dr. A.P.J. Abdul Kalam Technical University
Bachelor of Technology — Mechanical Engineering
Achievements
- Rockstar Team of the Year — Annual Award 2025, CarTrade Tech.
- Best Performer of the Year — Annual Award 2024, CarTrade Tech.
- 1st Place, Internal Hackathon 2023 — Chaos Testing Integration Project.
- Best Debutant of the Year — Annual Award 2023, CarTrade Tech.