Site Reliability Engineer
Remote
Development
About the Company
Adalat AI is reimagining India's judicial infrastructure through cutting-edge AI. From courtroom transcription to case summarization, we're building the end-to-end justice tech stack that powers faster, fairer courts — just like UPI did for payments and Aadhaar for identity.
We've deployed our speech recognition and legal understanding tools in 10 states, with recent launches at the Delhi High Court and partnerships backed by leading global foundations. Our founding team blends deep experience across law, AI, and NLP — with alumni from Harvard, MIT, Oxford, and IIIT-H — and we've been recognized by the world's top accelerators and competitions for our impact.
Role Overview
We’re hiring a Site Reliability Engineer to own and scale the infrastructure behind our courtroom transcription platform. This is not a routine ops role — you’ll work on high-availability Kubernetes clusters, manage complex deployments with ArgoCD, and ensure reliability for a system processing sensitive, real-time data. You’ll collaborate with a small team of elite builders and be the go-to expert for keeping our platform robust, secure, and fast.
Key Responsibilities
Deploy, manage, and optimize Kubernetes clusters in production environments.
Operate and maintain ArgoCD for GitOps-based deployments.
Troubleshoot and resolve performance, reliability, and scaling issues across our clusters.
Build and maintain observability (metrics, logging, alerting) to catch and resolve issues proactively.
Collaborate with backend and product teams to ensure smooth, reliable releases.
Define and enforce infrastructure best practices, focusing on security, scalability, and resilience.
Qualifications
10+ years of experience in production infrastructure, reliability, or DevOps roles.
Proven experience deploying and managing Kubernetes clusters at scale.
Experience maintaining CI/CD pipelines with GitHub Actions.
Hands-on expertise with ArgoCD (setup, tuning, troubleshooting).
Solid foundation in Linux systems, networking, and container internals.
Experience with monitoring/alerting stacks (Prometheus, Grafana, Loki, etc.).
Comfortable diving into complex problems and quickly stabilizing systems.
Bonus:
Experience with GCP.
Contributions to open-source infrastructure or reliability tooling.
Benefits and Perks
WFH with flexible work hours.
Unlimited PTO.
Contacts within the Harvard / MIT / Oxford ecosystem.
Autonomy and ownership.
Smart, humble, and friendly peers.
Generous vacation.
Maternity and paternity leave.
Learning and development resources.
Learn more about Adalat AI
Join Our Team
To apply, please send your resume and a cover letter with the subject line: "Site Reliability Engineer | [Your name]".