Senior Site Reliability Engineer

About the Company

Adalat AI is building an end-to-end justice tech stack that automates manual and clerical pain points in courtrooms, giving judges back time to focus on what matters most: decision-making and delivering justice. Our solutions - from AI-powered transcription in Indian languages to case-flow management and document navigation - are now deployed across 9 states, covering nearly 20% of India’s judiciary. Backed by leading technology companies and funders, and incubated at MIT and Oxford, Adalat AI is working to eliminate judicial delays and expand access to timely justice. Founded by a team with backgrounds in law, technology, and economics from Harvard, Oxford, MIT, and IIIT Hyderabad, we are scaling rapidly across India and the Global South.

Role Overview

We’re hiring a Senior Site Reliability Engineer to own and scale the infrastructure behind our courtroom transcription platform. This is not a routine ops role - you’ll work on high-availability Kubernetes clusters, manage complex deployments with ArgoCD, and ensure reliability for a system processing sensitive, real-time data. You’ll collaborate with a small team of elite builders and be the go-to expert for keeping our platform robust, secure, and fast.

Key Responsibilities

Deploy, manage, and optimize Kubernetes clusters in production environments.
Operate and maintain ArgoCD for GitOps-based deployments.
Troubleshoot and resolve performance, reliability, and scaling issues across our clusters.
Build and maintain observability (metrics, logging, alerting) to catch and resolve issues proactively.
Collaborate with backend and product teams to ensure smooth, reliable releases.
Define and enforce infrastructure best practices, focusing on security, scalability, and resilience.

Qualifications

10+ years of experience in production infrastructure, reliability, or DevOps roles.
Proven experience deploying and managing Kubernetes clusters at scale.
Experience maintaining CI/CD pipelines with GitHub Actions.
Hands-on expertise with ArgoCD (setup, tuning, troubleshooting).
Solid foundation in Linux systems, networking, and container internals.
Experience with monitoring/alerting stacks (Prometheus, Grafana, Loki, etc.).
Comfortable diving into complex problems and quickly stabilizing systems.

Bonus

Experience with GCP.
Contributions to open-source infrastructure or reliability tooling.