About the Company

Adalat AI is building an end-to-end justice tech stack that automates manual and clerical pain points in courtrooms, giving judges back time to focus on what matters most: decision-making and delivering justice. Our solutions, from AI-powered transcription in Indian languages to case-flow management and document navigation, are now deployed across 9 states, covering nearly 20% of India’s judiciary. Backed by leading technology companies and funders, and incubated at MIT and Oxford, Adalat AI is working to eliminate judicial delays and expand access to timely justice. Founded by a team with backgrounds in law, technology, and economics from Harvard, Oxford, MIT, and IIIT Hyderabad, we are scaling rapidly across India and the Global South.

Role Overview

You’ll be a key part of the team building our Legal Intelligence runtime stack — helping us serve real-time speech recognition, retrieval, and summarization in low-bandwidth, resource-constrained environments.

This is a full-time position focused on making our ML models fast, lightweight, and deployable across thousands of Indian courtrooms — from remote district courts to the Supreme Court. As an early member of the team, you will:

  • Collaborate closely with the founding team to enhance model performance, enabling seamless operation for judges and stenographers.

  • Identify and implement innovative solutions to optimize machine learning models for various hardware architectures, including CPUs and GPUs.

  • Partner closely with cross-functional teams across design, backend, and frontend.

  • Solve complex problems related to model efficiency and scalability.

  • Build cost-effective and scalable systems that can operate efficiently in resource-constrained environments.

Key Responsibilities

  • Design and optimize speech and text pipelines — especially for Indic languages.

  • Implement compiler-aware workflows that reduce latency, memory, and energy usage.

  • Apply compression techniques (quantization, pruning, distillation) to deploy models on diverse and constrained hardware; a quantization sketch follows this list.

  • Collaborate with hardware teams to leverage new CPU/GPU/accelerator features via MLIR, LLVM, or ONNX; an ONNX export sketch follows this list.

  • Benchmark, debug, and stress-test inference across thousands of hours of real-world audio and documents.

  • Build infrastructure for scalable, cost-efficient inference under heavy workloads.
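
To give a flavor of the compression work above, here is a minimal sketch of Post-Training (dynamic) Quantization in PyTorch. The tiny Sequential model is an illustrative stand-in, not our production stack:

    import io

    import torch
    import torch.nn as nn

    # Small illustrative model standing in for a speech/text module.
    model = nn.Sequential(
        nn.Linear(256, 512),
        nn.ReLU(),
        nn.Linear(512, 128),
    ).eval()

    # Post-Training (dynamic) Quantization: Linear weights become int8;
    # activations are quantized on the fly at inference time.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Compare serialized sizes to show the memory win on CPU targets.
    def state_dict_bytes(m):
        buf = io.BytesIO()
        torch.save(m.state_dict(), buf)
        return buf.tell()

    print(f"fp32: {state_dict_bytes(model)} bytes")
    print(f"int8: {state_dict_bytes(quantized)} bytes")

    # The quantized module is a drop-in replacement for inference.
    with torch.no_grad():
        print(quantized(torch.randn(1, 256)).shape)

Dynamic quantization stores Linear weights in int8 (roughly a 4x memory reduction) and is a common first step before Quantization-Aware Training when accuracy degrades.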
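
ONNX export is one common bridge from framework models to the compiler toolchains named above. A minimal sketch, where the file name "model.onnx" and the tensor names are illustrative:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 128)
    ).eval()

    # Trace-and-export so ONNX Runtime, MLIR-based backends, or vendor
    # compilers can optimize the graph for the target hardware.
    torch.onnx.export(
        model,
        torch.randn(1, 256),                  # example input for tracing
        "model.onnx",
        input_names=["features"],
        output_names=["logits"],
        dynamic_axes={"features": {0: "batch"}, "logits": {0: "batch"}},
    )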

About You

You don’t need to meet every single qualification — we value diverse backgrounds and non-linear paths.

  • Educational Background:

    • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field from leading institutions.

  • Professional Experience:

    • 4+ years of experience in machine learning optimization, model compression, compiler development, or related areas.

  • Technical Skills:

    • Strong programming skills in Python or C/C++.

    • Experience with deep learning frameworks (PyTorch or TensorFlow).

    • Strong understanding of compiler architectures, including front-end and middle-end optimizations, scheduling, and code generation.

    • Familiarity with compiler frameworks such as LLVM or MLIR.

    • Hands-on experience with model optimization techniques, including quantization (e.g., Post-Training Quantization, Quantization-Aware Training), pruning, and distillation.

    • Knowledge of hardware architectures and experience deploying ML systems in resource-constrained environments.

  • Additional Qualifications (Preferred):

    • Experience with advanced batching strategies and efficient inference engines for large language models; a dynamic batching sketch follows this list.

    • Familiarity with retrieval-augmented generation (RAG), graph neural networks (GNNs), and agentic frameworks.

    • Experience contributing to research communities, including publications at conferences or in journals.
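
On the batching point above, here is a minimal sketch of dynamic batching bounded by batch size and a latency budget. The run_batch function, the queue layout, and the constants are illustrative assumptions rather than our serving stack:

    import queue
    import threading
    import time

    # Hypothetical stand-in for one batched model forward pass.
    def run_batch(inputs):
        return [f"processed:{x}" for x in inputs]

    request_q = queue.Queue()   # items are (input, result_callback) pairs
    MAX_BATCH = 8               # cap batch size for memory
    MAX_WAIT_S = 0.05           # latency budget for filling a batch

    def batcher():
        while True:
            batch = [request_q.get()]               # block for first request
            deadline = time.monotonic() + MAX_WAIT_S
            while len(batch) < MAX_BATCH:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(request_q.get(timeout=remaining))
                except queue.Empty:
                    break
            results = run_batch([item for item, _ in batch])
            for (_, callback), result in zip(batch, results):
                callback(result)

    threading.Thread(target=batcher, daemon=True).start()

    # Usage: submit a request; the callback fires when its batch completes.
    request_q.put(("hello", print))
    time.sleep(0.2)

Production engines refine this pattern further, for example with continuous batching that admits new requests between decode steps of a large language model.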

What You Will Achieve in a Year

  • Optimized our end-to-end ML stack for 5,000+ courtrooms running 10–12 hours daily.

  • Solved some of the toughest runtime challenges in our stack — from dialect variability to model drift in noisy courtroom settings.

  • Delivered state-of-the-art performance in legal speech and text understanding running on real-world hardware.

Benefits and Perks

  • Work from home with flexible hours

  • Unlimited PTO

  • Autonomy and ownership

  • Learning and development resources

  • Smart, humble, and friendly peers

  • Maternity and paternity leave

  • Access to contacts within the Harvard / MIT / Oxford ecosystem

Join Our Team

To apply, please send your resume and a cover letter with the subject line: "ML Engineer — Runtime".

Contact Us

Have questions or ideas? We’d love to hear from you. Reach out to us to learn more about our work or explore collaboration opportunities.
