Ashish Bhate, Prakhar Goel, Vinayak Shukla
Guide · 15 min read
Building a Reliable Voice Transcription Pipeline for Indian Courtrooms (Part 1)
Indian courts move at the speed of a judge's handwriting. Adalat AI set out to change that with real-time voice transcription — but flaky courtroom internet made simply keeping the connection alive its own engineering battle. Part 1 covers WebSockets, application-level keepalives, and the hard lessons of unreliable networks.
Across India’s courts, one of the quiet bottlenecks to justice is a shortage of skilled stenographers. Without them, judges have to record proceedings by hand. So the pace of the entire courtroom ends up bound to how fast a judge can write.
At Adalat AI, this was the first problem we chose to fix: let the judge speak and let the machine keep the record. Freed from writing every word by hand, a judge can hear far more matters in a day — the court moves at the speed of justice, not the speed of handwriting.
Building the speech model was hard in its own right. Indian courtrooms are noisy and chaotic, and accurate transcription in that environment is a genuine challenge. Our ML team worked hard to build models that hold up in these conditions, and judges became noticeably more productive. But once we started to scale, a second, less obvious problem surfaced: courts don't always have reliable, high-speed internet, and keeping a real-time transcription system alive over unstable networks proved to be a hard engineering problem of its own.
This post is a behind-the-scenes look at what it actually takes to keep a real-time streaming platform working when the internet refuses to cooperate.
Problem Statement
The basic idea is simple: we capture audio chunks from the browser, send them to the server, and stream the transcript back. However, there are a couple of constraints that make this challenging:
The transcript needs to stream back to the user in real time as they speak.
The system should be able to work over unreliable and spotty internet connections.
This means we need a real-time, bidirectional connection between the browser and the server, which in practice pointed us to WebSockets. WebSockets have been widely used for well over a decade, but operating them over flaky networks raises some unique challenges.
Is It Alive?
The first problem is determining whether your connection is actually alive.
It is a mistake to assume that the WebSocket API will surface failures reliably when the network drops.
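Try this simple experiment: open a WebSocket connection and set a timer to send a message every five seconds. Then quietly unplug your Ethernet cable from the router. Writes to the socket keep “succeeding” without throwing any error! The browser hands each message to the operating system's TCP stack, and nothing reports a failure until TCP itself gives up on the unacknowledged data, which can take minutes.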
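To reproduce the experiment, here is a minimal TypeScript sketch. ProbeSocket is our own stand-in for the browser WebSocket so the logic can also run outside a browser. The probe message format is arbitrary, and the URL in the usage comment is a placeholder. On a silently dead link, you would expect the log to keep showing readyState=1 (OPEN) with no errors.

```typescript
// Stand-in for the parts of the browser WebSocket this probe uses.
interface ProbeSocket {
  readonly readyState: number; // 0 CONNECTING, 1 OPEN, 2 CLOSING, 3 CLOSED
  send(data: string): void;
}

// One probe step: send a timestamped message and report what the API claims happened.
function probeOnce(ws: ProbeSocket, now: number = Date.now()): string {
  let outcome = "send() returned normally";
  try {
    ws.send(`probe ${now}`);
  } catch (err) {
    outcome = `send() threw: ${String(err)}`;
  }
  return `t=${now} readyState=${ws.readyState} ${outcome}`;
}

// Repeat the probe every intervalMs; the returned function stops it.
function startProbe(
  ws: ProbeSocket,
  intervalMs = 5000,
  log: (line: string) => void = console.log,
): () => void {
  const id = setInterval(() => log(probeOnce(ws)), intervalMs);
  return () => clearInterval(id);
}

// In a browser (placeholder URL):
//   const ws = new WebSocket("wss://example.com/echo");
//   ws.onopen = () => { startProbe(ws); };
//   ws.onerror = () => console.log("onerror fired");
//   ws.onclose = () => console.log("onclose fired");
// Unplug the cable after a few probes and watch the log.
```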
This is a well-known gotcha, and several people have learned this the hard way: https://making.close.com/posts/reliable-websockets/.
Lesson 1
Implement application-level keepalives to reliably detect broken connections and recover from them. The WebSocket protocol does define Ping and Pong control frames, but the browser API does not expose them, so keepalives have to be implemented at the application level. The post linked above explains this in detail.
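Here is a minimal sketch of an application-level keepalive. It assumes a hypothetical JSON protocol in which the server answers every {"type":"ping"} message with {"type":"pong"}. HeartbeatMonitor and maxMissed are our own names. A timer calls tick(), and the socket's message handler passes incoming data to handleMessage().

```typescript
// Stand-in for the browser WebSocket's send().
interface HeartbeatSocket {
  send(data: string): void;
}

// Hypothetical keepalive messages; the server must be written to answer each ping with a pong.
type Heartbeat = { type: "ping"; id: number } | { type: "pong"; id: number };

class HeartbeatMonitor {
  private nextId = 0;
  private outstanding = 0; // pings sent since the last pong
  private dead = false;
  private readonly ws: HeartbeatSocket;
  private readonly maxMissed: number;
  private readonly onDead: () => void;

  constructor(ws: HeartbeatSocket, maxMissed: number, onDead: () => void) {
    this.ws = ws;
    this.maxMissed = maxMissed;
    this.onDead = onDead;
  }

  // Call from a timer (e.g. every 5 s): declare the link dead, or send another ping.
  tick(): void {
    if (this.dead) return;
    if (this.outstanding >= this.maxMissed) {
      this.dead = true; // report only once
      this.onDead();
      return;
    }
    const ping: Heartbeat = { type: "ping", id: this.nextId++ };
    this.ws.send(JSON.stringify(ping));
    this.outstanding++;
  }

  // Call from the socket's message handler; returns true if the message was a pong.
  handleMessage(raw: string): boolean {
    let msg: unknown;
    try {
      msg = JSON.parse(raw);
    } catch {
      return false; // not JSON, so not a heartbeat
    }
    if (typeof msg === "object" && msg !== null && (msg as { type?: unknown }).type === "pong") {
      this.outstanding = 0; // the link is alive
      return true;
    }
    return false;
  }
}

// Usage sketch (reconnect and handleTranscript are hypothetical):
//   const monitor = new HeartbeatMonitor(ws, 3, () => reconnect());
//   setInterval(() => monitor.tick(), 5000);
//   ws.onmessage = (e) => { if (!monitor.handleMessage(e.data)) handleTranscript(e.data); };
```

The monitor counts unanswered pings instead of timing each pong, so the whole check is a single counter. The detection delay is roughly maxMissed times the tick interval.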
send is asynchronous
Calling send() does not guarantee that the data has been sent. The MDN documentation says:
“The WebSocket.send() method enqueues the specified data to be transmitted to the server over the WebSocket connection…”
Lesson 2
Check the bufferedAmount attribute when you close the WebSocket to confirm that all pending data has actually been flushed. Any bytes still counted there were queued by send() but never transmitted.
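One way to apply this lesson, sketched under our own assumptions, is to poll bufferedAmount until it drops to zero (or a timeout passes) and only then call close(). closeAfterFlush, the polling interval and the timeout values are hypothetical. A zero bufferedAmount only means the browser has handed the bytes to the network. It says nothing about whether the server received them.

```typescript
// Stand-in for the two WebSocket members this sketch needs.
interface BufferedSocket {
  readonly bufferedAmount: number;
  close(code?: number, reason?: string): void;
}

// Wait until bufferedAmount drains to 0 (or timeoutMs passes), then close.
// Resolves true if the buffer drained before closing, false on timeout.
function closeAfterFlush(ws: BufferedSocket, timeoutMs = 2000, pollMs = 50): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  return new Promise<boolean>((resolve) => {
    const check = (): void => {
      if (ws.bufferedAmount === 0) {
        ws.close(1000, "done"); // 1000 = normal closure
        resolve(true);
      } else if (Date.now() >= deadline) {
        ws.close(1000, "flush timeout"); // some queued bytes may never be sent
        resolve(false);
      } else {
        setTimeout(check, pollMs);
      }
    };
    check();
  });
}
```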
How do query libraries like TanStack compare?
We used TanStack Query for managing state and naively assumed that network reliability was a solved problem. We were wrong: its OnlineManager is unreliable. The TanStack Query documentation itself notes:
“In previous versions, navigator.onLine was used to determine the network status. However, it doesn't work well in Chromium based browsers. There are a lot of issues around false negatives, which lead to Queries being wrongfully marked as offline.”
It's true that navigator.onLine is unreliable, but its replacement, listening for the online and offline events with window.addEventListener, is just as unreliable. To try it out, extend the earlier experiment of sending a WebSocket message every five seconds by adding listeners for these events, then switch from Wi-Fi to a mobile hotspot midway. The event listeners won't fire!
A better option is navigator.connection.addEventListener('change', ...) from the Network Information API, but it is only available in Chromium-based browsers such as Chrome and Edge.
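Here is a minimal sketch that uses the change event as a hint rather than as the source of truth, assuming the caller supplies a callback (for example, one that sends a keepalive immediately). NavigatorLike and onConnectionChange are our own names. The feature check matters because navigator.connection is undefined outside Chromium-based browsers.

```typescript
// Stand-in for `navigator`; `connection` (Network Information API) exists only in Chromium-based browsers.
interface NavigatorLike {
  connection?: EventTarget;
}

// Subscribe to connection changes when supported. Returns an unsubscribe function,
// or null if the API is missing (e.g. Firefox, Safari).
function onConnectionChange(nav: NavigatorLike, onChange: () => void): (() => void) | null {
  const conn = nav.connection;
  if (!conn) return null;
  const handler = (): void => onChange();
  conn.addEventListener("change", handler);
  return () => conn.removeEventListener("change", handler);
}

// Usage sketch: treat a change as a reason to check right away, never as proof.
// checkConnectionNow() is hypothetical, e.g. send a keepalive and wait for its reply.
//   onConnectionChange(navigator as unknown as NavigatorLike, () => checkConnectionNow());
```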
Lesson 3
Do not rely on TanStack Query or other libraries built on the browser's simple online/offline APIs. The only thing that works reliably all the time is application-level keepalives.
Implementing keepalives correctly
Now that we know we need application-level keepalives, we are only halfway there! We still have to implement them correctly.
Let’s say you write some code like this:
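Here is a minimal sketch of such code, using hypothetical names (NaiveClient, keepaliveTick, handleClose) and plain "ping"/"pong" strings as the keepalive messages. ClosableSocket is a stand-in for the browser WebSocket so the logic can run outside a browser. The pattern is shown as-is, flaw included: reconnection happens only if onclose fires.

```typescript
// Stand-in for the browser WebSocket members this sketch uses.
interface ClosableSocket {
  onclose: (() => void) | null;
  onmessage: ((ev: { data: string }) => void) | null;
  send(data: string): void;
  close(): void;
}

// NAIVE pattern, flaw included: reconnection is driven only by onclose.
class NaiveClient {
  ws!: ClosableSocket;
  missed = 0;
  reconnects = 0;
  private readonly connect: () => ClosableSocket;
  private readonly maxMissed: number;

  constructor(connect: () => ClosableSocket, maxMissed = 3) {
    this.connect = connect;
    this.maxMissed = maxMissed;
    this.open();
  }

  private open(): void {
    this.ws = this.connect();
    this.missed = 0;
    this.ws.onclose = () => this.handleClose();
    this.ws.onmessage = (ev) => {
      if (ev.data === "pong") this.missed = 0;
    };
  }

  // Called by a timer every few seconds.
  keepaliveTick(): void {
    if (this.missed >= this.maxMissed) {
      this.ws.close(); // assumes onclose will follow...
      return;
    }
    this.ws.send("ping");
    this.missed++;
  }

  // ...and that this is where reconnection happens.
  private handleClose(): void {
    this.reconnects++;
    this.open();
  }
}
```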
The idea is that once a certain number of keepalives have been missed, you close the WebSocket, which triggers the handleClose callback, where you initiate a reconnection.
This sounds reasonable, except that calling ws.close() does not guarantee that the onclose callback will be called! To understand why, you have to dig into the specs. The WHATWG WebSockets Standard (https://websockets.spec.whatwg.org/#eventdef-websocket-close) says:
When the WebSocket connection is closed, possibly cleanly, the user agent must queue a task to run the following substeps:
1. Change the ready state to CLOSED (3).
2. If the user agent was required to fail the WebSocket connection, or if the WebSocket connection was closed after being flagged as full, fire an event named error at the WebSocket object. [WSP]
3. Fire an event named close at the WebSocket object, using CloseEvent, …
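But what exactly does “the WebSocket connection is closed” mean? RFC 6455 (section 7.1.4) says the WebSocket connection is closed once the underlying TCP connection is closed. When you call ws.close(), the browser first starts the closing handshake. It sends a Close frame and waits for the server's Close frame before the TCP connection is torn down.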
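Now look at what happens on a broken network: the Close frame never reaches the server, and no Close frame ever comes back. The socket is left in the CLOSING (2) state, and onclose does not fire when you need it to.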
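Here is a deliberately simplified, synchronous toy model of that sequence. It is not browser source code, and it ignores any internal timeouts a real implementation may apply. ToyWebSocket and its network callback are our own constructs. The callback returns false when a frame is silently dropped.

```typescript
type ReadyState = 0 | 1 | 2 | 3; // CONNECTING, OPEN, CLOSING, CLOSED

// Toy, synchronous model of a client-initiated close; not browser source code.
class ToyWebSocket {
  readyState: ReadyState = 1; // start OPEN
  onclose: (() => void) | null = null;
  private readonly network: (frame: string) => boolean;

  // `network` returns true if a frame reaches the server and its reply comes back.
  constructor(network: (frame: string) => boolean) {
    this.network = network;
  }

  close(): void {
    if (this.readyState !== 1) return; // closing again is a no-op
    this.readyState = 2; // CLOSING: our Close frame is sent...
    if (this.network("Close")) {
      // ...the server answers with its own Close and TCP is torn down.
      this.readyState = 3;
      this.onclose?.();
    }
    // Otherwise no reply ever arrives: we stay CLOSING and onclose is not fired.
  }
}
```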
If you dig into the Chromium source code, there are a few more cases that do trigger onclose, but none of them is related to calling ws.close(). Other browsers might implement this differently, though.
This is the corrected code:
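Here is a minimal sketch of the corrected approach, using hypothetical names (ReconnectingClient, keepaliveTick, handleClose) and plain "ping"/"pong" strings as keepalive messages. When too many keepalives are missed, the client calls handleClose() itself. It first detaches the old socket's handlers, so a late onclose from an abandoned socket cannot trigger a second reconnect.

```typescript
// Stand-in for the browser WebSocket members this sketch uses.
interface ClosableSocket {
  onclose: (() => void) | null;
  onmessage: ((ev: { data: string }) => void) | null;
  send(data: string): void;
  close(): void;
}

// Corrected pattern: on missed keepalives, tear down and reconnect ourselves
// instead of waiting for an onclose that may never fire.
class ReconnectingClient {
  ws!: ClosableSocket;
  missed = 0;
  reconnects = 0;
  private readonly connect: () => ClosableSocket;
  private readonly maxMissed: number;

  constructor(connect: () => ClosableSocket, maxMissed = 3) {
    this.connect = connect;
    this.maxMissed = maxMissed;
    this.open();
  }

  private open(): void {
    this.ws = this.connect();
    this.missed = 0;
    this.ws.onclose = () => this.handleClose(); // server-initiated closes still reconnect
    this.ws.onmessage = (ev) => {
      if (ev.data === "pong") this.missed = 0;
    };
  }

  keepaliveTick(): void {
    if (this.missed >= this.maxMissed) {
      this.handleClose(); // call it ourselves
      return;
    }
    this.ws.send("ping");
    this.missed++;
  }

  private handleClose(): void {
    const old = this.ws;
    old.onclose = null; // a late onclose from this socket must not reconnect twice
    old.onmessage = null;
    old.close(); // best effort; on a broken network it may stay CLOSING, which is fine
    this.reconnects++;
    this.open();
  }
}
```

In production you would also want a delay between reconnects. The next section covers connection timeouts and retries.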
Lesson 4
Do not rely on the onclose callback to trigger reconnects. Call your close handler yourself.
What if creating a WebSocket connection gets stuck?
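The connection started by this.ws = new WebSocket(url); can itself get stuck in the opening handshake if there is a network blip at just that moment. The constructor returns immediately, but the WebSocket API offers no connection timeout, so the socket can sit in the CONNECTING state for a long time. That means we need to implement the timeout ourselves!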
This is a sample implementation:
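Here is a sketch of a connection timeout plus retries with capped exponential backoff. openWithTimeout, connectWithRetry, ConnectingSocket, RetryOptions and all default values are our own assumptions. In a browser, create would wrap new WebSocket(url).

```typescript
// Stand-in for the WebSocket members used while connecting.
interface ConnectingSocket {
  onopen: (() => void) | null;
  onerror: (() => void) | null;
  close(): void;
}

interface RetryOptions {
  timeoutMs: number; // give up on a single handshake after this long
  maxAttempts: number;
  baseDelayMs: number; // first backoff delay; doubles each attempt
  maxDelayMs: number; // backoff cap
}

// Hypothetical defaults; tune for your network.
const DEFAULT_RETRY: RetryOptions = { timeoutMs: 5000, maxAttempts: 5, baseDelayMs: 500, maxDelayMs: 8000 };

// Resolve once the socket opens; reject on error or if it is still connecting after timeoutMs.
function openWithTimeout<S extends ConnectingSocket>(create: () => S, timeoutMs: number): Promise<S> {
  return new Promise<S>((resolve, reject) => {
    const ws = create();
    const detach = (): void => {
      ws.onopen = null;
      ws.onerror = null;
    };
    const timer = setTimeout(() => {
      detach();
      ws.close(); // abandon the stuck handshake
      reject(new Error(`connect timed out after ${timeoutMs} ms`));
    }, timeoutMs);
    ws.onopen = () => {
      clearTimeout(timer);
      detach();
      resolve(ws);
    };
    ws.onerror = () => {
      clearTimeout(timer);
      detach();
      reject(new Error("connect failed"));
    };
  });
}

// Retry openWithTimeout with capped exponential backoff between attempts.
async function connectWithRetry<S extends ConnectingSocket>(
  create: () => S,
  opts: RetryOptions = DEFAULT_RETRY,
): Promise<S> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await openWithTimeout(create, opts.timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts - 1) {
        const delay = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
        await new Promise<void>((r) => setTimeout(r, delay));
      }
    }
  }
  throw lastError;
}

// In a browser: const ws = await connectWithRetry(() => new WebSocket(url));
```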
Lesson 5
Implement a timeout and retry mechanism for establishing the WebSocket connection.
Be careful of browser throttling tabs
A reliable transcription pipeline must keep working even offline. That meant maintaining an internal buffer that queues audio chunks and pushes them to the server as soon as a connection is established. But if the internet stays down, we need to clean up state after a timeout, because we can't keep the corresponding server state around indefinitely.
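Here is a minimal sketch of such a buffer. It assumes chunks arrive as Uint8Array and that the reconnect path calls flush() once a new socket is OPEN. OfflineChunkBuffer and its methods are hypothetical. For brevity, it drops a chunk as soon as send() accepts it, which Lesson 2 warns is not a delivery guarantee. Acknowledgments are the subject of part 2.

```typescript
// Stand-in for the WebSocket members the buffer needs.
interface SendingSocket {
  readonly readyState: number; // 1 === OPEN
  send(data: Uint8Array): void;
}

const OPEN = 1;

// Queues audio chunks while offline and pushes them, in order, once a socket is OPEN.
class OfflineChunkBuffer {
  private queue: Uint8Array[] = [];

  // Queue a chunk, then try to push everything pending.
  push(chunk: Uint8Array, ws: SendingSocket | null): void {
    this.queue.push(chunk);
    this.flush(ws);
  }

  // Call from the reconnect path once the new socket is OPEN. Returns the number of chunks sent.
  flush(ws: SendingSocket | null): number {
    if (!ws || ws.readyState !== OPEN) return 0;
    let sent = 0;
    while (this.queue.length > 0) {
      ws.send(this.queue[0]);
      this.queue.shift(); // dequeue only after send() accepted the chunk
      sent++;
    }
    return sent;
  }

  // Drop everything, e.g. when the offline timeout expires.
  clear(): void {
    this.queue = [];
  }

  get pending(): number {
    return this.queue.length;
  }
}
```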
So we set a maximum offline timeout of 30 minutes, after which we would clean up client state. With a timeout that long, though, the user may switch to a different tab or do something else, and the browser can then throttle the inactive tab and delay the timer: https://developer.mozilla.org/en-US/docs/Web/API/Window/setTimeout#timeouts_in_inactive_tabs.
If your server assumes that, after n minutes, the client will clean up its state, then that assumption is broken!
Lesson 6
Use absolute timestamps for reconnection and cleanup deadlines rather than relying solely on setTimeout, so that delays caused by tab throttling are accounted for.
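Here is a minimal sketch of the idea: store the deadline as an absolute timestamp and compare it against the clock whenever the code gets a chance to run (a timer, a visibilitychange handler, a reconnect attempt). A throttled timer can then only make the check run late. It can never push the deadline itself out. OfflineDeadline and its injectable clock are our own assumptions.

```typescript
// Tracks the offline cleanup deadline as an absolute timestamp (ms since epoch).
class OfflineDeadline {
  private deadline: number | null = null;
  private readonly maxOfflineMs: number;
  private readonly now: () => number;

  // `now` is injectable for testing; Date.now is wall-clock time.
  constructor(maxOfflineMs: number, now: () => number = Date.now) {
    this.maxOfflineMs = maxOfflineMs;
    this.now = now;
  }

  // Record when we went offline; repeated calls keep the original deadline.
  wentOffline(): void {
    if (this.deadline === null) this.deadline = this.now() + this.maxOfflineMs;
  }

  backOnline(): void {
    this.deadline = null;
  }

  // Cheap enough to call from any timer, visibilitychange handler, or reconnect attempt.
  expired(): boolean {
    return this.deadline !== null && this.now() >= this.deadline;
  }
}

// Usage sketch (clearBufferedState and startNewSession are hypothetical):
//   const offline = new OfflineDeadline(30 * 60 * 1000);
//   document.addEventListener("visibilitychange", () => {
//     if (offline.expired()) clearBufferedState();
//   });
//   // On reconnect: if (offline.expired()) startNewSession(); else resume the old one.
```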
Final Words
And this is just part 1. In part 2, we will go deeper into how we guarantee zero data loss while sending and receiving audio and transcripts, including sequencing, acknowledgments, and replay strategies. Stay tuned!
