This project provides a production-ready Automatic Speech Recognition (ASR) REST API for Hindi, using a FastAPI backend and an ONNX-optimized NVIDIA NeMo stt_hi_conformer_ctc_medium model. It supports transcription of short audio clips (5-10 seconds, 16kHz WAV) and is containerized for easy deployment.
```
+--------+     +----------------+     +-----------------------+
| Client | --> | FastAPI Server | --> | ONNXRuntime Inference |
+--------+     +----------------+     +-----------------------+
                       |                (Hindi Conformer CTC)
                       v
        [Audio Preprocessing (librosa, numpy)]
```
Key components:

- `app/main.py`: FastAPI application and API routes
- `app/audio_utils.py`: audio preprocessing helpers
- `app/asr_inference.py`: ONNX Runtime inference wrapper
- `scripts/convert_to_onnx.py`: converts the `.nemo` checkpoint to ONNX
- `stt_hi_conformer_ctc_medium` model from NVIDIA NeMo
- `Dockerfile`: container build recipe
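At inference time, the Conformer CTC model emits per-frame logits that must be decoded into text. As an illustration of the decoding step (not the repo's actual `app/asr_inference.py`), here is a minimal greedy CTC decode, assuming blank token id 0 and a `vocab` list mapping ids to characters:

```python
import numpy as np

def ctc_greedy_decode(logits, vocab, blank_id=0):
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.

    logits: (time, vocab_size) array of per-frame scores.
    vocab:  list mapping token id -> character (assumed layout).
    """
    ids = np.argmax(logits, axis=-1)
    out, prev = [], None
    for i in ids:
        if i != blank_id and i != prev:
            out.append(vocab[i])
        prev = i
    return "".join(out)

# Toy example: frames voting for [a, a, blank, b] collapse to "ab"
vocab = ["_", "a", "b"]
logits = np.eye(3)[[1, 1, 0, 2]]  # one-hot rows standing in for model scores
print(ctc_greedy_decode(logits, vocab))  # -> ab
```

The real pipeline additionally maps the model's subword tokenizer ids rather than single characters, but the collapse-and-drop-blank logic is the same.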
```bash
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```
```bash
mkdir -p models
# Download stt_hi_conformer_ctc_medium.onnx to models/
# https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_hi_conformer_ctc_medium
```
```bash
mkdir -p downloads
# Download stt_hi_conformer_ctc_medium.nemo to downloads/
python scripts/convert_to_onnx.py \
    --nemo-path downloads/stt_hi_conformer_ctc_medium.nemo \
    --output-dir models
```
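For reference, the conversion likely boils down to NeMo's `restore_from` and `export` calls. A hypothetical sketch of what `scripts/convert_to_onnx.py` might contain (the actual script may differ; `EncDecCTCModel.restore_from` and `.export` are standard NeMo APIs):

```python
# Hypothetical sketch of scripts/convert_to_onnx.py.
# Assumes nemo_toolkit is installed.
import argparse
import os

def build_parser():
    p = argparse.ArgumentParser(description="Export a .nemo checkpoint to ONNX")
    p.add_argument("--nemo-path", required=True, help="path to the .nemo file")
    p.add_argument("--output-dir", default="models", help="where to write the .onnx")
    return p

def main():
    args = build_parser().parse_args()
    # Imported here so the parser is usable without NeMo installed.
    from nemo.collections.asr.models import EncDecCTCModel
    model = EncDecCTCModel.restore_from(args.nemo_path)
    os.makedirs(args.output_dir, exist_ok=True)
    model.export(os.path.join(args.output_dir, "stt_hi_conformer_ctc_medium.onnx"))

if __name__ == "__main__":
    main()
```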
```bash
# Development server (auto-reload)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

```bash
# Production server (4 workers)
gunicorn -k uvicorn.workers.UvicornWorker app.main:app --bind 0.0.0.0:8000 --workers 4
```
```bash
# Build the image
docker build -t hindi-asr-app .

# Run the container
docker run -d \
    --name asr-app \
    -p 8000:8000 \
    -v $(pwd)/models:/app/models \
    hindi-asr-app
```
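The same container can also be managed with Docker Compose; a minimal sketch mirroring the `docker run` flags above (the file name `docker-compose.yml` and service name are assumptions, not part of the repo):

```yaml
# Hypothetical docker-compose.yml mirroring the docker run command above
services:
  asr-app:
    image: hindi-asr-app
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    restart: unless-stopped
```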
**POST /transcribe**

Request: `multipart/form-data` with field `file`: audio file (WAV, 16kHz, mono, 5-10s)

Response:

```json
{
  "status": "success",
  "text": "transcribed text in Hindi",
  "duration": 5.43,
  "language": "hi"
}
```
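Since the endpoint expects 16 kHz mono WAV clips of 5-10 seconds, it can help to validate files client-side before uploading. A small stdlib-only helper (hypothetical, not part of the API):

```python
import wave

def validate_clip(path, rate=16000, min_s=5.0, max_s=10.0):
    """Check a WAV file against the API's input constraints:
    mono, 16 kHz, 5-10 seconds. Returns (ok, reason)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1:
            return False, "expected mono audio"
        if w.getframerate() != rate:
            return False, f"expected {rate} Hz, got {w.getframerate()}"
        duration = w.getnframes() / w.getframerate()
        if not (min_s <= duration <= max_s):
            return False, f"duration {duration:.2f}s outside {min_s}-{max_s}s"
    return True, "ok"
```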
**GET /health**

Response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "version": "1.0.0"
}
```
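Loading the model can take a while at startup, so client scripts may want to poll `/health` until `model_loaded` is true before sending audio. A stdlib-only sketch (the helper name is hypothetical):

```python
import json
import time
import urllib.request

def wait_until_healthy(url="http://localhost:8000/health", timeout=30):
    """Poll the /health endpoint until the service reports a loaded model.

    Returns the health payload, or raises TimeoutError if the deadline passes.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                body = json.load(resp)
            if body.get("status") == "healthy" and body.get("model_loaded"):
                return body
        except OSError:
            pass  # server not up yet; retry
        time.sleep(1)
    raise TimeoutError(f"service at {url} not healthy after {timeout}s")
```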
```bash
curl -X POST "http://localhost:8000/transcribe" \
    -H "accept: application/json" \
    -F "file=@temp_audio/hindi_sample.wav"
```
```python
import requests

def transcribe_audio(file_path, server_url="http://localhost:8000"):
    with open(file_path, "rb") as f:
        files = {"file": f}
        response = requests.post(f"{server_url}/transcribe", files=files)
    return response.json()

# Example usage
result = transcribe_audio("temp_audio/hindi_sample.wav")
print(f"Transcribed Text: {result['text']}")
print(f"Processing Time: {result['duration']:.2f} seconds")
```
```bash
# Run the unit tests
pytest tests/

# Batch-transcribe sample clips
python scripts/test_transcription.py \
    --input-dir sample_audio \
    --output results.json \
    --num-files 5
```
The ONNX model lives in `models/`. See DEPLOYMENT.md for cloud/edge setup, CI/CD, and monitoring. For technical details, see Description.md.