hindi-asr-transcriptor

Hindi Audio Speech Recognition App (ONNX Optimization, NVIDIA NeMo Model)

Overview

This project provides a production-ready Automatic Speech Recognition (ASR) REST API for Hindi, using a FastAPI backend and an ONNX-optimized NVIDIA NeMo stt_hi_conformer_ctc_medium model. It supports transcription of short audio clips (5-10 seconds, 16kHz WAV) and is containerized for easy deployment.

Architecture

+--------+      +----------------+      +-----------------------+
| Client | ---> | FastAPI Server | ---> | ONNXRuntime Inference |
+--------+      +----------------+      +-----------------------+
                        |                  (Hindi Conformer CTC)
                        |
        [Audio Preprocessing (librosa, numpy)]
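The model is CTC-based: the Conformer encoder emits one label distribution per audio frame, and decoding collapses repeated labels and strips the blank symbol. A minimal greedy-decode sketch in pure Python (the vocabulary and blank index below are illustrative, not the model's actual ones):

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated frame labels, then drop CTC blanks."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# Illustrative vocabulary; the real model uses its own Hindi token set.
vocab = {1: "न", 2: "म", 3: "स", 4: "्", 5: "त", 6: "े"}
ids = [0, 1, 1, 0, 2, 3, 3, 4, 0, 5, 6, 6, 0]
text = "".join(vocab[i] for i in ctc_greedy_decode(ids))  # → "नमस्ते"
```

Note that a blank between two identical labels keeps both (e.g. `[1, 0, 1]` decodes to two tokens), which is exactly why CTC uses a blank symbol.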

Project Video

screen-capture.webm

Key Components

1. FastAPI Application (app/main.py)

2. Audio Preprocessing (app/audio_utils.py)

3. ASR Inference (app/asr_inference.py)

4. Model Conversion (scripts/convert_to_onnx.py)

5. Docker Configuration (Dockerfile)
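app/audio_utils.py handles preprocessing before features are computed with librosa/numpy. As a hedged, stdlib-only sketch of the first step (the function name and exact pipeline are assumptions, not the repo's actual code), decoding mono 16-bit PCM WAV into floats in [-1, 1]:

```python
import struct
import wave

def load_pcm16(path):
    """Read a mono 16-bit PCM WAV file; return (sample_rate, samples in [-1, 1])."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Little-endian signed 16-bit samples, scaled by the int16 range.
    samples = [s / 32768.0 for s in struct.unpack(f"<{len(raw) // 2}h", raw)]
    return rate, samples
```

The API expects 16 kHz input, so a real implementation would also resample; `librosa.load(path, sr=16000)` does both steps in one call.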

Screenshots

(UI screenshots captured 2025-05-25.)

Features

- REST API for Hindi speech-to-text, built on FastAPI
- ONNX-optimized NVIDIA NeMo stt_hi_conformer_ctc_medium (Conformer CTC) model
- Accepts short audio clips (5-10 seconds, 16 kHz WAV)
- Containerized with Docker for easy deployment

Installation & Setup

Prerequisites

- Python 3 with pip and venv
- Docker (optional, for container deployment)
- The ONNX model file (see Model Setup below)

1. Local Development Setup

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
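The repository's requirements.txt is authoritative; for reference, the packages this README itself relies on are roughly:

```text
fastapi
uvicorn
gunicorn
onnxruntime
librosa
numpy
requests   # Python client example
pytest     # unit tests
```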

2. Model Setup

Option A: Download Pre-converted ONNX Model

mkdir -p models
# Download stt_hi_conformer_ctc_medium.onnx to models/
# https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_hi_conformer_ctc_medium

Option B: Convert from NeMo Model

  1. Download the NeMo model:
    mkdir -p downloads
    # Download stt_hi_conformer_ctc_medium.nemo to downloads/
    
  2. Convert to ONNX:
    python scripts/convert_to_onnx.py \
      --nemo-path downloads/stt_hi_conformer_ctc_medium.nemo \
      --output-dir models
    

3. Run the Application

Development Mode

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Production Mode (with Gunicorn)

gunicorn -k uvicorn.workers.UvicornWorker app.main:app --bind 0.0.0.0:8000 --workers 4

4. Docker Deployment

# Build the image
docker build -t hindi-asr-app .

# Run the container
docker run -d \
  --name asr-app \
  -p 8000:8000 \
  -v $(pwd)/models:/app/models \
  hindi-asr-app
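Equivalently, the same run flags can be captured in a Compose file (a sketch; the service name is an assumption):

```yaml
services:
  asr-app:
    image: hindi-asr-app
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
```

Then `docker compose up -d` replaces the build/run commands above.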

API Documentation

1. Transcribe Audio

POST /transcribe — accepts a 16 kHz WAV clip (5-10 seconds) as multipart form data under the field name `file` and returns the transcription as JSON.

2. Health Check

A liveness endpoint for monitoring that the server and model are up; FastAPI's interactive docs at http://localhost:8000/docs list the exact route and schemas.
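Based on the fields used by the Python client below, a successful /transcribe response looks roughly like this (values illustrative; exact schema may differ):

```json
{
  "text": "नमस्ते, आप कैसे हैं?",
  "duration": 0.42
}
```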

Example Usage

cURL

curl -X POST "http://localhost:8000/transcribe" \
  -H "accept: application/json" \
  -F "file=@temp_audio/hindi_sample.wav"

Python Client

import requests

def transcribe_audio(file_path, server_url="http://localhost:8000"):
    with open(file_path, 'rb') as f:
        files = {'file': f}
        response = requests.post(f"{server_url}/transcribe", files=files)
        return response.json()

# Example usage
result = transcribe_audio("temp_audio/hindi_sample.wav")
print(f"Transcribed Text: {result['text']}")
print(f"Processing Time: {result['duration']:.2f} seconds")

Testing

Unit Tests

pytest tests/

Integration Test

python scripts/test_transcription.py \
  --input-dir sample_audio \
  --output results.json \
  --num-files 5

Troubleshooting

- Model not found: ensure models/stt_hi_conformer_ctc_medium.onnx exists (see Model Setup above).
- Wrong audio format: the API expects mono 16 kHz WAV clips of roughly 5-10 seconds; resample other audio before uploading.

Deployment

For container-based deployment, see Docker Deployment above; for bare-metal serving, use the Gunicorn command from Production Mode.

For technical details, see Description.md.