math-homework-helper

Math Agent with Human-in-the-Loop Feedback

A virtual math professor that provides step-by-step solutions to mathematical problems using an Agentic RAG (Retrieval-Augmented Generation) architecture, with human feedback for continuous improvement.

Deployed Agent at https://bcwti9cmjtjgovljrdxxer.streamlit.app/

Documentation at https://sarthakb11.github.io/math-homework-helper/

🚀 Features

AI Gateway with Guardrails: Ensures all interactions are focused on mathematics, maintaining safety and privacy.
Knowledge Base Integration: Uses a vector database (Qdrant) to store and retrieve mathematical knowledge.
Web Search Capability: Falls back to web search when the knowledge base lacks information.
Step-by-Step Solution Generation: Provides clear, easy-to-understand solution steps.
Human-in-the-Loop Feedback: Learns from user feedback to improve future responses.
Interactive Web Interface: Simple, user-friendly interface using Streamlit.

🧰 Tech Stack

Backend: Python (LangGraph, LangChain)
Frontend: Streamlit
Vector Database: Qdrant
LLM: Google AI (Gemini-Pro)
Search API: Tavily/Serper
Embedding: Sentence Transformers
HITL Framework: DSPy (dspy-ai package)
Other: FastAPI, MongoDB (via pymongo), dotenv

🛠️ Architecture

The system uses an Agentic RAG architecture with the following components:

AI Gateway: Entry/exit point, enforcing guardrails.
Routing Agent: Directs queries to either the knowledge base or web search.
Knowledge Base: Vector database (Qdrant) storing math knowledge.
Web Search Agent: Performs targeted web searches and extracts content.
Generation Agent: Synthesizes information into step-by-step solutions.
Human Feedback Loop: Collects and integrates user feedback.
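
The Routing Agent's choice between the knowledge base and web search could be sketched as a similarity-threshold check. This is only an illustration, not the project's actual code: the `route_query` function, the `Hit` type, and the `KB_SCORE_THRESHOLD` value of 0.75 are all hypothetical, and the real agent may use a different criterion.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff: the real project may use a different value or criterion.
KB_SCORE_THRESHOLD = 0.75


@dataclass
class Hit:
    """One knowledge-base search result (hypothetical shape)."""
    text: str
    score: float  # similarity score, higher means a closer match


def route_query(hits: List[Hit], threshold: float = KB_SCORE_THRESHOLD) -> str:
    """Return "knowledge_base" if any hit is good enough, else "web_search"."""
    best = max((hit.score for hit in hits), default=0.0)
    return "knowledge_base" if best >= threshold else "web_search"
```

With this rule, an empty result list or a list of weak matches sends the query to web search, which matches the fallback behavior described under Features.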

📁 Project Structure

math-homework-helper/
├── app.py                  # Main Streamlit app entry point
├── requirements.txt        # Python dependencies
├── env.sample              # Example environment variables
├── .env                    # (Not committed) Your actual environment variables
├── app/                    # Core application modules (agents, kb, feedback, etc.)
├── scripts/
│   ├── init_db.py          # Script to initialize the vector DB
│   └── load_knowledge_base.py # Script to load sample data
├── .github/workflows/
│   └── deploy.yml          # GitHub Actions workflow for deployment
└── ...

🔧 Installation & Local Development

Prerequisites

Python 3.8+
pip
Docker (for Qdrant)
Git

Setup

Clone the repository:

```
git clone https://github.com/your-username/math-homework-helper.git
cd math-homework-helper
```

Set up a Python virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:

```
pip install --upgrade pip
pip install -r requirements.txt
```

Configure environment variables:
```
cp env.sample .env
```
Edit .env with your API keys and configuration.

Start Qdrant (Vector Database) using Docker:

```
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```

Initialize the database:
```
python scripts/init_db.py
```
Load sample data into the knowledge base:
```
python scripts/load_knowledge_base.py
```
Run the Streamlit app:
```
streamlit run app.py
```

🚀 Deployment (GitHub Actions)

A ready-to-use GitHub Actions workflow is provided at .github/workflows/deploy.yml. On push to main, it will:

Set up Python and dependencies
Start Qdrant in Docker
Run your Streamlit app

To use Qdrant Cloud, set VECTOR_DB_URL and VECTOR_DB_API_KEY in your .env or as GitHub Actions secrets.

Required Secrets

Set these secrets in your GitHub repository (Settings → Secrets and variables → Actions):

LLM_API_KEY
SEARCH_API_KEY
VECTOR_DB_URL
VECTOR_DB_PORT
VECTOR_DB_COLLECTION
DB_CONNECTION_STRING
DEBUG
LOG_LEVEL

(Reference your env.sample for any additional secrets your app may require.)

📝 Environment Variables

See env.sample for all required environment variables:

LLM_API_KEY
VECTOR_DB_URL
VECTOR_DB_PORT
VECTOR_DB_COLLECTION
SEARCH_API_KEY
DB_CONNECTION_STRING
DEBUG
LOG_LEVEL
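
One way to read these variables at startup and fail fast when something is missing is sketched below. The variable names come from the list above; the `Settings` class, the `load_settings` function, the choice of which variables are mandatory, and the defaults for `DEBUG` and `LOG_LEVEL` are assumptions made for illustration, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these six are mandatory; DEBUG and LOG_LEVEL get defaults.
REQUIRED = (
    "LLM_API_KEY",
    "SEARCH_API_KEY",
    "VECTOR_DB_URL",
    "VECTOR_DB_PORT",
    "VECTOR_DB_COLLECTION",
    "DB_CONNECTION_STRING",
)


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables (hypothetical helper)."""
    llm_api_key: str
    search_api_key: str
    vector_db_url: str
    vector_db_port: int
    vector_db_collection: str
    db_connection_string: str
    debug: bool
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        llm_api_key=env["LLM_API_KEY"],
        search_api_key=env["SEARCH_API_KEY"],
        vector_db_url=env["VECTOR_DB_URL"],
        vector_db_port=int(env["VECTOR_DB_PORT"]),
        vector_db_collection=env["VECTOR_DB_COLLECTION"],
        db_connection_string=env["DB_CONNECTION_STRING"],
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

If the values live in a `.env` file, calling `load_dotenv()` from the python-dotenv package before `load_settings()` would copy them into `os.environ` first.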

🧪 Testing

To run tests:

```
pytest
```
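
For readers new to pytest, a test file might look like the sketch below. The project's real tests and guardrail logic are not shown in this README, so `is_math_query` and its keyword pattern are a hypothetical stand-in for the AI Gateway's input check, included only so the example is self-contained.

```python
import re

# Hypothetical stand-in for the AI Gateway's input check; the real guardrail
# is not shown in this README and is likely more sophisticated.
MATH_PATTERN = re.compile(
    r"\d|[=+\-*/^]|\b(solve|integral|derivative|equation|prove|factor|simplify)\b",
    re.IGNORECASE,
)


def is_math_query(text: str) -> bool:
    """Return True if the text contains digits, operators, or math keywords."""
    return bool(MATH_PATTERN.search(text))


# pytest collects functions whose names start with "test_".
def test_accepts_math_question():
    assert is_math_query("Solve x^2 - 4 = 0")


def test_rejects_off_topic_question():
    assert not is_math_query("Who won the football match yesterday?")
```

Saved under a name such as `test_gateway.py` (an assumed filename), the two test functions would be discovered and run by the `pytest` command.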

📁 Detailed Project Structure

math-homework-helper/
├── app/                  # Core application code
│   ├── agents/           # Agent definitions and logic
│   │   ├── generation_agent.py  # LLM-based solution generation
│   │   └── routing_agent.py     # KB/Web routing logic
│   ├── gateway/          # AI Gateway implementation
│   │   └── ai_gateway.py        # Input/output validation
│   ├── kb/               # Knowledge Base integration
│   │   └── vector_db.py         # Vector database connector
│   ├── web_search/       # Web Search and extraction logic
│   │   └── search_agent.py      # Web search and content extraction
│   ├── feedback/         # Human-in-the-Loop feedback mechanism
│   │   └── feedback_loop.py     # Feedback collection and processing
│   └── models/           # Data models and schemas
│       └── database.py           # Database models
├── scripts/              # Utility scripts
│   ├── init_db.py                # Initialize database tables
│   └── load_knowledge_base.py    # Load sample data into KB
├── data/                 # Knowledge Base data and schemas
│   ├── kb_data/                  # Custom KB data files
│   └── feedback_logs/            # Feedback logs
├── Instructions/         # Project documentation
├── .env                  # Environment variables (not versioned)
├── env.sample            # Sample environment variables
├── requirements.txt      # Python dependencies
└── app.py                # Main application entry point

🔄 Workflow

User submits a math question through the UI
AI Gateway validates the input
Routing Agent checks the Knowledge Base for relevant information
If KB has good matches, the solution is generated from KB content
If KB lacks information, Web Search is performed to find solutions
Generation Agent creates a step-by-step solution
User sees the solution and can provide feedback (helpful/needs improvement)
Feedback is logged and used to improve future responses
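
The steps above can be sketched as a single function that wires the components together, plus a small feedback logger. This is an illustration under stated assumptions, not the project's implementation: `answer_question`, `log_feedback`, the callable parameters, and the JSON-lines log format are all hypothetical names and choices.

```python
import json
import time
from typing import Callable, List, Optional


def answer_question(
    question: str,
    validate: Callable[[str], bool],
    search_kb: Callable[[str], List[str]],
    search_web: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Optional[str]:
    """Run one question through the workflow; None means the gateway rejected it."""
    if not validate(question):          # AI Gateway validates the input
        return None
    context = search_kb(question)       # Routing Agent checks the KB first
    if not context:                     # KB lacks information
        context = search_web(question)  # fall back to web search
    return generate(question, context)  # Generation Agent writes the steps


def log_feedback(path: str, question: str, solution: str, helpful: bool) -> None:
    """Append one feedback record as a JSON line (hypothetical log format)."""
    record = {
        "timestamp": time.time(),
        "question": question,
        "solution": solution,
        "helpful": helpful,
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

Passing the components in as callables keeps the sketch independent of any particular LLM, search API, or vector database client, and makes each step easy to replace with a stub in tests.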

🤝 Contributing

We welcome contributions! Please see the development workflow in the project documentation (the Instructions/ folder).

📝 License

This project is licensed under the MIT License.

🙏 Acknowledgments

Built using LangChain and LangGraph frameworks
Uses Qdrant for vector storage
Powered by Google AI’s Gemini models