AI Engineer
HabileLabs
Job Description
AI Engineer: Voice & Language
4+ Years · Full-Time · Jaipur · Competitive Salary

About the Role

We're hiring a senior AI Engineer to design, build, and ship production AI systems, with strong emphasis on Voice AI. You'll own the full lifecycle: architecture, training, deployment, and monitoring across language and voice modalities.

What You'll Do

- LLM & GenAI: Fine-tune and deploy LLMs; build RAG pipelines and agentic workflows (LangChain, LlamaIndex).
- Voice Pipelines: Architect real-time ASR → LLM → TTS pipelines with
- Voice Agents: Build production voice agents with turn-taking, barge-in handling, and emotion-aware dialogue.
- Speech Fine-Tuning: Adapt ASR/TTS models for domain-specific accents, terminology, and speaking styles.
- MLOps: Build reproducible ML pipelines (Kubeflow / MLflow); maintain CI/CD, monitoring, and model versioning.
- Inference Optimization: Apply quantization (GGUF, GPTQ), distillation, and hardware-aware inference (TensorRT, vLLM) to cut cost and latency.
- APIs & Services: Ship high-performance inference APIs in Python (FastAPI) or Go on Kubernetes.
- Data & Evaluation: Curate text + speech corpora; define eval harnesses covering WER, MOS, latency P95, and safety.

Requirements

Must-Have

- 4+ yrs ML/software engineering; 2+ yrs on production AI systems
- Strong Python; PyTorch or TensorFlow
- LLM fine-tuning: LoRA / QLoRA / PEFT
- End-to-end ML pipeline experience (train → serve)
- Cloud (AWS / GCP / Azure) + Docker / Kubernetes
- ASR & TTS integration in real-time streaming systems
- VAD, noise suppression, and barge-in handling
- Telephony APIs (Twilio, Vonage) or WebRTC experience

Nice-to-Have

- Whisper / wav2vec fine-tuning for domain adaptation
- Audio-language models (AudioPaLM, Qwen-Audio, Gemini Audio)
- Speaker diarization (pyannote.audio) or voice biometrics
- Prosody control, SSML, expressive TTS synthesis
- Multilingual ASR/TTS and code-switching pipelines
- RLHF / Constitutional AI alignment
- Vector DBs (Pinecone, Weaviate, pgvector)
- Open-source contributions or published research

Tech Stack

- Core: Python, PyTorch, FastAPI / Go, Kubernetes, MLflow
- LLM & GenAI: OpenAI / HuggingFace, LangChain, LlamaIndex, vLLM, RAG / Agents
- Voice AI: STT, TTS, WebRTC / WebSockets, pyannote.audio, Twilio / Vonage
- Audio Processing: librosa / FFmpeg, Silero VAD, openWakeWord, SSML / Prosody, AEC / Noise Suppression