
MLOps Engineer

Accellor

Hyderabad · Full-time · Mid Level · On-site

Job Description

We are seeking a Senior MLOps Engineer to design, build, and maintain the infrastructure and pipelines that operationalize AI and machine learning systems at scale. This role bridges the gap between model development and production deployment, ensuring ML and GenAI workloads are reliable, observable, cost-efficient, and continuously improving across enterprise environments.

Key Responsibilities:

- Design and implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, evaluation, and deployment.
- Build and manage CI/CD pipelines for ML models, including automated testing, validation, and rollback mechanisms.
- Architect and maintain model serving infrastructure for real-time and batch inference workloads, including LLM and agentic AI deployments.
- Implement model monitoring, drift detection, and alerting systems to ensure production model health and reliability.
- Manage experiment tracking, model versioning, and artifact registries to enable reproducibility and governance.
- Optimize compute costs and inference latency across GPU/CPU workloads on cloud platforms (AWS, Azure, or GCP).
- Containerize and orchestrate ML workloads using Docker and Kubernetes.
- Automate data pipeline workflows and feature store management for training and inference.
- Collaborate with AI Engineers, Data Scientists, and Platform teams to streamline the path from prototype to production.
- Establish and enforce MLOps best practices, standards, and documentation across the engineering organization.

Requirements:

- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in DevOps, Platform Engineering, or MLOps roles, with 1–2+ years focused on ML/AI infrastructure.
- Strong programming skills in Python; experience with Bash, Go, or Java is a plus.
- Hands-on experience with ML pipeline orchestration tools such as Kubeflow, MLflow, Airflow, or Vertex AI Pipelines.
- Proficiency with containerization (Docker) and orchestration (Kubernetes, Helm).
- Experience with cloud-native ML services on AWS (SageMaker), Azure (Azure ML), or GCP (Vertex AI).
- Familiarity with model serving frameworks such as TorchServe, Triton Inference Server, vLLM, or TGI.
- Knowledge of Infrastructure as Code (Terraform, Pulumi, or CloudFormation).
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or equivalent).
- Strong understanding of software engineering fundamentals, version control (Git), and CI/CD practices.

Nice to Have:

- Experience deploying and serving Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems in production.
- Familiarity with vector databases (Pinecone, Weaviate, Qdrant, or pgvector).
- Exposure to AI observability platforms (LangSmith, Weights & Biases, Arize, or WhyLabs).
- Experience with feature stores (Feast, Tecton, or equivalent).
- Familiarity with GPU cluster management and distributed training infrastructure.
- Experience with enterprise SaaS platforms and multi-tenant ML infrastructure.

Posted 2 days ago


