Data Engineer
EXL
Job Description
Agentic AI Data Engineer

Role Overview

Total Experience Required: 5-10 Years

We are seeking a highly skilled Agentic AI Data Engineer to design, build, and optimize intelligent, autonomous data systems that power next-generation AI applications. This role blends data engineering, machine learning infrastructure, and emerging agent-based AI frameworks to enable scalable, self-orchestrating pipelines and decision-making systems. You will work at the intersection of data platforms, large language models (LLMs), and cloud-native architectures, building systems that can reason, act, and adapt autonomously.

Key Responsibilities

- Design and implement agentic AI systems that autonomously orchestrate data workflows and decision pipelines
- Build scalable data pipelines for structured and unstructured data (batch + real-time)
- Develop and manage LLM-powered applications using retrieval-augmented generation (RAG), tool use, and multi-agent frameworks
- Integrate AWS AI/ML services into production-grade architectures
- Develop and optimize data lakes, warehouses, and lakehouse architectures
- Build APIs and microservices to expose AI/ML capabilities
- Ensure data quality, governance, and security across pipelines
- Collaborate with data scientists, ML engineers, and product teams to deploy AI solutions
- Implement monitoring, logging, and observability for AI agents and pipelines
- Optimize cost and performance of cloud-based AI workloads

Required Technical Skills

Cloud & AWS Ecosystem

- Strong experience with AWS services, including:
  - Amazon S3, Glue, Lambda, Step Functions
  - Amazon Redshift / Athena
  - Amazon SageMaker (training, deployment, pipelines)
  - Amazon Bedrock (foundation models, agents, knowledge bases)

AI/ML & Agentic Systems

- Experience with LLMs and generative AI systems
- Hands-on experience with agent frameworks (e.g., multi-agent orchestration, tool calling, planning systems)
- Familiarity with AgentCore / agent orchestration platforms
- Understanding of RAG architectures, embeddings, and vector databases
- Experience with model deployment, inference optimization, and prompt engineering

Data Engineering

- Strong proficiency in Python and SQL
- Experience with ETL/ELT tools and frameworks
- Distributed data processing (Spark, PySpark, or similar)
- Streaming technologies (Kafka, Kinesis, or similar)
- Data modeling and schema design

Data & AI Infrastructure

- Experience with vector databases (e.g., Pinecone, FAISS, OpenSearch)
- Knowledge of data lakehouse architectures (Delta Lake, Iceberg, Hudi)
- Containerization (Docker) and orchestration (Kubernetes)
- CI/CD for ML and data pipelines

Preferred Qualifications

- Experience building autonomous AI agents for enterprise use cases
- Knowledge of multi-agent collaboration systems and planning algorithms
- Familiarity with LangChain, LlamaIndex, or similar frameworks
- Experience with MLOps and LLMOps practices
- Understanding of graph-based workflows and knowledge graphs
- Exposure to real-time AI systems and event-driven architectures

Soft Skills

- Strong problem-solving and system design skills
- Ability to work in fast-paced, evolving AI environments
- Effective communication and cross-functional collaboration
- Curiosity and adaptability to emerging AI technologies

Education & Experience

- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
- 4+ years of experience in data engineering or ML engineering
- Hands-on experience with production-grade AI/ML systems

Nice-to-Have

- Experience with reinforcement learning or planning systems
- Background in distributed systems design
- Contributions to open-source AI/data projects
- Certifications in AWS (e.g., Solutions Architect, Machine Learning Specialty)

What You’ll Build

- Autonomous data pipelines that self-heal and optimize
- AI agents capable of reasoning over enterprise data
- Scalable LLM-powered applications integrated with business workflows
- Intelligent systems that move beyond automation into decision-making