Data Engineer
Mphasis
Job Description
Key Responsibilities

Technical Leadership & Ownership
- Own the end-to-end data engineering architecture for large-scale AWS data platforms
- Define and enforce data engineering standards, best practices, and governance frameworks
- Lead design reviews, code reviews, and technical decision-making across teams
- Act as the primary technical escalation point for complex data pipeline issues

ETL/ELT Design & Development
- Design, build, and optimize scalable ETL/ELT pipelines using:
  - AWS Glue (Jobs, Workflows, Crawlers)
  - PySpark / Spark SQL, Snowflake, SnowSQL
  - Python-based data processing frameworks
- Implement incremental processing, CDC, and data partitioning strategies
- Develop reusable and modular data pipeline frameworks for enterprise use

Data Lake & Storage Management
- Design and manage data lake architecture on AWS (S3 + Apache Iceberg)
- Implement ACID-compliant data layers using Iceberg
- Optimize storage formats (Parquet, ORC) and data layouts for performance
- Define and enforce data lifecycle, retention, and archival policies

Performance Optimization & Cost Efficiency
- Tune Spark/Glue jobs for performance (memory, partitioning, caching)
- Optimize workloads for cost efficiency in AWS (compute, storage, I/O)
- Monitor and improve pipeline SLAs, throughput, and latency metrics

Data Governance & Quality
- Implement data quality frameworks, validations, and reconciliation checks
- Ensure compliance with data governance, lineage, and security standards
- Work with cataloging tools (AWS Glue Data Catalog, etc.) for metadata management

Integration & Orchestration
- Design and manage end-to-end orchestration workflows (Glue Workflows, Step Functions, Airflow if applicable)
- Integrate data across multiple sources (RDBMS, APIs, streaming platforms, files)
- Enable reliable, fault-tolerant, and restartable pipeline execution

Stakeholder Collaboration
- Partner with business, analytics, and AI teams to understand data requirements
- Collaborate with architects and DevOps teams for environment setup and automation
- Provide technical guidance to junior engineers and team members

Team Leadership & Mentoring
- Lead and mentor a team of data engineers
- Drive skill development in Spark, AWS, and modern data architectures
- Ensure adherence to Agile practices and timely delivery of milestones

Required Skills & Experience

Core Technical Skills
- Strong experience in the AWS data engineering stack: AWS Glue, S3, Lambda, IAM, CloudWatch
- Advanced proficiency in:
  - PySpark / Apache Spark
  - Spark SQL
  - Python
- Hands-on experience with Apache Iceberg / modern table formats
- Deep understanding of ETL/ELT design patterns and data pipelines

Data Engineering Expertise
- Experience with data lake and lakehouse architectures
- Strong knowledge of data modeling (star/snowflake schemas)
- Experience with batch and near real-time processing
- Familiarity with file formats (Parquet, ORC, Avro)

Performance & Optimization
- Proven experience in large-scale data processing (TB/PB scale)
- Strong expertise in query optimization, partitioning, and indexing strategies

DevOps & Automation
- Experience with CI/CD pipelines for data workflows
- Knowledge of infrastructure as code (CloudFormation/Terraform) is a plus
- Familiarity with version control (Git) and deployment strategies

Preferred Skills (Good to Have)
- Experience with data orchestration tools (Airflow, Step Functions)
- Exposure to streaming frameworks (Kafka, Kinesis)
- Knowledge of data security (encryption, masking, access control)
- Experience supporting AI/ML data pipelines
- Exposure to BI tools (Power BI, Tableau, Sigma)

Qualifications
- Bachelor’s/Master’s degree in Computer Science, Engineering, or a related field
- 8–12+ years of experience in data engineering, with 3+ years in a technical leadership role