ETL Developer - PySpark
Vista Applied Solutions Group Inc
Job Description
We are looking for a skilled ETL Developer to support a strategic data migration initiative involving the transition from Informatica to PySpark. This is a highly collaborative role requiring strong technical expertise in PySpark and Python, along with excellent communication skills to work across departments for requirements gathering and solution implementation.
Key Responsibilities
Design, develop, and optimize PySpark data processing workflows for large-scale datasets.
Build and maintain real-time and batch data pipelines leveraging Apache Kafka.
Write clean, efficient, and maintainable Python code for data transformation, ETL, and automation.
Develop shell scripts and other automation scripts to support data workflows and operational tasks.
Work with relational databases (e.g., PostgreSQL, MySQL, SQL Server, Oracle) to write efficient SQL queries, manage schemas, and optimize performance.
Collaborate closely with data engineers, analysts, and platform teams to deliver end‑to‑end data solutions.
Troubleshoot and improve existing pipelines, ensuring performance, reliability, and scalability.
Follow best practices in version control, CI/CD, documentation, and code reviews.
Required Qualifications
Bachelor's degree in Computer Science or a related technical discipline is required.
5+ years of hands‑on development experience in data engineering or backend engineering roles.
Strong proficiency in PySpark (RDDs, DataFrames, Spark SQL, performance tuning) and scripting (Bash, Shell, or similar).
Experience working with relational databases and writing optimized SQL queries.
Solid understanding of distributed systems and large‑scale data processing concepts.
Familiarity with ETL best practices, data modeling, and pipeline orchestration.
Basic understanding of LLM operations (LLMOps), including prompt logging, model monitoring, and evaluation.
Knowledge of Spark cluster management, resource optimization, and tuning strategies is preferred.
Experience with workflow orchestration tools such as Apache Airflow is a plus.