Data Engineer

Alegeus

Bengaluru | Full-time | Mid Level | On-site

Job Description

Role summary

We are looking for an Expert Software Engineer to design, build, and scale our next-generation Data Platform and Data-Driven APIs. This role combines distributed data processing (Apache Spark) with platform and microservices engineering (Java) to enable reliable, scalable, and real-time data access. You will operate at the intersection of data engineering and backend platform engineering, building systems that not only process large volumes of data but also expose that data through robust, well-designed APIs and services.

This role goes beyond implementing requirements. We expect engineers to understand business context, challenge assumptions, and take end-to-end ownership of delivering meaningful outcomes.

Key responsibilities

Data Platform Engineering
- Design and develop scalable data pipelines using Apache Spark (batch and streaming)
- Build and maintain data platform layers: ingestion, transformation, and serving
- Optimize Spark jobs for performance, cost, and reliability (partitioning, skew handling, memory tuning)
- Implement data quality, observability, and lineage frameworks
- Contribute to data architecture decisions (Lakehouse, data mesh, storage formats, partition strategies)
- Define and enforce data contracts and schema evolution practices

Platform APIs & Backend Engineering
- Design and build data-driven platform APIs using Java (preferred)
- Develop microservices that expose curated datasets for product and partner consumption
- Implement RESTful APIs and event-driven services for real-time and near real-time data access
- Ensure low-latency, high-availability data serving layers
- Integrate with upstream/downstream systems, including legacy APIs where required

Cloud & Platform Integration
- Build and deploy solutions on Azure (preferred) / AWS / GCP
- Leverage cloud-native services for data storage, compute, and messaging
- Work with event streaming systems (Kafka/Event Hubs) for real-time pipelines
- Support containerized deployments and orchestration (Kubernetes) where applicable

Quality, Observability & Engineering Excellence
- Champion unit tests across both data and service layers
- Build automated validation frameworks for data pipelines
- Implement end-to-end observability (metrics, logging, tracing) across pipelines and APIs
- Drive CI/CD practices for both data and application code
- Conduct code reviews and enforce engineering best practices

Product Mindset & Ownership
- Engage deeply with product and business stakeholders to understand why, not just what
- Translate business problems into scalable data and platform solutions
- Take end-to-end ownership from design through production and support
- Proactively identify performance bottlenecks, data issues, and system gaps

Required qualifications (Hard requirements)
- 8+ years of software engineering experience with a strong focus on data platforms and/or distributed systems
- Hands-on expertise in Apache Spark, Scala, or PySpark
- Strong programming skills in Java (preferred) / Scala / Python
- Experience building large-scale data pipelines (ETL/ELT)
- Experience developing backend services or APIs (REST/microservices)
- Deep understanding of:
  - Distributed systems (partitioning, shuffle, fault tolerance)
  - Data storage formats (Parquet, ORC, Avro)
  - Data modeling and schema evolution
- Experience with cloud platforms (Azure/AWS/GCP)
- Familiarity with workflow orchestration tools (Airflow, Dagster, etc.)
- Strong system design and performance optimization skills

Preferred qualifications
- Experience with Spark Structured Streaming
- Exposure to Lakehouse architectures (Delta Lake, Iceberg, Hudi)
- Experience with event-driven architectures (Kafka, Event Hubs)
- Knowledge of data governance, catalog, and lineage tools
- Experience with CI/CD for data and microservices
- Familiarity with Kubernetes and containerized workloads
- Experience designing low-latency data serving APIs

What success looks like

A successful engineer in this role will:
- Deliver high-quality, production-grade data pipelines and APIs that power real business outcomes
- Build systems that are scalable, observable, and resilient under load
- Take ownership end-to-end, ensuring data flows reliably from source to consumer
- Balance data correctness, performance, and cost efficiency
- Contribute to evolving a modern data platform integrated with product-facing services

Posted Today
