Data Engineer
Gazelle Global
Job Description
We are hiring an experienced Data Engineer to join our team in Sheffield, United Kingdom. The ideal candidate will have strong experience in Scala and/or Python for streaming applications, plus familiarity with testing frameworks and CI for stream processors.
Your responsibilities:
• Design and build Kafka-based streaming applications (Kafka Streams/ksqlDB) in Scala/Python for transformation, enrichment, and routing.
• Implement end-to-end streaming pipelines: producers, stream processors, and consumers with strong data quality, idempotency, and dead-letter-queue (DLQ) patterns.
• Model topics, schemas, and contracts (Avro/Protobuf/JSON) and maintain backward/forward compatibility.
• Develop batch/stream interoperability: Spark/Structured Streaming jobs for aggregation, feature generation, and storage in Parquet/ORC.
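To illustrate the idempotency and DLQ patterns mentioned above, here is a minimal, framework-free Python sketch; all record shapes and function names are hypothetical, not part of the role:

```python
def process(record: dict) -> dict:
    # Hypothetical transformation: enrich the record with a derived field.
    if "amount" not in record:
        raise ValueError("missing amount")
    return {**record, "amount_pennies": int(record["amount"] * 100)}

def consume(records, seen_ids, dlq):
    """Idempotent consumer: skip already-processed keys, route failures to a DLQ."""
    out = []
    for rec in records:
        key = rec.get("id")
        if key in seen_ids:        # idempotency: duplicate delivery, skip it
            continue
        try:
            out.append(process(rec))
            seen_ids.add(key)      # mark processed only after success
        except Exception as exc:   # poison message: dead-letter it, don't block the stream
            dlq.append({"record": rec, "error": str(exc)})
    return out

seen, dlq = set(), []
batch = [{"id": 1, "amount": 2.5}, {"id": 1, "amount": 2.5}, {"id": 2}]
good = consume(batch, seen, dlq)
# one record processed, the duplicate skipped, the malformed record dead-lettered
```

In a real deployment the `seen_ids` store would be durable (e.g. a state store or transactional sink) and the DLQ would be a Kafka topic, but the control flow is the same.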
• Integrate processed data into analytics/observability platforms (e.g., Splunk) for dashboards, alerting, and proactive insights.
• Build automated validation, replay, and backfill mechanisms to ensure reliability and SLA adherence.
• Apply observability to the pipelines themselves (metrics, traces, structured logs) and tune performance and cost.
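As a sketch of what "observability for the pipelines themselves" can look like at the application layer, the stdlib-only example below emits counters, latency samples, and JSON-structured log lines; the class and handler names are illustrative assumptions:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

class Metrics:
    """Minimal in-process metrics: named counters and latency samples."""
    def __init__(self):
        self.counters = {}
        self.latencies = []

    def incr(self, name, n=1):
        self.counters[name] = self.counters.get(name, 0) + n

    def time_call(self, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        self.latencies.append(time.perf_counter() - start)
        return result

metrics = Metrics()

def handle(record):
    # Time the (hypothetical) transformation and count the processed record.
    result = metrics.time_call(lambda r: {**r, "ok": True}, record)
    metrics.incr("records_processed")
    # Structured log line: machine-parseable for Splunk-style ingestion.
    log.info(json.dumps({"event": "processed", "id": record["id"]}))
    return result

handle({"id": 42})
```

In production these counters and timings would be exported to a metrics backend rather than held in memory, but the instrumentation points are the same.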
• Collaborate with platform/infra teams who handle Kafka administration (brokers, security, ops) while owning application-side streaming logic.
• Ensure security and compliance for application data paths (authn/z, encryption in transit and at rest, secret management).
• Document data flows, schemas, and runbooks for streaming services.
Your Profile
Essential skills/knowledge/experience:
• Kafka application development: Kafka Streams/ksqlDB, producer/consumer patterns, partitioning/serialization, exactly-once/at-least-once semantics.
• Languages: strong Scala and/or Python for streaming applications; familiarity with testing frameworks and CI for stream processors.
• Schema management: Avro/Protobuf/JSON, schema registry usage, compatibility strategies.
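The compatibility strategies listed above can be made concrete with a deliberately simplified sketch of a backward-compatibility check: a reader on the new schema can decode data written with the old schema only if every field it expects either existed before or carries a default. The dict-based schema shape here is a hypothetical stand-in for real Avro schemas:

```python
def backward_compatible(new_schema: dict, old_schema: dict) -> bool:
    """Simplified check: new_schema can read old_schema's records if every
    field it declares is either present in the old schema or has a default."""
    old_fields = {f["name"] for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        if field["name"] not in old_fields and "default" not in field:
            return False
    return True

old = {"fields": [{"name": "id"}, {"name": "amount"}]}
# Adding a field WITH a default keeps backward compatibility.
new_ok = {"fields": [{"name": "id"}, {"name": "amount"},
                     {"name": "currency", "default": "GBP"}]}
# Adding a required field WITHOUT a default breaks it.
new_bad = {"fields": [{"name": "id"}, {"name": "currency"}]}
```

A real schema registry also handles forward/full/transitive modes, type promotion, and aliases; this sketch covers only the added-field rule.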
• Stream/batch processing: Spark (including Structured Streaming), Parquet/ORC, partitioning/bucketing, performance tuning.
• Data quality and reliability: idempotent processing, DLQs, replay/backfill, lineage, and SLA-aware designs.
• Observability: metrics/tracing/logging for stream apps; integration with downstream dashboards and alerts.
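The partitioning/bucketing idea behind both Kafka key partitioning and Spark bucketed writes reduces to stable hash assignment: the same key always lands in the same bucket. A tiny illustrative sketch (hash choice and function name are assumptions, not a specific framework's implementation):

```python
import hashlib

def bucket_for(key: str, num_buckets: int) -> int:
    """Stable hash bucketing: a given key always maps to the same bucket,
    which keeps co-partitioned joins and per-key ordering possible."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets
```

Kafka's default partitioner uses murmur2 rather than MD5, so real partition numbers will differ; only the stability property carries over.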
• Security/compliance: authN/Z in clients, TLS/SASL usage, secret management in code and services.
• Collaboration: work closely with Kafka platform/admin teams while focusing on application-layer streaming logic; strong communication and documentation.