Senior Platform Engineer
InfoVision Inc.
Job Description
Critical Skills to Possess:

Kubernetes & Container Orchestration
• 3+ years of production Kubernetes experience; bare-metal / on-premises experience is mandatory (cloud-managed Kubernetes experience alone does not qualify)
• Hands-on Helm chart authoring (not just consumption), ArgoCD or equivalent GitOps tooling, and cert-manager
• Deep understanding of Kubernetes control plane HA: etcd Raft quorum, leader election, minimum viable node counts
• Experience with MetalLB and Traefik or equivalent bare-metal ingress and load balancing tools

Messaging & Streaming
• Production hands-on experience with Apache Kafka: topic configuration, partition sizing, replication factor, consumer group management, KRaft mode
• Solid understanding of consumer lag monitoring and back-pressure patterns under burst load
• Strong familiarity with MQTT protocol semantics: persistent sessions, QoS levels, TLS mutual authentication; hands-on experience with EMQX or a comparable broker at tens of thousands of concurrent connections

Data Infrastructure
• Production hands-on experience with PostgreSQL HA: Patroni-based automatic failover, PgBouncer connection pooling, streaming replication, PITR with pgBackRest or equivalent
• Working knowledge of Valkey / Redis: cluster mode, TTL-based caching, atomic counter operations

Observability
• Hands-on deployment and configuration of VictoriaMetrics / Prometheus, Grafana, Loki, Tempo, and Alertmanager
• Ability to write PromQL / MetricsQL queries and author production Grafana dashboards from scratch
• Experience configuring alert deduplication and grouping in Alertmanager for high-volume event storms

Security & Secrets
• Hands-on experience with HashiCorp Vault / OpenBao: PKI secrets engine, Kubernetes auth method, dynamic secrets
• Experience with container image CVE scanning (Trivy or equivalent) and a self-hosted container registry (Harbor or equivalent)
• Understanding of TLS certificate lifecycle management: internal CA, automated rotation, mTLS between services

CI/CD & GitOps
• Experience building CI pipelines with real service dependency testing (testcontainers-go or equivalent)
• GitOps workflow discipline: Git as the sole source of cluster state truth, PR-gated deployment approvals

Programming
• Go (Golang): able to read, debug, and modify existing microservice code (trace latency issues through service logic, adjust Kafka consumer configuration, update retry semantics); authoring new services from scratch is not required
• Bash scripting for automation and operational runbooks
• Strong proficiency with YAML / TOML for Kubernetes manifests, Helm values, and service configuration

Networking
• Understanding of TCP/TLS at scale: TLS handshake cost, session resumption overhead, connection state memory at 30,000 simultaneous connections
• Working knowledge of data centre networking concepts (MLAG, VLAN, MTU, ToR design); able to collaborate effectively with a dedicated networking team

Preferred Qualifications:
• BS degree in Computer Science or Engineering or equivalent experience