
Senior Platform Engineer

ICONSTAFF

Cambridge | Full-time | Mid Level | On-site

Job Description

Senior Platform/Infrastructure Engineer

Location: Fully remote (HQ Cambridge, MA)
Hours: 9–5 EST, with 2-day on-site visits every 6 weeks

You’ll be responsible for designing, scaling, and maintaining the infrastructure and internal developer platforms that power a real-time learning AI at a seed-stage startup. The role blends infrastructure ownership with platform engineering to enable AI/product teams to ship quickly and reliably.

Key Responsibilities - Infrastructure
- Maintain production health: performance, reliability, cost efficiency, and security.
- Manage GCP Kubernetes clusters (GKE), networking, storage, and compute resources.
- Handle scaling, resource allocation, and high availability for growing customer demand.
- Refine observability: logs, traces, metrics, dashboards, and alerts.
- Perform security hardening and cost optimization.

Key Responsibilities - Platform Engineering
- Build internal tooling and abstractions for developer productivity.
- Design CI/CD pipelines using GitHub Workflows and ArgoCD.
- Provide self-service environments, internal portals, and deployment systems.

Collaboration & Communication
- Work closely with AI and full-stack teams to optimize system architecture.
- Explain technical concepts and trade-offs clearly to engineers and non-engineers.
- Troubleshoot issues across multiple systems (Python, JavaScript, SQL).

Requirements
- 5+ years in production cloud environments at scale.
- Strong familiarity with GCP (primary) and some AWS experience.
- Experience with Kubernetes (GKE), node pools, and memory-intensive jobs.
- Working knowledge of CI/CD systems (GitHub Workflows + ArgoCD).
- Exposure to observability tools (Datadog), databases (Cloud SQL, ClickHouse, Bigtable), and cloud services.

Skills & Qualities
- Strong analytical and problem-solving ability.
- Clear, collaborative communication.
- Curiosity and ownership mentality.
- Fluent in reading/debugging code across Python, JavaScript, SQL.

Technical Stack
- Cloud: GCP (primary), AWS (secondary)
- Kubernetes: GKE, multiple node pools
- CI/CD: GitHub Workflows + ArgoCD
- Data: Cloud SQL, ClickHouse, Bigtable, GCS, Dataflow
- Networking: Cloudflare Workers, Durable Objects, WebSocket communication
- Monitoring: Datadog
- Environments: Production, Staging, Integration, Development

Posted 2 days ago
