Staff / Principal Engineer – Core Engineering

Socotra, Inc.

San Francisco | Full-time | Mid Level | On-site

Job Description

Build the Future of Scalable AI at TrueFoundry

At TrueFoundry, we’re redefining how ML teams train, deploy, and scale their models. Our LLMOps and MLOps platform empowers organizations to experiment faster, train large-scale models reliably, and deploy them seamlessly on Kubernetes, with the same muscle as Big Tech. We're looking for an engineer who is passionate about scaling deep learning workloads, optimizing multi-GPU training, and shipping production-grade solutions.

The Role:

We are seeking a Staff / Principal Engineer to join our Core Engineering team as a senior technical leader based in the United States. You will:

- Solve some of the most complex engineering problems and drive the solutions alongside a team of engineers and ML researchers
- Build a deep, holistic understanding of the TrueFoundry platform across all components and shape the product vision and implementation
- Act as the technical face of engineering for customer-related discussions and escalations
- Guide and unblock engineers across projects in the US region
- Partner closely with our CTO and India-based engineering team to drive system design, architecture, and implementation of complex products
- Lead technical design, critical customer problem-solving, and platform scalability initiatives end-to-end

This is a high-ownership, high-impact role designed for an engineer who loves combining world-class systems thinking with real-world execution.

What You’ll Do:

- Develop deep expertise across TrueFoundry’s platform stack: infrastructure, deployment systems, LLM/ML orchestration, observability, cost optimization, and more
- Drive the system architecture and design for complex, distributed, cloud-native systems
- Act as the technical point of contact for enterprise customer engineering needs and escalations
- Lead and participate in design reviews, code reviews, and critical incident responses
- Collaborate closely with the CTO on architectural decisions, scaling strategies, and technical roadmap prioritization
- Guide and mentor US-based engineers across multiple initiatives, helping them deliver high-quality, scalable systems
- Identify and drive technical debt cleanup, performance improvements, and resilience upgrades across the platform
- Bring a product engineering mindset, ensuring that customer needs and feedback translate into scalable engineering solutions

Who You Are:

- 8+ years of strong backend / systems engineering experience at top technology companies or startups
- Deep expertise in distributed systems, cloud-native architectures, and scalable system design
- Strong working knowledge of Kubernetes, containerized workloads, and infrastructure engineering
- Practical experience building or deploying ML/GenAI applications (or working closely with ML/DS teams)
- Skilled in programming languages such as Python, Go, or TypeScript
- Solid understanding of system observability, resiliency design, and SRE practices
- Strong technical leadership and communication skills; able to work with both customers and engineering teams
- Ability to think strategically while also executing hands-on when required

Bonus:

- Experience supporting enterprise deployments of AI/ML infrastructure, model training, or inference systems

Why Join TrueFoundry?

- Work directly with ex-Facebook engineers and with founders who are IIT Kharagpur, UC Berkeley, and Y Combinator alumni.
- First-hand exposure to building and scaling a deep-tech startup: insights you’ll carry if you want to start your own one day.
- Be part of a fearlessly experimental culture focused on customer success and long-term impact.
- Flexible hours, learning credits, and the opportunity to work shoulder-to-shoulder with the co-founders (Abhishek & Nikunj).

Posted Yesterday
