Senior Staff Engineer, Software

Celestica Inc.

Richardson | Full-time | Mid Level | On-site

Job Description

Date: May 25, 2026

General Overview

Functional Area: Engineering
Career Stream: Design - Software Engineering
Job Code: SSE-ENG-DSE
Job Level: Level 11
IC/MGR: Individual Contributor
Direct/Indirect Indicator: Indirect

Summary

This is a high-impact, hands‑on technical leadership role where you will architect and build systems that enable deployment, monitoring, and optimization of large‑scale infrastructure supporting AI workloads across modern data center environments.

You will operate at the intersection of:
- Infrastructure management, monitoring, and diagnostics

This role requires deep technical expertise along with the ability to drive end‑to‑end solutions from architecture through deployment and troubleshooting.

Detailed Description

- Lead the architecture, design, and development of scalable AI infrastructure platforms supporting GPU‑based data center environments
- Build and enhance orchestration systems responsible for infrastructure deployment, provisioning, monitoring, and lifecycle management
- Design distributed systems with a focus on scalability, resiliency, fault tolerance, concurrency, and performance optimization
- Develop infrastructure observability and diagnostics capabilities across GPU, networking, and storage environments
- Define telemetry, health monitoring, and performance validation strategies for large‑scale AI infrastructure deployments
- Develop and support data center networking and orchestration workflows including ZTP, DHCP, provisioning, and automated infrastructure configuration
- Work across modern AI fabric and data center networking architectures including Clos fabrics, EVPN, and L2/L3 networking environments
- Write high‑performance backend software and infrastructure services using Python or Go within Kubernetes‑based environments
- Troubleshoot and resolve complex infrastructure, networking, orchestration, and performance issues in live production data center environments
- Lead root cause analysis efforts and drive issues through resolution across software, networking, and infrastructure layers
- Partner cross‑functionally with engineering, hardware, platform, lab, and customer teams to support deployments and operational success
- Drive technical direction, architecture decisions, engineering best practices, and mentorship across the organization
- Translate real‑world deployment challenges into scalable engineering solutions that improve reliability, automation, and operational efficiency
- Operate as a hands‑on technical leader capable of driving initiatives from architecture and development through deployment and production support

Required

- 12+ years of experience in software engineering focused on infrastructure, distributed systems, networking, or large‑scale platform development
- Strong expertise in data center networking fundamentals including:
  - L2/L3 networking
  - BGP and EVPN
  - Clos fabrics and AI networking architectures
- Proven experience designing and building scalable distributed systems in production environments
- Hands‑on experience with infrastructure orchestration, provisioning, and large‑scale data center deployments
- Strong programming experience in Python or Go
- Experience building systems within Kubernetes‑based environments
- Strong understanding of system scalability, concurrency, resiliency, and performance optimization
- Demonstrated ability to troubleshoot and debug complex multi‑layer production systems
- Strong communication and collaboration skills with the ability to work across technical and non‑technical teams

Preferred

- Experience with AI/ML infrastructure, GPU clusters, or high‑performance computing (HPC) environments
- Experience with AI infrastructure monitoring, observability, and diagnostics platforms
- Familiarity with AI workload orchestration and scheduling systems
- Experience with infrastructure automation tools such as Ansible
- Experience supporting customer deployments and external stakeholder engagements
- Background supporting large‑scale data center or cloud infrastructure platforms

Typical Experience

12+ years

Typical Education

Bachelor's degree or consideration of an equivalent combination of education and experience. Educational requirements may vary by geography.

Notes

This job description is not intended to be an exhaustive list of all duties and responsibilities of the position. Employees are held accountable for all duties of the job. Job duties and the % of time identified for any function are subject to change at any time.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Celestica's policy on equal employment opportunity prohibits discrimination based on race, color, creed, religion, national origin, gender, sexual orientation, gender identity, age, marital status, veteran or disability status, or other characteristics protected by law. This policy applies to hiring, promotion, discharge, pay, fringe benefits, job training, classification, referral and other aspects and also states that retaliation against a person who files a charge of discrimination, participates in a discrimination proceeding, or otherwise opposes an unlawful employment practice will not be tolerated.

All information will be kept confidential according to EEO guidelines.

Nearest Major Market: Dallas
Nearest Secondary Market: Fort Worth
Job Segment: Facilities, Cloud, Software Engineer, Aerospace Engineering, Manufacturing Engineer, Operations, Technology, Engineering

Posted 3 weeks ago
