Principal Engineer - Platform Engineering & Production Support
Wells Fargo
Job Description
Title: Principal Engineer - Platform Engineering & Production Support
Location: 401 W Las Colinas Blvd Irving, TX
Alternate Locations: Charlotte, NC or Minneapolis, MN
Duration: 12 months
Work Engagement: W2
Work Schedule: 3 days in office/2 days remote
Benefits on offer for this contract position: Health Insurance, Life insurance, 401K and Voluntary Benefits

Summary: We are seeking a Principal Engineer within the Platform Engineering team. This individual must be Day 1 ready, comfortable operating in fast-paced, production-critical environments, and capable of balancing multiple competing priorities. The ideal candidate is a seasoned DevOps and Site Reliability Engineering (SRE) professional with strong hands-on expertise in observability, incident management, and cloud platforms (OpenShift). This role will play a leading part in supporting production systems, preventing outages, and improving system reliability through automation, intelligent monitoring, and modern SRE practices.

Team Overview: This role supports a critical Platform Engineering team responsible for stabilizing, scaling, and operating applications as they move toward and beyond production release. The team plays a key role post-deployment, ensuring reliability, performance, and operational excellence across a broad application portfolio. This is not traditional infrastructure support. It is application-focused production engineering, requiring deep technical expertise, proactive issue prevention, and strong ownership of application health in cloud-native environments.

Responsibilities:
* Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution
* Design, build, and maintain advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, Prometheus, and SPLOC
* Proactively identify production risks through gap analysis, anomaly detection, and predictive alerting, preventing incidents before they occur
* Troubleshoot complex production issues across distributed microservices environments, driving reduced MTTR through deep technical expertise
* Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring
* Support applications running on OpenShift and cloud-native platforms, with a strong focus on reliability, scalability, and resiliency
* Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support
* Participate in a 24x7 on-call rotation, demonstrating urgency, ownership, and accountability during incidents
* Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering
* Act as a trusted technical leader, able to rapidly shift priorities and manage competing demands in high-pressure environments

Qualifications:
* Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.
* Strong background in platform engineering and production support
* Hands-on experience with:
  * Red Hat Linux
  * OpenShift and Kubernetes
  * Java and Python
  * Microservices architectures and Spring Boot
* Experience designing and maintaining observability dashboards, including:
  * Grafana
  * Splunk
  * SPLOC
  * AppDynamics
* Experience with observability alerts, incident response, and on-call support, leveraging tools such as:
  * AIOps platforms
  * ServiceNow
  * BigPanda or similar incident management tools
* Experience with:
  * React.js
  * Apache
  * Kafka
  * Relational databases
* Strong understanding of distributed systems, cloud-native platforms, and microservices-based architectures