Site Reliability Engineer
Infosys
Job Description
Observability: Implementing end-to-end monitoring solutions and defining SLOs and SLIs for customer journeys, using industry tools such as Datadog, Dynatrace, and AppDynamics.
DevSecOps: Setting up CD pipelines with supporting tools.
Cloud Technologies: Experience with one of the major cloud platforms (AWS, GCP, or Azure) across key services: compute, storage, and networking.
Infrastructure as Code: Solution design and implementation with industry tools such as Terraform and Ansible.
Scripting and Automation: Scripting languages and automation tools.
Preferred Skills:
Develop and implement observability solutions (monitoring, anomaly detection, alerting, and self-healing) using industry tools such as Datadog, Dynatrace, AppDynamics, and New Relic.
Support critical incident resolution in complex environments, including applications hosted in the cloud or in data centers, containerized applications, and databases.
Set up SLOs and SLIs using industry-leading tools.
Work as an individual contributor and lead a small team in a global delivery model.
Develop Proofs of Concept (PoCs) and perform hands-on technical tasks based on client needs.
Support responses to Requests for Proposal (RFPs) from clients.
Identify opportunities for automation and implement them.
Experience implementing AI/ML-based monitoring and self-healing solutions.
Experience implementing chaos engineering and chaos testing.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Consulting, Analyst, and Engineering
Industries: Information Services, IT Services and IT Consulting, and Software Development