Sr Machine Learning Engineer
The Walt Disney Company (Germany) GmbH
Job Description
Department Description
At Disney, we're storytellers. We make the impossible, possible. The Walt Disney Company is a world-class entertainment and technological leader.
Walt's passion was to continuously envision new ways to move audiences around the world, a passion that remains our touchstone in an enterprise that stretches from theme parks, resorts and a cruise line to sports, news, movies and a variety of other businesses. Uniting each endeavor is a commitment to creating and delivering unforgettable experiences, and we're constantly looking for new ways to enhance these exciting experiences.
The Enterprise Technology mission is to deliver technology solutions that align to business strategies while enabling enterprise efficiency and promoting cross-company collaborative innovation. Our group drives competitive advantage by enhancing our consumer experiences, enabling business growth, and advancing operational excellence.
Team Description
Reporting to the Director of Automation, Tooling, and Observability within Global Network Engineering & Operations (GNEO), the Machine Learning / Software Engineer plays a critical role in designing, developing, and implementing self-healing infrastructure management systems for enterprise-wide production environments. This role combines deep expertise in machine learning, AI technology, software engineering, and DevOps to create reusable patterns, frameworks, and services that improve reliability across Services and Platforms. The candidate will serve as a thought leader, identifying opportunities for advanced analytics, predictive modeling, and AI, and applying them to large-scale telemetry, change, event, and incident data to derive actionable insights.
The role focuses on building, deploying, and operating machine learning models that proactively detect issues, predict failures, and drive automated, self-healing remediation across enterprise systems. The role is intentionally machine learning and AI heavy and is intended to be a strategic driver in that space.
What You'll Do
- Work alongside our first-class applications, infrastructure & operations teams to understand current manual processes and business requirements.
- Architect, design, and implement reusable machine learning frameworks, patterns, and services that integrate into the enterprise automation and observability platforms.
- Design, train, and deploy machine learning models for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and more in distributed environments that can be used to surface leading indicators of failure.
- Build near-real-time inference pipelines that generate actionable insights from live telemetry, including continuous streams of metrics, logs, traces, and operational events.
- Create data abstractions and perform feature engineering on high-volume, high-cardinality telemetry data.
- Evaluate model performance using real production signals and continuously iterate to improve accuracy and reliability.
- Build closed-loop, event-driven systems where model signals trigger automated remediation actions.
- Partner with infrastructure and SRE teams to identify opportunities and integrate machine learning and AI-driven insights into operational tools, workflows, and dashboards.
- Analyze incident and historical data to uncover leading indicators and predictive signals.
- Own the full machine learning lifecycle: experimentation, validation, deployment, monitoring, and retraining.
- Break down targeted, manual processes into reusable software modules that leverage machine learning models.
- Build emulation and simulation environments (digital twins) of the infrastructure to test AI/ML-driven automation under realistic scenarios and allow for faster ideation and iteration for architects and engineers.
- Develop algorithms and frameworks to integrate machine learning and AI technologies into our orchestration platform.
- Ensure service reliability, performance, and operational uptime through codeâdriven solutions.
- Conduct root cause analysis, design fault-tolerant architectures, and enable self-healing automation.
- Implement monitoring dashboards and KPIs to provide visibility into automation and tooling performance.
- Collaborate with cross-functional teams including network engineers, software developers, machine learning engineers, and operations teams across the enterprise.
- Support the integration of commercial and open-source tools while maintaining a vendor-agnostic implementation.
Required Qualifications & Skills
- 7+ years of software engineering experience, with expertise in automation, machine learning, and AI technologies.
- Proven hands-on experience building production-grade ML models and inference pipelines; strong proficiency with modern ML frameworks such as PyTorch, TensorFlow, Scikit-learn, etc.
- Experience designing, training, and deploying machine learning models for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and more in distributed environments to surface leading indicators of failure.
- Proven hands-on experience building frontends, APIs, and backend functionality; strong proficiency with Python, JavaScript, TypeScript, Go, or Rust.
- Experience building emulation and simulation environments (digital twins) of infrastructure to test AI/ML-driven automation under realistic scenarios and allow for faster ideation and iteration for architects and engineers.
- Strong hands-on experience building and deploying event-driven or streaming-data machine learning models in production.
- Solid foundation in statistics, data analysis, and applied machine learning techniques.
- Experience working with large-scale, real-world datasets (noisy, incomplete, non-standardized, and evolving).
- Experience operationalizing models in distributed, production environments.
- Ability to translate ambiguous operational problems into solvable machine learning use cases.
- Experience with modern cloud platforms, container orchestration (Kubernetes/Docker), identity/auth frameworks, data and workflow orchestration.
- Experience with AI/ML technologies and data engineering concepts. Preferred: proven hands-on experience building AI agents.
- Demonstrated success designing and building enterprise-scale systems and reusable software frameworks.
- Strong communication, collaboration and leadership skills.
- Applies systems thinking to understand how individual components fit into larger, more holistic solutions.
- Capable of quickly shifting between detailed, hands-on work and high-level strategic thinking.
Preferred Qualifications
- Certifications such as Kubernetes (CKA/CKAD), AWS/Azure/GCP certifications, CCNP/DevNet or NVIDIA AI engineer.
- Experience developing low-code/no-code automation platforms or reusable developer toolkits.
- Contributions to open-source automation, machine learning, AI, observability, or DevOps communities.
- Applying unsupervised and semi-supervised learning for anomaly detection and signal discovery.
- Applying complex event processing and event correlation techniques.
- Building time-series forecasting models for capacity, latency and failure prediction.
- Experience with feature stores, offline/online feature pipelines, and feature reuse.
- Implementing model monitoring for drift, bias, and performance degradation.
- Experience with reinforcement learning or decision models for automated remediation and optimization.
- Working with real-time or near-real-time inference pipelines.
- Experience labeling, curating, and managing training data derived from production telemetry.
- Experience mentoring engineers, sharing knowledge, and fostering a learning culture.
- Demonstrated curiosity and continuous learning mindset, with a passion for exploring emerging AI/ML, automation, and platform technologies.
Required Education
- Bachelor's degree in Computer Science, Information Systems, Software, Electrical or Electronics Engineering, or comparable field of study, and/or equivalent work experience.
Preferred Education
- Master's degree in Computer Science, Engineering, or related discipline.
The Walt Disney Company and its Affiliated Companies are Equal Employment Opportunity employers and welcome all job seekers including individuals with disabilities and veterans with disabilities. If you have a disability and believe you need a reasonable accommodation in order to search for a job opening or apply for a position, visit the Disney candidate disability accommodations FAQs. We will only respond to those requests that are related to the accessibility of the online application system due to a disability.