# AI Engineer / Cloud Engineer

SageBeans RPO | Full Time | Mid | Hybrid | Toronto, Ontario, CA | Posted Yesterday

## Role Overview

SageBeans RPO is hiring a mid-level AI Engineer / Cloud Engineer. This is a full-time hybrid role based in Toronto. It is part of SageBeans RPO's Security hiring and was posted yesterday.
Full responsibilities and required qualifications are listed in the description below.

## Salary Context

Salary is not disclosed in this posting. Market median for mid-level Security roles is $87k-$125k (based on 288 comparable listings). Many employers share specifics during the interview process or after an initial screen.

## Job Description

AI Engineer / Cloud Engineer

Core Stack: AWS Bedrock AgentCore, Azure AI Foundry, MCP Governance
Function: Foundations / Infrastructure & Operations
Level: Senior / Staff
Type: Full-time
Location: Remote / Hybrid
Reports to: Director, AI Architecture

### About the Role

The Foundations team serves as the enterprise AI governance control plane for Infrastructure & Operations (I&O), responsible for the infrastructure, observability, security, and policy layer across a multi-cloud AI agent ecosystem.

We are seeking a senior AI/Cloud Engineer to design, build, and operate production-grade AI agent infrastructure across Amazon Web Services Bedrock AgentCore and Microsoft AI Foundry, with deep integration across MCP (Model Context Protocol) connectors, LLM gateways, and enterprise data systems.

This role sits at the intersection of AI platform engineering, cloud infrastructure, governance, and enterprise security.
You will partner closely with the I&O Architecture Engineering team to ensure AI agents are observable, measurable, secure, and fully governed across the enterprise estate.

### Key Responsibilities

#### 1. Agent Infrastructure & Platform Engineering

- Design and deploy production AI agent workloads on AWS Bedrock AgentCore, including runtime configuration, memory stores, and DataDog observability instrumentation.
- Build and maintain Azure AI Foundry agent pipelines with Application Insights telemetry, Azure APIM-based token attribution, and Azure Monitor integration for safety and red-teaming signals.
- Architect MCP connector infrastructure, including tool-call routing, RBAC enforcement, OAuth2 / Entra ID scoping, and end-to-end audit logging.
- Maintain and evolve the enterprise LLM gateway as the centralized routing, policy enforcement, and instrumentation layer across Bedrock, Azure OpenAI, and Claude-based endpoints.

#### 2. Governance, Security & Observability

- Instrument agent systems to capture Tier 1 audit KPIs, such as tool-call completeness, policy violations, RBAC coverage, and authentication failure rates, aligned with compliance requirements.
- Define per-connector security policies so that Finance, HR, Legal, and Client data systems remain fully governed and least-privileged.
- Build unified observability dashboards across CloudWatch, Azure Monitor, and DataDog for AI system health and executive reporting.
- Participate in CDR (Critical Design Review) processes for all AI agent deployments, ensuring adherence to reliability, security, and observability standards.
- Design SOC 2-aligned audit log pipelines for all agent tool-calls to support compliance and forensic traceability.

#### 3. Integration & Platform Interoperability

- Integrate AI agent systems with enterprise platforms such as ServiceNow, DataDog, Apptio/Cloudability, and internal data platforms via MCP connectors and REST APIs.
- Support CI/CD automation for AI agents using GitHub Actions, including environment promotion, rollback strategies, and pipeline replay mechanisms.

#### 4. AI Quality & Evaluation

- Define and instrument agent KPIs, including task completion rate, hallucinated tool-call detection, escalation rate, and context efficiency metrics.
- Leverage AWS AgentCore evaluation frameworks and Azure AI Foundry evaluation tooling to assess groundedness, safety, tool accuracy, and reliability.
- Build golden dataset regression suites to detect performance degradation across model updates, prompt changes, and connector modifications.

### Required Qualifications

#### Platform Experience (Must Have)

- AWS Bedrock AgentCore (production workloads)
- Azure AI Foundry (agent pipelines & evaluation systems)
- AWS Lambda / EKS
- Azure API Management (APIM)
- CloudWatch (metrics, logs, traces)
- Azure Monitor + Application Insights
- LLM Gateway architecture experience
- MCP (Model Context Protocol) servers
- OAuth2 / Entra ID / IAM-based credential scoping

#### Engineering Skills

- 5+ years of software/platform engineering experience; 2+ years in AI/ML infrastructure or LLM-based systems
- Strong proficiency in Python; TypeScript/Node.js preferred for orchestration layers
- Experience with REST APIs, async processing, and event-driven architectures
- Hands-on CI/CD experience (GitHub Actions preferred)
- Infrastructure-as-Code and container-based deployment experience
- Strong observability engineering experience (metrics, logs, tracing in production systems)
- Security fundamentals: RBAC, least privilege, secrets management (AWS Secrets Manager / Azure Key Vault), audit logging

#### AI & Agent Engineering

- Strong understanding of LLM systems: token management, context windows, prompt engineering, RAG, and model drift
- Experience building multi-step agentic workflows with tool/function calling
- Familiarity with AI evaluation frameworks (accuracy, hallucination rate, groundedness, tool selection quality)
- Understanding of MCP: lifecycle, tool invocation, session management, and permission scoping

### Preferred Qualifications

- Production experience with Claude Enterprise / Anthropic APIs (Claude Sonnet 4+)
- Experience with ServiceNow CMDB or enterprise data catalog systems (LeanIX or similar)
- Contributions to open-source AI agent frameworks or MCP implementations
- AWS Certified Machine Learning – Specialty or Solutions Architect
- Microsoft Certified: Azure AI Engineer Associate (AI-102) or equivalent
- Familiarity with SOC 2 Type II and NIST SP 800-53 AU-9 audit controls