Systems Engineer: Real-Time Engine
Nuance Labs
Job Description
About The Role
We're building the engine that powers our AI avatar: a real-time interactive loop that continuously senses the user (audio and video), orchestrates inference across multiple models, manages state, and renders a coherent audio-visual response within tight latency budgets. Traditional real-time systems are hard because the timing requirements are strict. This system is harder: its components are neural networks with variable latency and non-deterministic outputs, and the user cannot be paused while the models think.
You're building a system that has to feel instantaneous while running inference that isn't. This is the runtime that makes a human‑AI conversation feel alive. You’ll own this runtime and collaborate closely with our research team on how models are invoked, how conversational context is assembled, and how response quality is balanced against latency.
You'll have direct influence over architecture decisions as an early engineer on a small, well-funded team.
What You'll Do
Build and own the server-side real-time engine: session lifecycle, state management, and the architecture of the interaction loop, including the timing and scheduling layer that keeps the loop coherent.
Integrate GPU-backed model inference into the real-time loop, wiring model outputs into the engine's state and render pipeline.
Develop performance tooling for latency breakdowns (TTFO, steady-state), tracing, profiling, and regression detection.
Collaborate with product and research to define how the system behaves at its boundaries: APIs, event streams, and the invariants the engine guarantees to the rest of the stack.
Required Skills
Real-time streaming systems experience. You have built systems operating on a continuous real-time loop with hard per-tick latency budgets, where output must never stall.
Strong Python and async programming. Productive immediately in Python; asyncio should be second nature. Able to write prototype code with an architecture clean enough to survive a language port.
Systems programming background. The production system is written in Rust; we look for experience in at least one systems language (Rust, C++, Go) and motivation to adopt Rust.
Concurrency and state machine design. Experience designing concurrent systems: async runtimes, thread models, lock contention, schedulers. Specifically, managing multiple in-flight async processes with cancellation, priority switching, and preemption.
Strong intuition for latency. Profiling, tail behavior, and trade-offs between throughput and responsiveness. Ability to reason about end-to-end pipelines across CPU and GPU boundaries.
Comfort building from scratch under time pressure. You will design the architecture and ship it, not maintain existing infrastructure. Comfortable with ambiguity and rapid iteration.
Bonus Points
Experience with real-time media systems: WebRTC, RTP/RTCP, jitter buffers, A/V sync.
Experience with real-time tick-loop architectures (e.g., game engines, simulation runtimes, audio DSP pipelines, robotics).
Experience with GPU inference serving and optimization: Triton, TensorRT, vLLM, CUDA profiling.
Experience building LLM agent orchestration systems.
Familiarity with streaming generation systems: incremental decoding and mid-stream control, lock-free data structure design.
Location
In-person collaboration, 5 days a week at Seattle HQ.