LightTwist is an interactive virtual video studio that fuses real-time computer vision with game-engine rendering. In this talk, we share what we learned scaling a macOS prototype into a cloud system. We’ll start with our on-device pipeline: an optimized deep-learning background-segmentation model shipped as a virtual camera for video calls, tightly integrated with Unity, and the limits we hit in GPU horsepower and multi-person scenes. Then we’ll explain why we moved to the cloud and walk through the architecture: WebRTC for low-latency audio and video; a clean split between computer vision (segmentation and green-screen removal) and rendering (Unreal Engine); and a coordination layer that orchestrates them. We’ll cover the challenges of timestamp alignment across machines, cost control without blowing latency targets, and development across many interdependent services. We’ll close with possible future directions: pushing compute to iPhones and revisiting on-device rendering as generative video matures.
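To give a flavor of the timestamp-alignment challenge mentioned above, here is a minimal sketch of the standard NTP-style handshake for estimating the clock offset between two machines. This is an illustrative example, not LightTwist’s actual implementation; the function names and the four-timestamp protocol shape are assumptions for the sketch.

```python
def estimate_offset(t0, t1, t2, t3):
    """NTP-style clock offset estimate (seconds the remote clock is ahead).

    t0: local time the request was sent
    t1: remote time the request was received
    t2: remote time the reply was sent
    t3: local time the reply was received
    Assumes roughly symmetric network delay in each direction.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def estimate_round_trip(t0, t1, t2, t3):
    """Round-trip network delay, excluding remote processing time."""
    return (t3 - t0) - (t2 - t1)


# Simulated exchange: remote clock is 0.250 s ahead, 0.020 s one-way delay.
t0 = 100.000                  # local send
t1 = 100.020 + 0.250          # remote receive (remote clock)
t2 = t1                       # remote reply immediately (remote clock)
t3 = 100.040                  # local receive

offset = estimate_offset(t0, t1, t2, t3)   # ~0.250 s
rtt = estimate_round_trip(t0, t1, t2, t3)  # ~0.040 s
```

Once each machine’s offset to a common reference is known, media timestamps from the vision and rendering services can be mapped onto one timeline before frames are matched up.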

