Date: Wednesday, May 24
Start Time: 10:15 am
End Time: 10:45 am
Video streams are so rich, and video workloads so sophisticated, that we can now expect video ML to deliver many insights and transformations simultaneously. It will be increasingly common to need video segmentation, object and motion recognition, SLAM, 3D model extraction, relighting, avatarization and neural compression in parallel. Conventionally, such a combination would overwhelm edge compute resources, but novel multi-headed ML models and unified video pipelines make it feasible on existing personal devices and embedded compute subsystems. In this talk, we discuss the goals for advanced video intelligence in secure, edge-powered video communications, and show how new model structures can achieve very high accuracy, resolution and frame rate at low cost per function. We will also discuss improved objective and subjective quality metrics, training-set synthesis, and our optimized methodology for portable edge implementations. We will wrap up with some observations on the challenges of even larger video workloads at the edge.
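For readers unfamiliar with the multi-headed pattern the abstract alludes to, here is a minimal, hypothetical PyTorch sketch (the class, head names, and layer sizes are illustrative assumptions, not the speakers' architecture): one shared backbone is evaluated once per frame, and each additional task adds only a small head, which is what keeps the incremental cost per function low on edge hardware.

```python
# Illustrative sketch only, not the talk's implementation: a shared backbone
# amortizes per-frame feature extraction across several lightweight task
# heads, so adding a task costs one small head rather than a full model.
import torch
import torch.nn as nn

class MultiHeadVideoModel(nn.Module):
    def __init__(self, feat_dim: int = 64, num_seg_classes: int = 21):
        super().__init__()
        # Shared backbone: run once per frame, reused by every head.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Per-task heads, each far cheaper than a standalone model.
        self.seg_head = nn.Conv2d(feat_dim, num_seg_classes, 1)  # segmentation logits
        self.motion_head = nn.Conv2d(feat_dim, 2, 1)             # dense 2D motion field
        self.depth_head = nn.Conv2d(feat_dim, 1, 1)              # depth, e.g. for 3D extraction

    def forward(self, frames: torch.Tensor) -> dict:
        feats = self.backbone(frames)  # (N, feat_dim, H/4, W/4), computed once
        return {
            "segmentation": self.seg_head(feats),
            "motion": self.motion_head(feats),
            "depth": self.depth_head(feats),
        }

model = MultiHeadVideoModel()
outputs = model(torch.randn(1, 3, 256, 256))  # one dummy 256x256 RGB frame
print({k: tuple(v.shape) for k, v in outputs.items()})
```

The design choice this toy model illustrates is the one the abstract claims: the expensive, shared computation dominates, so running many analyses in parallel scales far better than running many independent models.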