As generative AI models rapidly evolve, with increasingly dynamic transformer architectures and diverse framework variations, the challenge of bringing these models to the edge has grown dramatically. Traditional rule-based optimization pipelines can no longer keep pace with models whose structures shift quickly and whose computation patterns defy rigid assumptions. At the same time, edge hardware has become more fragmented than ever, spanning a wide range of devices that each require distinct optimization strategies to run efficiently. In this session, we reframe what edge AI optimization must look like in the generative AI era. We will explore how NetsPresso (Nota AI's AI model optimization platform) is evolving to support flexible, hardware-aware optimization approaches that adapt to emerging model architectures. We will also show how insight-driven workflows, powered by visual analysis and automated experiment pipelines, help engineers navigate hardware variability, uncover bottlenecks, and identify the most effective deployment paths.
