We introduce a framework for scaling computer vision systems across three dimensions: capability evolution, infrastructure decisions, and deployment scaling. Today’s leading vision systems rely on scalable models that, when used through prompting, enable advanced capabilities without the resource demands of general-purpose AI vision. Scaling these systems, however, runs into edge computing constraints: limited compute power and network capacity restrict the number of camera streams that can be processed, which increases cost and complexity. We present a structured approach to navigating these trade-offs, including automation tools and deployment strategies that help resource-constrained engineering teams maximize capabilities and make well-informed choices between edge and cloud processing architectures.