Given the growing utility of computer vision applications, how do we deploy these services in high-traffic production environments? Here we present GumGum’s approach to the infrastructure for serving computer vision models in the cloud. We elaborate on three aspects. First, we discuss the modularity of computer vision models, including handling images and video equivalently, creating module pipelines, and designing for library agnosticism so we can leverage open source developments. Second, we discuss inter-process communication: specifically, the pros and cons of data serialization, and the importance of a standardized data format shared by training and serving data, which lends itself to automated feedback from serving data for re-training and to automated metrics. Third, we discuss our approaches to scaling, including a producer/consumer model, scaling triggers, and container orchestration. We will illustrate these aspects through examples of image and video processing and of module pipelines.