By using vision-language models (VLMs), or by combining large language models (LLMs) with conventional computer vision models, we can build vision systems that interpret policies and understand scenes and human behavior in far more depth than current-generation vision models. We’ll illustrate these capabilities with several commercial applications, targeting use cases such as ensuring compliance with safety policies and manufacturing regulations. We’ll also share the lessons we’ve learned about the limitations and challenges of using LLMs and VLMs in real-world applications.