Many statistical metrics exist for measuring the performance of machine learning models. While these work well when the model itself is the final product, classical measures are often insufficient when the model is part of a larger system. In this session, we will focus on products where machine learning models do not generate the final output. The challenge in these cases is to define a metric that accurately captures how different failure modes of the model affect overall system performance. This becomes even harder when the system has conflicting requirements. We will discuss strategies for dealing with these problems and share insights from real-world examples.