Machine learning aims to construct models that are predictive: accurate even on data not used during training. But how should we assess accuracy? (Hint: simply computing the average error on a predetermined test set, while nearly universal, is frequently a bad strategy.) How can we avoid catastrophic errors due to black swans—rare, highly atypical events? Consider that, at 30 frames per second, video presents so many events that even “highly atypical” ones occur every day! How can we avoid overreacting to red herrings—coincidences in the training data that are irrelevant? After all, a model’s entire knowledge of the world is the data used in training. To build more trustworthy models, we must re-examine how to measure accuracy and how best to achieve it. This talk will challenge some widely held assumptions and offer some novel steps forward, occasionally enlivened by colorful, zoological metaphors.